Data Analysis/Query
리트코드 : 550. Game Play Analysis IV
베짱이28호
2024. 4. 8. 20:48
리트코드 :550. Game Play Analysis IV
문제
Table: Activity
+--------------+---------+
| Column Name | Type |
+--------------+---------+
| player_id | int |
| device_id | int |
| event_date | date |
| games_played | int |
+--------------+---------+
(player_id, event_date) is the primary key (combination of columns with unique values) of this table.
This table shows the activity of players of some games.
Each row is a record of a player who logged in and played a number of games (possibly 0) before logging out on someday using some device.
Write a solution to report the fraction of players that logged in again on the day after the day they first logged in, rounded to 2 decimal places. In other words, you need to count the number of players that logged in for at least two consecutive days starting from their first login date, then divide that number by the total number of players.
The result format is in the following example.
Example 1:
Input:
Activity table:
+-----------+-----------+------------+--------------+
| player_id | device_id | event_date | games_played |
+-----------+-----------+------------+--------------+
| 1 | 2 | 2016-03-01 | 5 |
| 1 | 2 | 2016-03-02 | 6 |
| 2 | 3 | 2017-06-25 | 1 |
| 3 | 1 | 2016-03-02 | 0 |
| 3 | 4 | 2018-07-03 | 5 |
+-----------+-----------+------------+--------------+
Output:
+-----------+
| fraction |
+-----------+
| 0.33 |
+-----------+
Explanation:
Only the player with id 1 logged back in after the first day he had logged in so the answer is 1/3 = 0.33
문제 풀이
MySQL
WITH F_LOGIN AS (
SELECT PLAYER_ID, MIN(EVENT_DATE) AS FIRST_DATE
FROM ACTIVITY
GROUP BY PLAYER_ID
),
S_LOGIN AS (
SELECT F.PLAYER_ID
FROM F_LOGIN AS F
JOIN ACTIVITY AS A ON F.PLAYER_ID = A.PLAYER_ID
WHERE A.EVENT_DATE = DATE_ADD(F.FIRST_DATE, INTERVAL 1 DAY)
)
SELECT ROUND(COUNT(PLAYER_ID) / (SELECT COUNT(DISTINCT PLAYER_ID) FROM ACTIVITY), 2) AS FRACTION
FROM S_LOGIN;
- LAG로 가져와서 만족하는거 세려고 했는데 잘 안돼서 CTE + JOIN으로 풀이.
- 첫 로그, 두 번째 로그 가져온다음에 중복 제거해서 세주면 된다.
- pandas의 timedelta처럼 interval 메서드가 있어서 날짜 계산해줄 때 풀이 가능하다.
Pandas
import pandas as pd
def gameplay_analysis(activity: pd.DataFrame) -> pd.DataFrame:
activity.sort_values(by=['player_id', 'event_date'], inplace=True)
activity['first_date'] = activity.groupby('player_id')['event_date'].transform('min')
second = activity[activity['event_date'] == activity['first_date'] + timedelta(days=1)]
return pd.DataFrame({'fraction':[round(len(second['player_id'].unique())/len(activity['player_id'].unique()),2)]})
- 각 플레이어 별 첫 접속일을 가져온다.
- event_date와 첫 접속일 + 1일을 만족하는 데이터 프레임을 가져온다.
- 전체 플레이어 중 second에 있는 플레이어 수 가져오기
- min() 대신 transform('min')을 사용해야 사이즈가 맞는 DataFrame이 기존 DataFrame에 병합된다.
코멘트
- MySQL에서 윈도우 함수나 SELECT 절에 CASE WHEN, IF로 채우는 부분들 신경써서 연습하기...