리트코드 : 1341. Movie Rating
문제
Table: Movies
+---------------+---------+
| Column Name | Type |
+---------------+---------+
| movie_id | int |
| title | varchar |
+---------------+---------+
movie_id is the primary key (column with unique values) for this table.
title is the name of the movie.
Table: Users
+---------------+---------+
| Column Name | Type |
+---------------+---------+
| user_id | int |
| name | varchar |
+---------------+---------+
user_id is the primary key (column with unique values) for this table.
Table: MovieRating
+---------------+---------+
| Column Name | Type |
+---------------+---------+
| movie_id | int |
| user_id | int |
| rating | int |
| created_at | date |
+---------------+---------+
(movie_id, user_id) is the primary key (column with unique values) for this table.
This table contains the rating of a movie by a user in their review.
created_at is the user's review date.
Write a solution to:
Find the name of the user who has rated the greatest number of movies. In case of a tie, return the lexicographically smaller user name.
Find the movie name with the highest average rating in February 2020. In case of a tie, return the lexicographically smaller movie name.
The result format is in the following example.
- 평점 가장 많이 매긴 사람 + 평점 가장 높은 영화 두 가지를 구하면 된다.
문제 풀이
MySQL
with feb as (
select *, avg(rating) as score
from movierating
where left(created_at,7) = '2020-02'
group by movie_id
),
rated as (
select *, count(*) as cnt
from movierating
group by user_id
),
name as (
select name as results
from users u
join rated r on r.user_id = u.user_id
where cnt = (select max(cnt) from rated)
order by name
limit 1
),
movie_name as (
select m.title as results
from feb f
join movies m on f.movie_id = m.movie_id
where score = (select max(score) from feb)
order by title
limit 1
)
select * from name
union all
select * from movie_name
- CTE에 결과를 마구마구 저장하기.
- 평점같은 경우는 movie id로, 평점 매긴 사람은 user_id로 group by
- 각 테이블에서 서브쿼리로 최대 평가 수, 최대 평점을 가진 것을 가져온다.
- 컬럼명을 results로 맞춰주고 union all
- test case 18에 이름/영화이름이 동일한 경우가 있어서 중복허용 안하면 사라진다.
Pandas
import pandas as pd
def movie_rating(movies: pd.DataFrame, users: pd.DataFrame, movie_rating: pd.DataFrame) -> pd.DataFrame:
user_grouped = movie_rating.groupby('user_id').agg(
cnt = ('user_id','count')
).reset_index()
avg_score = movie_rating[movie_rating['created_at'].between('2020-02-01','2020-02-29')].groupby('movie_id').agg(
score = ('rating','mean')
).reset_index()
max_name = pd.merge(user_grouped, users)
max_movie = pd.merge(avg_score, movies)
max_name.sort_values(['cnt','name'], ascending=[0,1], inplace=True)
max_movie.sort_values(['score','title'], ascending=[0,1], inplace=True)
answer = pd.DataFrame({'results': [max_name['name'].values[0], max_movie['title'].values[0]]})
return answer
- 마찬가지로 groupby + agg로 집계하기.
- merge를 통해 join해주고
- sort values에서 다중 조건으로 정렬하기
코멘트
- 코멘트
'Data Analysis > Query' 카테고리의 다른 글
리트코드 : 1407. Top Travellers (0) | 2024.08.14 |
---|---|
리트코드 : 1393. Capital Gain/Loss (0) | 2024.08.14 |
리트코드 : 1378. Replace Employee ID With The Unique Identifier (0) | 2024.08.13 |
리트코드 : 1327. List the Products Ordered in a Period (0) | 2024.08.10 |
리트코드 : 1321. Restaurant Growth (0) | 2024.08.10 |
리트코드 : 1280. Students and Examinations (0) | 2024.08.08 |
리트코드 : 1251. Average Selling Price (0) | 2024.08.06 |
댓글