본문 바로가기
Data Analysis/Query

리트코드 : 1341. Movie Rating

by 베짱이28호 2024. 8. 11.

리트코드 : 1341. Movie Rating


문제

Table: Movies

+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| movie_id      | int     |
| title         | varchar |
+---------------+---------+
movie_id is the primary key (column with unique values) for this table.
title is the name of the movie.


Table: Users

+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| user_id       | int     |
| name          | varchar |
+---------------+---------+
user_id is the primary key (column with unique values) for this table.


Table: MovieRating

+---------------+---------+
| Column Name   | Type    |
+---------------+---------+
| movie_id      | int     |
| user_id       | int     |
| rating        | int     |
| created_at    | date    |
+---------------+---------+
(movie_id, user_id) is the primary key (column with unique values) for this table.
This table contains the rating of a movie by a user in their review.
created_at is the user's review date. 


Write a solution to:

Find the name of the user who has rated the greatest number of movies. In case of a tie, return the lexicographically smaller user name.
Find the movie name with the highest average rating in February 2020. In case of a tie, return the lexicographically smaller movie name.
The result format is in the following example.
  • 평점 가장 많이 매긴 사람 + 평점 가장 높은 영화 두 가지를 구하면 된다.

 

문제 풀이

MySQL

with feb as (
    select *, avg(rating) as score
    from movierating
    where left(created_at,7) = '2020-02'
    group by movie_id
),
rated as (
    select *, count(*) as cnt
    from movierating
    group by user_id
),

name as (
    select name as results
    from users u
    join rated r on r.user_id = u.user_id
    where cnt = (select max(cnt) from rated)
    order by name
    limit 1
),

movie_name as (
select m.title as results
from feb f
join movies m on f.movie_id = m.movie_id
where score = (select max(score) from feb)
order by title
limit 1
)

select * from name
union all
select * from movie_name
  • CTE에 결과를 마구마구 저장하기.
  • 평점같은 경우는 movie id로, 평점 매긴 사람은 user_id로 group by
  • 각 테이블에서 서브쿼리로 최대 평가 수, 최대 평점을 가진 것을 가져온다.
  • 컬럼명을 results로 맞춰주고 union all
    • test case 18에 이름/영화이름이 동일한 경우가 있어서 중복허용 안하면 사라진다.

Pandas

import pandas as pd

def movie_rating(movies: pd.DataFrame, users: pd.DataFrame, movie_rating: pd.DataFrame) -> pd.DataFrame:

    user_grouped = movie_rating.groupby('user_id').agg(
        cnt = ('user_id','count')
    ).reset_index()

    avg_score = movie_rating[movie_rating['created_at'].between('2020-02-01','2020-02-29')].groupby('movie_id').agg(
        score = ('rating','mean')
    ).reset_index()

    max_name = pd.merge(user_grouped, users)
    max_movie = pd.merge(avg_score, movies)
    max_name.sort_values(['cnt','name'], ascending=[0,1], inplace=True)
    max_movie.sort_values(['score','title'], ascending=[0,1], inplace=True)

    answer = pd.DataFrame({'results': [max_name['name'].values[0], max_movie['title'].values[0]]})
    return answer
  • 마찬가지로 groupby + agg로 집계하기.
  • merge를 통해 join해주고
  • sort values에서 다중 조건으로 정렬하기

 

코멘트

  • 코멘트

댓글