리트코드 : 1280. Students and Examinations
문제
Table: Students
+---------------+---------+
| Column Name | Type |
+---------------+---------+
| student_id | int |
| student_name | varchar |
+---------------+---------+
student_id is the primary key (column with unique values) for this table.
Each row of this table contains the ID and the name of one student in the school.
Table: Subjects
+--------------+---------+
| Column Name | Type |
+--------------+---------+
| subject_name | varchar |
+--------------+---------+
subject_name is the primary key (column with unique values) for this table.
Each row of this table contains the name of one subject in the school.
Table: Examinations
+--------------+---------+
| Column Name | Type |
+--------------+---------+
| student_id | int |
| subject_name | varchar |
+--------------+---------+
There is no primary key (column with unique values) for this table. It may contain duplicates.
Each student from the Students table takes every course from the Subjects table.
Each row of this table indicates that a student with ID student_id attended the exam of subject_name.
Write a solution to find the number of times each student attended each exam.
Return the result table ordered by student_id and subject_name.
The result format is in the following example.
문제 풀이
MySQL
with temp as (
select s.student_id, s.student_name, sub.subject_name
from students s
cross join subjects sub
),
grouped as (
select student_id, subject_name, count(*) as attended_exams
from examinations
group by student_id, subject_name
)
select t.student_id, t.student_name, t.subject_name, coalesce(g.attended_exams, 0) as attended_exams
from temp t
left join grouped g on t.student_id = g.student_id and t.subject_name = g.subject_name
order by t.student_id, t.subject_name
- cross join을 한 테이블과 시험내역을 group by한 테이블을 left join한다.
- 학생 테이블은 모든 학생 정보를 담고 있고, 시험 테이블에서는 일부 학생만 있는데 결과 쿼리에서는 전체 학생을 원해서 cross join 이후, left join을 한다.
Pandas
import pandas as pd
def students_and_examinations(students: pd.DataFrame, subjects: pd.DataFrame, examinations: pd.DataFrame) -> pd.DataFrame:
temp = pd.MultiIndex.from_product([students['student_id'], subjects['subject_name']], names=['student_id', 'subject_name']).to_frame(index=False)
temp = temp.merge(students, on='student_id')
grouped = examinations.groupby(['student_id', 'subject_name']).size().reset_index(name='attended_exams')
answer = temp.merge(grouped, on=['student_id', 'subject_name'], how='left')
answer['attended_exams'].fillna(0, inplace = True)
answer.sort_values(by=['student_id', 'subject_name'], inplace=True)
return answer[['student_id','student_name','subject_name','attended_exams']]
- chatgpt한테 물어보니까 multiindex 써야한다는데, 그냥 how = 'cross'가 있더라...
코멘트
- cross join 했는데도 테이블 사이즈가 크지 않은거같아서 SQL기준 상위 15퍼정도 속도 나왔음.
댓글