LeetCode: 182. Duplicate Emails

https://leetcode.com/problems/duplicate-emails/description/

Problem

Table: Person

+-------------+---------+
| Column Name | Type    |
+-------------+---------+
| id          | int     |
| email       | varchar |
+-------------+---------+
id is the primary key (column with unique values) for this table.
Each row of this table contains an email. The emails will not contain uppercase letters.


Write a solution to report all the duplicate emails. Note that it's guaranteed that the email field is not NULL.

Return the result table in any order.

The result format is in the following example.



Example 1:

Input: 
Person table:
+----+---------+
| id | email   |
+----+---------+
| 1  | a@b.com |
| 2  | c@d.com |
| 3  | a@b.com |
+----+---------+
Output: 
+---------+
| Email   |
+---------+
| a@b.com |
+---------+
Explanation: a@b.com is repeated two times.

Solution

MySQL

SELECT EMAIL
FROM PERSON
GROUP BY EMAIL
HAVING COUNT(EMAIL) > 1

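We need to find the values that appear more than once in the EMAIL column.
Group the rows with GROUP BY, then keep only the groups whose count exceeds 1 with HAVING.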
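To see what GROUP BY and HAVING each contribute, here is a minimal sketch that runs the same query on the example data. It assumes an in-memory SQLite database (via Python's standard sqlite3 module) as a stand-in for MySQL; the table definition and the intermediate counts query are illustrative and not part of the LeetCode setup.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; this query behaves the same in both.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Person (id INTEGER PRIMARY KEY, email TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO Person (id, email) VALUES (?, ?)",
    [(1, "a@b.com"), (2, "c@d.com"), (3, "a@b.com")],
)

# Step 1: GROUP BY alone produces one row per distinct email with its count.
counts = conn.execute(
    "SELECT email, COUNT(email) FROM Person GROUP BY email ORDER BY email"
).fetchall()

# Step 2: HAVING filters the groups, keeping only emails that occur more than once.
duplicates = conn.execute(
    "SELECT email FROM Person GROUP BY email HAVING COUNT(email) > 1"
).fetchall()

conn.close()
print(counts)
print(duplicates)
```

HAVING is needed here rather than WHERE because the condition applies to the aggregated count of each group, not to individual rows.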

Pandas

import pandas as pd

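# Approach 1: keep only the groups with more than one row, then deduplicate the emails.
def duplicate_emails(person: pd.DataFrame) -> pd.DataFrame:
    duplicated_emails = person.groupby('email').filter(lambda x: len(x) > 1)
    return pd.DataFrame({'Email': duplicated_emails['email'].unique()})

# Approach 2 (an alternative with the same required name; submit one or the other):
# duplicated() marks every repeat after the first occurrence of each email.
def duplicate_emails(person: pd.DataFrame) -> pd.DataFrame:
    duplicated_email = person.duplicated('email', keep='first')
    return pd.DataFrame({'Email': person[duplicated_email]['email'].unique()})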

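The first version uses filter to apply a condition on the size of each group.
The second version finds duplicates through boolean indexing with duplicated, which relies on hashing and needs only a single pass over the data, so it tends to be faster.
Counting approaches such as count or size are another option.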
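The counting approach mentioned above could look like the following. This is a minimal sketch, not a version from the original post: the function name duplicate_emails_by_size is hypothetical (LeetCode expects duplicate_emails), and the sample frame mirrors Example 1.

```python
import pandas as pd

# Hypothetical name for illustration; rename to duplicate_emails before submitting.
def duplicate_emails_by_size(person: pd.DataFrame) -> pd.DataFrame:
    # size() returns a Series of row counts indexed by email.
    counts = person.groupby('email').size()
    # Keep the emails seen more than once and expose them under the expected column name.
    return pd.DataFrame({'Email': counts[counts > 1].index.tolist()})

# Sample data taken from Example 1.
person = pd.DataFrame({
    'id': [1, 2, 3],
    'email': ['a@b.com', 'c@d.com', 'a@b.com'],
})
result = duplicate_emails_by_size(person)
print(result)
```

Because the problem guarantees that email is never NULL, size and count give the same numbers here; in general count skips missing values while size does not.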

Comments

It helps a lot that LeetCode shows the runtime and provides example code.

베짱이의 작업공간
