리트코드 : 1907. Count Salary Categories
문제
Table: Accounts
+-------------+------+
| Column Name | Type |
+-------------+------+
| account_id | int |
| income | int |
+-------------+------+
account_id is the primary key (column with unique values) for this table.
Each row contains information about the monthly income for one bank account.
Write a solution to calculate the number of bank accounts for each salary category. The salary categories are:
"Low Salary": All the salaries strictly less than $20000.
"Average Salary": All the salaries in the inclusive range [$20000, $50000].
"High Salary": All the salaries strictly greater than $50000.
The result table must contain all three categories. If there are no accounts in a category, return 0.
Return the result table in any order.
The result format is in the following example.
Example 1:
Input:
Accounts table:
+------------+--------+
| account_id | income |
+------------+--------+
| 3 | 108939 |
| 2 | 12747 |
| 8 | 87709 |
| 6 | 91796 |
+------------+--------+
Output:
+----------------+----------------+
| category | accounts_count |
+----------------+----------------+
| Low Salary | 1 |
| Average Salary | 0 |
| High Salary | 3 |
+----------------+----------------+
Explanation:
Low Salary: Account 2.
Average Salary: No accounts.
High Salary: Accounts 3, 6, and 8.
문제 풀이
MySQL
WITH TEMP2 AS (
SELECT *,
CASE
WHEN INCOME > 50000 THEN 'High Salary'
WHEN INCOME < 20000 THEN 'Low Salary'
ELSE 'Average Salary'
END AS SALARY_CATEGORY
FROM (
SELECT * FROM ACCOUNTS
UNION ALL
SELECT 0,0
UNION ALL
SELECT 0, 20000
UNION ALL
SELECT 0, 50001
) AS TEMP1
)
SELECT SALARY_CATEGORY AS 'category', COUNT(*)-1 AS 'accounts_count'
FROM TEMP2
GROUP BY SALARY_CATEGORY
- 소둑 구간 표본이 없는 경우 GROUP BY에 없어서 더미 데이터를 추가한다.
Pandas
import pandas as pd
def count_salary_categories(accounts: pd.DataFrame) -> pd.DataFrame:
accounts = pd.concat([accounts,
pd.DataFrame({'account_id': [0,0,0], 'income': [0,20000,50001]})],
ignore_index=True)
accounts['category'] = pd.cut(accounts['income'],
bins=[-np.inf, 20000, 50001, np.inf],
labels=['Low Salary', 'Average Salary', 'High Salary'], right=False)
grouped = accounts.groupby('category').agg(
accounts_count=('category', lambda x: x.size-1)
).reset_index()
return grouped
코멘트
- 더미데이터고 뭐고 그냥 컬럼 설정 미리 할당해놓고 count로 하는게 제일 빠른 듯
댓글