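Data Manipulation
- 1. Key Concepts
- 1.12 Grouping and Joining
- 1.13 Ranking
- 2. Problems
- 2.10 Nth Highest Salary
- 2.11 Second Highest Salary
- 2.12 Department Highest Salary
- 2.13 Rank Scores
- 2.14 Delete Duplicate Emails
- 2.15 Price of Each Product in Different Stores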
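1. Key Concepts
1.12 Grouping and Joining
- Grouping: compute the highest salary within each department
max_salary=employee.groupby('departmentId')['salary'].max().reset_index()
- Joining: match each employee row to its department row
data=pd.merge(employee,department,left_on='departmentId',right_on='id')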
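The two calls above can be tried on a small table. The following is a minimal sketch with made-up rows (the names and salaries are assumptions for illustration only). It also shows that when both tables share column names such as `id` and `name`, `pd.merge` appends the default suffixes `_x` (left) and `_y` (right).

```python
import pandas as pd

# Hypothetical toy tables, invented for illustration
employee = pd.DataFrame({
    'id': [1, 2, 3],
    'name': ['Joe', 'Jim', 'Sam'],
    'salary': [70000, 90000, 60000],
    'departmentId': [1, 1, 2],
})
department = pd.DataFrame({'id': [1, 2], 'name': ['IT', 'Sales']})

# Grouping: one row per departmentId holding that group's highest salary
max_salary = employee.groupby('departmentId')['salary'].max().reset_index()

# Joining: overlapping column names ('id', 'name') receive _x / _y suffixes
data = pd.merge(employee, department, left_on='departmentId', right_on='id')

print(max_salary)
print(list(data.columns))
```

The suffixes are why later code refers to `name_x` (employee name) and `name_y` (department name).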
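1.13 Ranking
- method='dense': rows with equal values share the same rank, and ranks stay consecutive with no gaps after ties
- ascending: sets the ranking order; the default is True (ascending), so pass False to give the highest value rank 1
scores['rank']=scores['score'].rank(method = 'dense',ascending = False)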
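To see what dense ranking means in practice, the sketch below compares `method='dense'` with `method='min'` on made-up scores (the data is an assumption for illustration). Both give tied values the same rank; only `dense` avoids gaps after a tie.

```python
import pandas as pd

# Hypothetical scores containing two ties
scores = pd.DataFrame({'score': [3.50, 3.65, 4.00, 3.85, 4.00, 3.65]})

# dense: ties share a rank, the next distinct value gets the next integer
scores['dense'] = scores['score'].rank(method='dense', ascending=False)
# min: ties share the lowest rank of their group, leaving gaps afterwards
scores['min'] = scores['score'].rank(method='min', ascending=False)

print(scores.sort_values('score', ascending=False))
```

Note that `rank` returns floats, so the ranks print as `1.0`, `2.0`, and so on.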
2. Problems
2.10 Nth Highest Salary
import pandas as pd
def nth_highest_salary(employee: pd.DataFrame, N: int) -> pd.DataFrame:
    # The result column name depends on N
    col = f'getNthHighestSalary({N})'
    # Rank distinct salaries only, highest first
    salaries = employee['salary'].drop_duplicates().sort_values(ascending=False)
    if N < 1 or len(salaries) < N:
        return pd.DataFrame({col: [None]})
    return pd.DataFrame({col: [salaries.iloc[N - 1]]})
2.11 Second Highest Salary
import pandas as pd
def second_highest_salary(employee: pd.DataFrame) -> pd.DataFrame:
    # Rank distinct salaries only, highest first
    salaries = employee['salary'].drop_duplicates().sort_values(ascending=False)
    if len(salaries) < 2:
        return pd.DataFrame({'SecondHighestSalary': [None]})
    return pd.DataFrame({'SecondHighestSalary': [salaries.iloc[1]]})
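Both salary problems ask for a distinct salary, so tied values have to be collapsed before picking a position. The sketch below, on made-up data, shows how a purely positional pick and a deduplicated pick can disagree when the top salary is tied.

```python
import pandas as pd

# Hypothetical data with a tie at the top
employee = pd.DataFrame({'id': [1, 2, 3], 'salary': [300, 300, 100]})

ordered = employee['salary'].sort_values(ascending=False)

# Positional pick without deduplication: the second row is still 300
second_by_position = ordered.iloc[1]
# Deduplicate first: the second distinct value is 100
second_distinct = ordered.drop_duplicates().iloc[1]

print(second_by_position, second_distinct)
```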
2.12 Department Highest Salary
import pandas as pd
def department_highest_salary(employee: pd.DataFrame, department: pd.DataFrame) -> pd.DataFrame:
    # Keep only employees whose salary equals the maximum of their own department
    dept_max = employee.groupby('departmentId')['salary'].transform('max')
    top = employee[employee['salary'] == dept_max]
    # Join on the department id; the shared 'name' column becomes name_x / name_y
    data = pd.merge(top, department, left_on='departmentId', right_on='id')
    data = data.rename(columns={'name_y': 'Department', 'name_x': 'Employee', 'salary': 'Salary'})
    return data[['Department', 'Employee', 'Salary']]
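One pitfall in this problem: a salary that is the maximum of one department can also occur in another department where it is not the maximum. Filtering against a pooled list of per-department maxima then keeps too many rows. The sketch below uses made-up data to contrast that with comparing each row against the maximum of its own department via `groupby(...).transform('max')`.

```python
import pandas as pd

# Hypothetical data: Bob earns 80 in department 1, where the maximum is 90
employee = pd.DataFrame({
    'name': ['Ann', 'Bob', 'Cid'],
    'salary': [90, 80, 80],
    'departmentId': [1, 1, 2],
})

# Pooled maxima are [90, 80]; Bob's 80 matches department 2's maximum
pooled = employee.groupby('departmentId')['salary'].max().to_list()
by_pooled = employee[employee['salary'].isin(pooled)]

# Row-wise comparison with the maximum of the row's own department
own_max = employee.groupby('departmentId')['salary'].transform('max')
by_own = employee[employee['salary'] == own_max]

print(by_pooled['name'].to_list())
print(by_own['name'].to_list())
```

`transform('max')` returns a Series aligned with the original rows, which is what makes the element-wise comparison possible.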
2.13 Rank Scores
import pandas as pd
def order_scores(scores: pd.DataFrame) -> pd.DataFrame:
    # Dense rank with the highest score first
    scores['rank']=scores['score'].rank(method = 'dense',ascending = False)
    return scores.sort_values('rank')[['score','rank']]
2.14 Delete Duplicate Emails
import pandas as pd
def delete_duplicate_emails(person: pd.DataFrame) -> None:
    # Sort by id so that the row kept for each email is the one with the smallest id
    person.sort_values('id',inplace=True)
    person.drop_duplicates(subset=['email'],keep='first',inplace=True)
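The order of the two steps matters: `keep='first'` keeps whichever duplicate comes first in the current row order, so sorting by `id` beforehand is what makes the smallest `id` survive. A minimal sketch with made-up rows:

```python
import pandas as pd

# Hypothetical data: the same email appears with ids 3 and 1
person = pd.DataFrame({
    'id': [3, 1, 2],
    'email': ['john@example.com', 'john@example.com', 'bob@example.com'],
})

# Both operations modify the DataFrame in place
person.sort_values('id', inplace=True)
person.drop_duplicates(subset=['email'], keep='first', inplace=True)

print(person)
```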
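2.15 Price of Each Product in Different Stores
import pandas as pd
def rearrange_products_table(products: pd.DataFrame) -> pd.DataFrame:
    # Wide to long: each store column becomes a (store, price) pair
    data=pd.melt(products,id_vars='product_id',var_name='store',value_name='price')
    # axis=0 means rows: drop every row whose price is missing
    data=data.dropna(subset=['price'],how='any', axis=0,inplace = False)
    return data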
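To illustrate the reshaping, here is a small sketch on a made-up wide table (the store columns and prices are assumptions). `pd.melt` stacks the store columns one after another, and `dropna` then removes the product and store combinations that have no price.

```python
import pandas as pd

# Hypothetical wide table: product 1 is not sold in store2
products = pd.DataFrame({
    'product_id': [0, 1],
    'store1': [95, 70],
    'store2': [100, None],
    'store3': [105, 80],
})

# One row per (product_id, store) pair
long_table = pd.melt(products, id_vars='product_id', var_name='store', value_name='price')
# Remove pairs without a price
long_table = long_table.dropna(subset=['price'])

print(long_table)
```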