Python: group by multiple columns and keep the other columns
I have a table that looks like this:
City_code City_name Site_code Site_capacity
AAA100 City_A Site001 300
AAA100 City_A Site002 600
AAA100 City_A Site003 500
AAA200 City_B Site004 350
AAA200 City_B Site005 250
AAA300 City_C Site006 800
AAA300 City_C Site007 150
AAA300 City_C Site008 450
AAA400 City_D Site009 300
AAA400 City_D Site0010 400
For each city, I want to select the site with the highest Site_capacity.
I tried the following code:
df.groupby(['City_code', 'City_name'])['Site_capacity'].max()
This is the output it produces:
City_code City_name
AAA100 City_A 600
AAA200 City_B 350
AAA300 City_C 800
AAA400 City_D 400
How can I produce output like this instead, with Site_code kept?
City_code City_name Site_code Site_capacity
AAA100 City_A Site002 600
AAA200 City_B Site004 350
AAA300 City_C Site006 800
AAA400 City_D Site0010 400
We can do this with sort_values + drop_duplicates:
s = df.sort_values('Site_capacity').drop_duplicates(['City_code', 'City_name'],keep='last')
Out[334]:
City_code City_name Site_code Site_capacity
3 AAA200 City_B Site004 350
9 AAA400 City_D Site0010 400
1 AAA100 City_A Site002 600
5 AAA300 City_C Site006 800
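For anyone who wants to run this end to end, here is a minimal self-contained sketch. The DataFrame is rebuilt from the table in the question; the variable name top_sites and the trailing sort_index() are my own additions, the latter only to restore the original row order that the sort changes.

```python
import pandas as pd

# Rebuild the table from the question.
df = pd.DataFrame({
    "City_code": ["AAA100"] * 3 + ["AAA200"] * 2 + ["AAA300"] * 3 + ["AAA400"] * 2,
    "City_name": ["City_A"] * 3 + ["City_B"] * 2 + ["City_C"] * 3 + ["City_D"] * 2,
    "Site_code": ["Site001", "Site002", "Site003", "Site004", "Site005",
                  "Site006", "Site007", "Site008", "Site009", "Site0010"],
    "Site_capacity": [300, 600, 500, 350, 250, 800, 150, 450, 300, 400],
})

top_sites = (
    df.sort_values("Site_capacity")                             # ascending: largest capacity ends up last
      .drop_duplicates(["City_code", "City_name"], keep="last")  # keep that last row for each city
      .sort_index()                                              # optional: back to original row order
)
print(top_sites)
```

If several sites in one city share the maximum capacity, only one of them survives, and which one depends on the sort order.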
Try idxmax() with .loc:
print(df.loc[df.groupby(['City_code', 'City_name'])['Site_capacity'].idxmax()])
City_code City_name Site_code Site_capacity
1 AAA100 City_A Site002 600
3 AAA200 City_B Site004 350
5 AAA300 City_C Site006 800
9 AAA400 City_D Site0010 400
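A runnable sketch of the same idea, assuming the DataFrame has a unique index (otherwise .loc could return more rows than expected). idxmax() returns the label of the first maximum in each group, so ties are silently reduced to one row. The transform('max') variant below is an alternative I am adding, not part of the answer above; it keeps every row that ties for the maximum.

```python
import pandas as pd

# Rebuild the table from the question.
df = pd.DataFrame({
    "City_code": ["AAA100"] * 3 + ["AAA200"] * 2 + ["AAA300"] * 3 + ["AAA400"] * 2,
    "City_name": ["City_A"] * 3 + ["City_B"] * 2 + ["City_C"] * 3 + ["City_D"] * 2,
    "Site_code": ["Site001", "Site002", "Site003", "Site004", "Site005",
                  "Site006", "Site007", "Site008", "Site009", "Site0010"],
    "Site_capacity": [300, 600, 500, 350, 250, 800, 150, 450, 300, 400],
})

keys = ["City_code", "City_name"]

# Row label of the (first) maximum capacity within each city.
idx = df.groupby(keys)["Site_capacity"].idxmax()
by_idxmax = df.loc[idx]

# Alternative: compare each row with its group's maximum; ties are all kept.
group_max = df.groupby(keys)["Site_capacity"].transform("max")
by_transform = df[df["Site_capacity"] == group_max]

print(by_idxmax)
print(by_transform)
```

On the question's data there are no ties within a city, so both approaches select the same four rows.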
Try this:
df = df.sort_values(by=['City_name', 'Site_capacity'], ascending=[True, False])
df = df.drop_duplicates('City_name', keep='first')
print(df)
Result:
City_code City_name Site_code Site_capacity
AAA100 City_A Site002 600
AAA200 City_B Site004 350
AAA300 City_C Site006 800
AAA400 City_D Site0010 400
Or, if you want to keep the lowest value per city instead:
df = df.drop_duplicates('City_name', keep='last')
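To make the keep='first' versus keep='last' distinction concrete, here is a small sketch that sorts once and derives both results without mutating df. The names ordered, highest and lowest are mine. Deduplicating on City_name alone assumes each city name maps to exactly one City_code, which holds for the question's data.

```python
import pandas as pd

# Rebuild the table from the question.
df = pd.DataFrame({
    "City_code": ["AAA100"] * 3 + ["AAA200"] * 2 + ["AAA300"] * 3 + ["AAA400"] * 2,
    "City_name": ["City_A"] * 3 + ["City_B"] * 2 + ["City_C"] * 3 + ["City_D"] * 2,
    "Site_code": ["Site001", "Site002", "Site003", "Site004", "Site005",
                  "Site006", "Site007", "Site008", "Site009", "Site0010"],
    "Site_capacity": [300, 600, 500, 350, 250, 800, 150, 450, 300, 400],
})

# Within each city, rows are ordered from highest to lowest capacity.
ordered = df.sort_values(by=["City_name", "Site_capacity"], ascending=[True, False])

highest = ordered.drop_duplicates("City_name", keep="first")  # first row per city = max capacity
lowest = ordered.drop_duplicates("City_name", keep="last")    # last row per city = min capacity

print(highest)
print(lowest)
```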