String mode aggregation with group by function

Question

I have a dataframe like the one shown below:

Country  City
UK       London
USA      Washington
UK       London
UK       Manchester
USA      Washington
USA      Chicago

I want to group by Country and aggregate to the city that occurs most often within each country.

My desired output should look like this:

Country City
UK      London
USA     Washington

because London and Washington each appear 2 times, whereas Manchester and Chicago appear only once.

I tried:

from scipy.stats import mode
df_summary = df.groupby('Country')['City'].\
                        apply(lambda x: mode(x)[0][0]).reset_index()

but it doesn't seem to work on strings.

Answer 1

I can't replicate your error, but you can use pd.Series.mode, which accepts strings and returns a series, and then use iat to extract the first value:

res = df.groupby('Country')['City'].apply(lambda x: x.mode().iat[0]).reset_index()

print(res)

  Country        City
0      UK      London
1     USA  Washington
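
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. The dataframe is rebuilt from the question's sample data, and the helper name most_common is just an illustrative choice, not something from pandas. As far as I can tell, Series.mode returns tied values in sorted order, so iat[0] would pick the alphabetically first city when there is a tie.

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "Country": ["UK", "USA", "UK", "UK", "USA", "USA"],
    "City": ["London", "Washington", "London",
             "Manchester", "Washington", "Chicago"],
})

def most_common(series):
    # Hypothetical helper: first mode of the group's values.
    return series.mode().iat[0]

# One row per country, holding that country's most frequent city.
res = df.groupby("Country")["City"].agg(most_common).reset_index()
print(res)
```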

Answer 2

Try the following:

>>> df.City.mode()
0        London
1    Washington
dtype: object

Or

import pandas as pd
from scipy import stats

You can use scipy's stats together with a lambda:

df.groupby('Country').agg({'City': lambda x:stats.mode(x)[0]})
               City
Country
UK           London
USA      Washington

#  df.groupby('Country').agg({'City': lambda x:stats.mode(x)[0]}).reset_index()

However, if you don't want to return only the first value, it also gives a nice count:

>>> df.groupby('Country').agg({'City': lambda x:stats.mode(x)})
                        City
Country
UK           ([London], [2])
USA      ([Washington], [2])
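
If you want the city and its count without depending on scipy, a pandas-only sketch along these lines may help. It is an alternative to the stats.mode call above, not a reproduction of it: it counts each (Country, City) pair and keeps the highest-count row per country. The column name count and the variable names are my own choices. I believe the handling of non-numeric input in scipy.stats.mode has changed between SciPy releases, which is one reason to consider this route.

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "Country": ["UK", "USA", "UK", "UK", "USA", "USA"],
    "City": ["London", "Washington", "London",
             "Manchester", "Washington", "Chicago"],
})

# Count occurrences of every (Country, City) pair.
counts = df.groupby(["Country", "City"]).size().reset_index(name="count")

# Sort so the largest count comes first within each country,
# then keep only the first row per country.
top = (
    counts.sort_values(["Country", "count"], ascending=[True, False])
          .drop_duplicates("Country")
          .reset_index(drop=True)
)
print(top)
```

With ties, which city survives depends on the sort order, so add City to the sort keys if you need a deterministic tie-break.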
