pandas groupby with strings - return single string

Question

I have a data frame that looks like this:

    ID  Type    Size
0   123 Red     5
1   456 Blue    7
2   789 Yellow  12
3   789 Yellow  4

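I now want to aggregate by ID and take the mean of Size for the duplicated rows. However, I want Type to come back as the same single string rather than as a concatenation of the repeated values. I tried to capture this with agg: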

import statistics
import pandas as pd

df = pd.DataFrame({'ID' : [123, 456, 789, 789], 'Type' : ['Red', 'Blue', 'Yellow', 'Yellow'], 'Size' : [5, 7, 12, 4]})

def identity(x):
    return x

special_columns = ['Type']
aggfuncs = {col: statistics.mean for col in df.columns}
aggfuncs.update({col:identity for col in special_columns})
df.groupby(['ID'], as_index=False).agg(aggfuncs)

However, this still produces an array of the repeated strings:

    ID  Type              Size
0   123 Red                 5
1   456 Blue                7
2   789 [Yellow, Yellow]    8

The end result I want is:

    ID  Type              Size
0   123 Red                 5
1   456 Blue                7
2   789 Yellow              8

How can I achieve this?

Answer 1

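Use first as the aggregation function for Type, so each group returns its first Type value instead of the whole group: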

>>> df.groupby('ID', as_index=False).agg({'Type': 'first', 'Size': 'mean'})

    ID    Type  Size
0  123     Red   5.0
1  456    Blue   7.0
2  789  Yellow   8.0
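
If you want to keep the dictionary-building pattern from the question, a minimal sketch might look like the following. It assumes 'mean' is the default for every non-key column and that 'first' is acceptable for the columns listed in special_columns. The grouping key ID is left out of the dictionary because it is not aggregated.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [123, 456, 789, 789],
    'Type': ['Red', 'Blue', 'Yellow', 'Yellow'],
    'Size': [5, 7, 12, 4],
})

# Columns that should return a single representative value instead of a mean
special_columns = ['Type']

# Default to 'mean' for every column except the grouping key ...
aggfuncs = {col: 'mean' for col in df.columns if col != 'ID'}
# ... then override the special columns with 'first'
aggfuncs.update({col: 'first' for col in special_columns})

result = df.groupby('ID', as_index=False).agg(aggfuncs)
print(result)
```

Note that 'first' simply picks the first value it sees in each group, so it only gives a meaningful answer when Type is constant within an ID.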

Answer 2

If each ID has exactly one corresponding Type, this should work:

# use both ID and Type as grouper
res = df.groupby(["ID", "Type"], as_index=False)["Size"].mean()
res
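
The "one Type per ID" assumption can be checked before grouping. The snippet below is only a sketch of one way to do that; the names types_per_id and one_type_per_id are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [123, 456, 789, 789],
    'Type': ['Red', 'Blue', 'Yellow', 'Yellow'],
    'Size': [5, 7, 12, 4],
})

# Count the distinct Type values inside each ID group
types_per_id = df.groupby('ID')['Type'].nunique()

# True only when every ID maps to exactly one Type
one_type_per_id = bool((types_per_id == 1).all())
print(one_type_per_id)

# Grouping on both keys keeps Type as a plain string column
res = df.groupby(['ID', 'Type'], as_index=False)['Size'].mean()
print(res)
```

If some ID had more than one Type, grouping on both columns would return one row per (ID, Type) pair rather than one row per ID.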
