Count values using groupby function and using apply function at the same time

Question

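I am trying to count the occurrences of grouped values and write the result into a column, using the apply and groupby functions on a DataFrame. I have the following DataFrame: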

df = pd.DataFrame({'colA': ['name1', 'name2', 'name2', 'name4', 'name2', 'name5', 'name5'], 'colB': ['red', 'yellow', 'yellow', 'black', 'yellow', 'green', 'blue']})

I have two statements that each give a correct result, but I need the results combined. The first is:

df_new = df.groupby("colA").count()

This gives

       colB
colA
name1     1
name2     3
name4     1
name5     2

The other is

df_new = df.groupby("colA")["colB"].apply(lambda lists: ','.join(lists)).reset_index(name='Color')

which gives

    colA                Color
0  name1                   red
1  name2  yellow,yellow,yellow
2  name4                 black
3  name5            green,blue

What I need is the combination, which should look like

    colA                Color      Count grouped A
0  name1                   red     1
1  name2  yellow,yellow,yellow     3
2  name4                 black     1
3  name5            green,blue     2

I have tried combining them in several ways and have researched the problem, but I could not get it to work.

Answer 1

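You can join the first result to the second as a new column, using colA to put each value in the right row. Here df_1 is the count result and df_2 is the joined-colors result.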

df_new = df_2.join(df_1, on='colA')

You also need df_1.rename(columns={'colB': 'Count grouped A'}), because count() names its result column colB.

import pandas as pd

df = pd.DataFrame({'colA': ['name1', 'name2', 'name2', 'name4', 'name2', 'name5', 'name5'], 'colB': ['red', 'yellow', 'yellow', 'black', 'yellow', 'green', 'blue']})

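# Count rows per colA group; the result is indexed by colA
df_1 = df.groupby("colA").count().rename(columns={'colB': 'Count grouped A'})

# Join the colors of each group into one comma-separated string
df_2 = df.groupby("colA")["colB"].apply(lambda lists: ','.join(lists)).reset_index(name='Color')

# Match the colA column of df_2 against the colA index of df_1
df_new = df_2.join(df_1, on='colA')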

print(df_new)
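
One caveat about count(): it counts only non-null values, and it returns one column for every non-grouping column. As a minimal sketch, assuming that rows with a missing colB should still be counted, size() could be used instead. It returns a single Series, which can be renamed and joined directly.

```python
import pandas as pd

df = pd.DataFrame({
    'colA': ['name1', 'name2', 'name2', 'name4', 'name2', 'name5', 'name5'],
    'colB': ['red', 'yellow', 'yellow', 'black', 'yellow', 'green', 'blue'],
})

# size() counts every row in the group, including rows where colB is NaN
counts = df.groupby("colA").size().rename("Count grouped A")

# Comma-separated colors per group, with colA as a regular column
colors = df.groupby("colA")["colB"].apply(','.join).reset_index(name='Color')

# DataFrame.join accepts a named Series; 'on' matches colA against its index
df_new = colors.join(counts, on='colA')

print(df_new)
```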

Edit:

Same as above, with small changes:

Create groups = df.groupby("colA") once, then reuse groups twice.
Use .apply(','.join) instead of .apply(lambda lists: ','.join(lists)).

import pandas as pd

df = pd.DataFrame({'colA': ['name1', 'name2', 'name2', 'name4', 'name2', 'name5', 'name5'], 'colB': ['red', 'yellow', 'yellow', 'black', 'yellow', 'green', 'blue']})

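# Build the grouping once and reuse it for both results
groups = df.groupby("colA")

df_1 = groups.count().rename(columns={'colB': 'Count grouped A'})
df_2 = groups["colB"].apply(','.join).reset_index(name='Color')

# Match the colA column of df_2 against the colA index of df_1
df_new = df_2.join(df_1, on='colA')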

print(df_new)
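
The two results can also be produced in one call, without a join. The following is a sketch using pandas named aggregation (available since pandas 0.25), not part of the original answer. Because the column name 'Count grouped A' contains spaces, the output names are passed as an unpacked dictionary instead of keyword arguments.

```python
import pandas as pd

df = pd.DataFrame({
    'colA': ['name1', 'name2', 'name2', 'name4', 'name2', 'name5', 'name5'],
    'colB': ['red', 'yellow', 'yellow', 'black', 'yellow', 'green', 'blue'],
})

# Each entry maps an output column name to (input column, aggregation)
df_new = (
    df.groupby("colA")
      .agg(**{
          'Color': ('colB', ','.join),           # comma-separated colors
          'Count grouped A': ('colB', 'size'),   # number of rows in the group
      })
      .reset_index()
)

print(df_new)
```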

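Edit:

If you keep Color as a list, it may be simpler.

You can use .str.len() to count the elements in each list.

The name .str suggests it only has string functions, but some of them also work on lists (e.g. .str[1:4]) and even on dictionaries (e.g. .str[key]).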

import pandas as pd

df = pd.DataFrame({'colA': ['name1', 'name2', 'name2', 'name4', 'name2', 'name5', 'name5'], 'colB': ['red', 'yellow', 'yellow', 'black', 'yellow', 'green', 'blue']})

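# Collect the colors of each group into a Python list
df_new = df.groupby("colA")["colB"].apply(list).reset_index(name='Color')
# .str.len() returns the number of elements in each list
df_new['Count grouped A'] = df_new['Color'].str.len()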

print(df_new)
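
The remark above about .str working on lists and dictionaries can be checked with a short sketch. The sample Series here are made up for illustration and are not from the question.

```python
import pandas as pd

# Hypothetical data: a Series whose elements are lists
lists = pd.Series([['red'], ['yellow', 'yellow', 'yellow'], ['green', 'blue']])

lengths = lists.str.len()   # number of elements in each list
firsts = lists.str[0]       # first element of each list
tails = lists.str[1:3]      # slice taken from each list

# Hypothetical data: a Series whose elements are dictionaries
dicts = pd.Series([{'color': 'red'}, {'color': 'blue'}])

colors = dicts.str['color']  # value stored under the key in each dictionary

print(lengths.tolist(), firsts.tolist(), tails.tolist(), colors.tolist())
```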

python

dataframe

pandas-apply

pandas-groupby