Pandas Groupby and Apply

Question

I'm running a groupby followed by apply on a dataframe, and it returns some strange results. I'm using pandas 1.3.1.

Here is the code:

import pandas as pd

ddf = pd.DataFrame({
    "id": [1, 1, 1, 1, 2]
})

def do_something(df):
    return "x"

ddf["title"] = ddf.groupby("id").apply(do_something)
ddf

I expected every row in the title column to be assigned the value "x", but instead I get this data:

        id title
0        1   NaN
1        1     x
2        1     x
3        1   NaN
4        2   NaN

Is this expected?

Answer 1

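The result is not strange; it is the correct behavior. apply returns one value per group, and the group keys (here 1 and 2) become the index of the aggregated result: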

>>> list(ddf.groupby("id"))
[(1,        # the group name (the future index of the grouped df)
     id     # the subset dataframe of the group 1
  0   1
  1   1
  2   1
  3   1),
 (2,        # the group name (the future index of the grouped df)
     id     # the subset dataframe of the group 2
  4   2)]

Why do you get any values at all? Because the group labels happen to match labels in your dataframe's index:

>>> ddf.groupby("id").apply(do_something)
id
1    x
2    x
dtype: object
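To make the alignment step concrete, here is a minimal sketch. It selects the id column before apply and uses a lambda in place of do_something, both of which are my own simplifications rather than the code from the question:

```python
import pandas as pd

ddf = pd.DataFrame({"id": [1, 1, 1, 1, 2]})

# One value per group; the index of the result holds the group keys 1 and 2.
per_group = ddf.groupby("id")["id"].apply(lambda s: "x")

# Column assignment aligns on index labels, not on the id values:
# row labels 1 and 2 find a match, row labels 0, 3 and 4 do not.
ddf["title"] = per_group

print(per_group.index.tolist())
print(ddf["title"].isna().tolist())
```

Rows 1 and 2 are filled only because their row labels coincide with the group keys.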

Now change the id like this:

ddf['id'] += 10
#    id
# 0  11
# 1  11
# 2  11
# 3  11
# 4  12

ddf["title"] = ddf.groupby("id").apply(do_something)
#    id title
# 0  11   NaN
# 1  11   NaN
# 2  11   NaN
# 3  11   NaN
# 4  12   NaN

Or change the index:

ddf.index += 10
#    id
# 10  1
# 11  1
# 12  1
# 13  1
# 14  2

ddf["title"] = ddf.groupby("id").apply(do_something)
#     id title
# 10   1   NaN
# 11   1   NaN
# 12   1   NaN
# 13   1   NaN
# 14   2   NaN
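If the goal is to bring a per-group value back onto every row, one option is to look each row's id up in the per-group result instead of relying on index alignment. This is a sketch using Series.map, which is my suggestion and not part of the code above:

```python
import pandas as pd

ddf = pd.DataFrame({"id": [1, 1, 1, 1, 2]})
ddf.index += 10  # the row labels no longer overlap with the group keys

# One value per group, indexed by the group keys 1 and 2.
per_group = ddf.groupby("id")["id"].apply(lambda s: "x")

# map looks up each id value in per_group's index, so the row labels do not matter.
ddf["title"] = ddf["id"].map(per_group)

print(ddf)
```

With this approach every row gets a value regardless of what the dataframe's index looks like.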

Answer 2

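Yes, this is expected.

First of all, the apply(do_something) part works fine; it is the preceding groupby that causes the confusion. A groupby returns a GroupBy object, which is a bit different from a normal dataframe. If you debug and inspect what groupby returns, you will see that you need some kind of aggregation function to use it (mean, max, or sum). If you run one of them as an example, like this: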

df = ddf.groupby("id")
df.mean()

It produces this result:

Empty DataFrame
Columns: []
Index: [1, 2]

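The result of do_something is therefore indexed only by 1 and 2, and it is then aligned with the index of your original df when you assign it. That is why only rows 1 and 2 contain x. I would suggest not using groupby here, because it is not clear why you need it, and taking a closer look at the GroupBy object.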
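For anyone who wants to look inside the GroupBy object, here is a small sketch using attributes I know exist on it (ngroups, groups, size); the variable name grouped is mine:

```python
import pandas as pd

ddf = pd.DataFrame({"id": [1, 1, 1, 1, 2]})
grouped = ddf.groupby("id")

# A lazy grouping object, not a dataframe.
print(type(grouped).__name__)

# Number of groups, and which row labels belong to each group key.
print(grouped.ngroups)
print(dict(grouped.groups))

# Any reduction returns one row per group, indexed by the group keys.
sizes = grouped.size()
print(sizes)
```

Every reduction on this object comes back indexed by the group keys, which is the same index that apply produces when the function returns a scalar.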

Answer 3

If you need a new column from an aggregating function, use GroupBy.transform. You have to select the column to process after the groupby, here id:

ddf["title"] = ddf.groupby("id")['id'].transform(do_something)

Or assign the new column inside the function:

def do_something(x):
    x['title'] = 'x'
    return x

ddf = ddf.groupby("id").apply(do_something)

The explanation of why your code does not work is in the other answers.
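As a quick illustration of why transform fits here, this sketch uses the built-in "count" aggregation instead of do_something (my choice, and the column name rows_in_group is made up) and shifts the index to show that the result still lines up with every row:

```python
import pandas as pd

ddf = pd.DataFrame({"id": [1, 1, 1, 1, 2]})
ddf.index += 10  # row labels differ from the group keys on purpose

# transform broadcasts the per-group value back to the rows of each group,
# so the result has the same index as ddf and can be assigned directly.
ddf["rows_in_group"] = ddf.groupby("id")["id"].transform("count")

print(ddf)
```

Because the result is indexed like the original dataframe, the assignment fills every row.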
