Pandas: reshape a multi-column DataFrame from long to wide with a conditional check

Question

I have a pandas DataFrame like this:

id     group    type    action    cost
101    A        1                 10
101    A        1       repair    3
102    B        1                 5
102    B        1       repair    7
102    B        1       grease    2
102    B        1       inflate   1
103    A        2                 12
104    B        2                 9

I need to reshape it from long to wide, but depending on the value of the action column, like this:

id     group    type    action_std    action_extra
101    A        1       10            3
102    B        1       5             10
103    A        2       12            0
104    B        2       9             0

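In other words, for rows where the action field is empty, the cost value should go under the action_std column, while for rows with a non-empty action field, the cost values should be summed under the action_extra column.

I have tried several combinations of groupby / agg / pivot, but I could not find a solution that fully works.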

Answer 1

I suggest splitting the cost column into a cost column and a cost_extra column. Something like the following:

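import numpy as np

# Note: this assumes blank actions are stored as NaN/None, not as empty strings.
result = df.assign(
    # cost of rows that have an action; NaN elsewhere
    cost_extra=lambda df: np.where(
        df['action'].notnull(), df['cost'], np.nan
    )
).assign(
    # cost of rows without an action; NaN elsewhere
    cost=lambda df: np.where(
        df['action'].isnull(), df['cost'], np.nan
    )
).groupby(
    ["id", "group", "type"]
)[["cost", "cost_extra"]].agg(  # a list of columns is required in pandas >= 1.0
    "sum"
)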

result looks like:

                cost  cost_extra
id  group type                  
101 A     1     10.0         3.0
102 B     1      5.0        10.0
103 A     2     12.0         0.0
104 B     2      9.0         0.0
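
The snippet above relies on blank actions being NaN. If they were read in as empty strings instead, notnull() is True for every row and everything ends up in cost_extra. A minimal sketch of a mask that covers both cases, assuming the sample data from the question with blanks stored as "" (the names blank and wide are only illustrative):

```python
import numpy as np
import pandas as pd

# Sample data from the question; blank actions are assumed to be "" here.
df = pd.DataFrame({
    "id": [101, 101, 102, 102, 102, 102, 103, 104],
    "group": ["A", "A", "B", "B", "B", "B", "A", "B"],
    "type": [1, 1, 1, 1, 1, 1, 2, 2],
    "action": ["", "repair", "", "repair", "grease", "inflate", "", ""],
    "cost": [10, 3, 5, 7, 2, 1, 12, 9],
})

# True for rows without an action, whether stored as NaN/None or as "".
blank = df["action"].isna() | df["action"].eq("")

wide = (
    df.assign(
        # both expressions read the original df["cost"], so order does not matter
        cost_extra=np.where(~blank, df["cost"], np.nan),
        cost=np.where(blank, df["cost"], np.nan),
    )
    .groupby(["id", "group", "type"])[["cost", "cost_extra"]]
    .sum()  # NaN values are skipped, so an all-NaN group sums to 0.0
)
print(wide)
```

The result keeps id, group and type as a MultiIndex; calling reset_index() on it turns them back into ordinary columns.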

Answer 2

Try groupby with unstack:

df.cost.groupby([df.id,df.group,df.type,df.action.eq('')]).sum().unstack(fill_value=0)
action          False  True 
id  group type              
101 A     1         3     10
102 B     1        10      5
103 A     2         0     12
104 B     2         0      9
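
The columns in that output are labelled False and True rather than action_extra and action_std. One possible way to get the layout asked for in the question is to rename them afterwards. This is only a sketch, assuming blank actions are empty strings as in the one-liner above; the variable name wide is illustrative:

```python
import pandas as pd

# Sample data from the question; blank actions are assumed to be "".
df = pd.DataFrame({
    "id": [101, 101, 102, 102, 102, 102, 103, 104],
    "group": ["A", "A", "B", "B", "B", "B", "A", "B"],
    "type": [1, 1, 1, 1, 1, 1, 2, 2],
    "action": ["", "repair", "", "repair", "grease", "inflate", "", ""],
    "cost": [10, 3, 5, 7, 2, 1, 12, 9],
})

wide = (
    df["cost"]
    .groupby([df["id"], df["group"], df["type"], df["action"].eq("")])
    .sum()
    .unstack(fill_value=0)
    # True means "action is blank", i.e. the standard cost
    .rename(columns={True: "action_std", False: "action_extra"})
    # fix the column order; also creates a column of zeros if one kind never occurs
    .reindex(columns=["action_std", "action_extra"], fill_value=0)
    .rename_axis(columns=None)  # drop the leftover "action" label on the columns
    .reset_index()
)
print(wide)
```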

Answer 3

Thanks for the hints. This is the solution I ended up liking best (also because of its simplicity):

df["action_std"] = df["cost"].where(df["action"] == "")
df["action_extra"] = df["cost"].where(df["action"] != "")
df = df.groupby(["id", "group", "type"])[["action_std", "action_extra"]].sum().reset_index()
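
As a quick check, the same idea can be run against the sample data and compared with the table from the question. This is a sketch under the assumption that blank actions are empty strings; it uses assign so the original frame is not modified, and the names blank, wide and expected are illustrative. Because where() introduces NaN, the sums come back as floats, so they are cast to integers before comparing:

```python
import pandas as pd

# Sample data from the question; blank actions are assumed to be "".
df = pd.DataFrame({
    "id": [101, 101, 102, 102, 102, 102, 103, 104],
    "group": ["A", "A", "B", "B", "B", "B", "A", "B"],
    "type": [1, 1, 1, 1, 1, 1, 2, 2],
    "action": ["", "repair", "", "repair", "grease", "inflate", "", ""],
    "cost": [10, 3, 5, 7, 2, 1, 12, 9],
})

blank = df["action"].eq("")
wide = (
    df.assign(
        action_std=df["cost"].where(blank),     # cost where action is blank
        action_extra=df["cost"].where(~blank),  # cost where an action is given
    )
    .groupby(["id", "group", "type"], as_index=False)[["action_std", "action_extra"]]
    .sum()
    .astype({"action_std": "int64", "action_extra": "int64"})
)

# The wide table requested in the question.
expected = pd.DataFrame({
    "id": [101, 102, 103, 104],
    "group": ["A", "B", "A", "B"],
    "type": [1, 1, 2, 2],
    "action_std": [10, 5, 12, 9],
    "action_extra": [3, 10, 0, 0],
})
pd.testing.assert_frame_equal(wide, expected)
```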
