Sum based on multiple columns with pandas groupby

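I want to create a new column that sums the value column within groups defined by several columns. In this example, I want the sum of value for each combination of ISIN, date and portfolio.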

import pandas as pd

df = pd.DataFrame({"ISIN": ["IS123", "IS123", "UN123", "UN123", "FA123"],
                   "date": ["16", "16", "18", "18", "22"],
                   "portfolio": ["A", "A", "B", "A", "D"],
                   "value": [400, 300, 200, 600, 500]})

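This is the desired output. Only the first two rows share the same ISIN, date and portfolio, so both get their combined sum, 400 + 300 = 700. Every other row is alone in its group and keeps its own value.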
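To make the rule concrete, here is a minimal pure-Python sketch (no pandas) of the same computation: each row is keyed by its (ISIN, date, portfolio) tuple, values are summed per key, and every row then receives the total of its group. The names rows, totals and sums are illustrative only.

```python
from collections import defaultdict

# Same data as the example DataFrame, as plain tuples: (ISIN, date, portfolio, value)
rows = [
    ("IS123", "16", "A", 400),
    ("IS123", "16", "A", 300),
    ("UN123", "18", "B", 200),
    ("UN123", "18", "A", 600),
    ("FA123", "22", "D", 500),
]

# Pass 1: accumulate the total value for each (ISIN, date, portfolio) key
totals = defaultdict(int)
for isin, date, portfolio, value in rows:
    totals[(isin, date, portfolio)] += value

# Pass 2: give every row the total of the group it belongs to
sums = [totals[(isin, date, portfolio)] for isin, date, portfolio, _ in rows]
print(sums)  # [700, 700, 200, 600, 500]
```

The two passes mirror what a grouped sum followed by a broadcast back to the original rows has to do.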

df = pd.DataFrame({"ISIN": ["IS123", "IS123", "UN123", "UN123", "FA123"],
                   "date": ["16", "16", "18", "18", "22"],
                   "portfolio": ["A", "A", "B", "A", "D"],
                   "value": [400, 300, 200, 600, 500],
                   "Sum per ISIN, date and portfolio": [700, 700, 200, 600, 500]})

This is what I tried, but I can only get it to group on a single column, e.g. ISIN.

df["Sum per ISIN, date and portfolio"] = df["value"].groupby(df["ISIN", "date", "portfolio"]).transform("sum")
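The attempt fails because df["ISIN", "date", "portfolio"] asks for a single column whose label is the tuple ("ISIN", "date", "portfolio"), which raises a KeyError. A minimal sketch that stays with the Series-based approach: pass the grouping keys to Series.groupby as a list of Series aligned on the same index. The name keys is illustrative.

```python
import pandas as pd

df = pd.DataFrame({"ISIN": ["IS123", "IS123", "UN123", "UN123", "FA123"],
                   "date": ["16", "16", "18", "18", "22"],
                   "portfolio": ["A", "A", "B", "A", "D"],
                   "value": [400, 300, 200, 600, 500]})

# A list of Series (not a tuple lookup on df) gives one grouping key per column
keys = [df["ISIN"], df["date"], df["portfolio"]]

# transform("sum") returns one value per original row, aligned on df's index
df["Sum per ISIN, date and portfolio"] = df["value"].groupby(keys).transform("sum")
print(df)
```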

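Try groupby on the DataFrame instead of on the Series (value), then select the value column from the grouper. transform("sum") returns one result per original row, so it can be assigned directly as a new column: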

df["Sum per ISIN, date and portfolio"] = (
    df.groupby(["ISIN", "date", "portfolio"])["value"].transform("sum")
)
    ISIN date portfolio  value  Sum per ISIN, date and portfolio
0  IS123   16         A    400                               700
1  IS123   16         A    300                               700
2  UN123   18         B    200                               200
3  UN123   18         A    600                               600
4  FA123   22         D    500                               500
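The reason transform fits here is that it returns a result aligned to the original rows, whereas an aggregation such as .sum() returns one row per group. As a sketch of the difference, the longer equivalent below aggregates first and then merges the totals back on the key columns; the variable names totals, merged and via_transform are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"ISIN": ["IS123", "IS123", "UN123", "UN123", "FA123"],
                   "date": ["16", "16", "18", "18", "22"],
                   "portfolio": ["A", "A", "B", "A", "D"],
                   "value": [400, 300, 200, 600, 500]})
keys = ["ISIN", "date", "portfolio"]

# Aggregation collapses each group to a single row (4 groups for 5 rows here)
totals = (df.groupby(keys, as_index=False)["value"].sum()
            .rename(columns={"value": "Sum per ISIN, date and portfolio"}))

# A left merge on the keys copies each group total back onto every member row,
# preserving the row order of df
merged = df.merge(totals, on=keys, how="left")

# transform("sum") performs the same broadcast in one step
via_transform = df.groupby(keys)["value"].transform("sum")
print(merged)
```

Both routes produce the same column; transform avoids the intermediate table and keeps df's original index.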