Selectively use df.div() to divide only a certain column based on index match

Question

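I have 2 DataFrames: one holds monthly totals, and the other holds values that I want to divide by those totals to get each value's percentage contribution for its month.

Here are some sample DataFrames: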

import pandas as pd

MonthlyTotals = pd.DataFrame(data={'Month':[1,2,3],'Value':[100,200,300]})

Data = pd.DataFrame(data={'ID':[1,2,3,1,2,3,1,2,3],
                          'Month':[1,1,1,2,2,2,3,3,3],
                          'Value':[40,30,30,60,70,70,150,60,90]})

I am using df.div(), so I set the index like this:

MonthlyTotals.set_index('Month', inplace=True)
Data.set_index('Month', inplace=True)

Then I do the division:

Contributions = Data.div(MonthlyTotals, axis='index')

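The resulting DataFrame has the values I want, but I can no longer see which ID each Value belongs to, because ID is not a column in the MonthlyTotals frame. How can I use df.div() selectively, on only certain columns?

Here is a sample DataFrame of the result I am looking for: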

result = pd.DataFrame(data={'ID':[1,2,3,1,2,3,1,2,3],'Value':[0.4,0.3,0.3,0.3,0.35,0.35,0.5,0.2,0.3]})

Answer 1

Alternatively, if you want to use only pandas, you can use reindex + update.

Fixing your code:

Data.update(Data['Value'].div(MonthlyTotals['Value'].reindex(Data.index),axis=0))
Data
       ID  Value
Month           
1       1   0.40
1       2   0.30
1       3   0.30
2       1   0.30
2       2   0.35
2       3   0.35
3       1   0.50
3       2   0.20
3       3   0.30
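
As a complement to the reindex + update approach above, here is a minimal sketch that divides only the Value column by looking up each row's total with Series.map. It starts from the original, un-indexed frames, so it sidesteps the repeated Month labels in the index. The lowercase names (monthly_totals, data, totals_by_month, row_totals, contributions) are illustrative and do not come from the question.

```python
import pandas as pd

# Sample frames from the question, before any set_index call.
monthly_totals = pd.DataFrame({'Month': [1, 2, 3], 'Value': [100, 200, 300]})
data = pd.DataFrame({'ID': [1, 2, 3, 1, 2, 3, 1, 2, 3],
                     'Month': [1, 1, 1, 2, 2, 2, 3, 3, 3],
                     'Value': [40, 30, 30, 60, 70, 70, 150, 60, 90]})

# Build a Month -> total lookup, then fetch the total for every row.
totals_by_month = monthly_totals.set_index('Month')['Value']
row_totals = data['Month'].map(totals_by_month)

# Divide only the Value column; ID and Month are left untouched.
contributions = data.assign(Value=data['Value'] / row_totals)
print(contributions)
```

Because assign returns a new frame, the original data is not modified; a month missing from monthly_totals would yield NaN for its rows rather than an error.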

Answer 2

If the data is complete, you may not need MonthlyTotals. You can compute MonthlyTotal with transform and then compute Contributions.

Data = pd.DataFrame(data={'ID':[1,2,3,1,2,3,1,2,3],
                          'Month':[1,1,1,2,2,2,3,3,3],
                          'Value':[40,30,30,60,70,70,150,60,90]})
Data['MonthlyTotal'] = Data.groupby('Month')['Value'].transform('sum')
Data['Contributions'] = Data['Value'] / Data['MonthlyTotal']

Output

   ID  Month  Value  MonthlyTotal  Contributions
0   1      1     40           100           0.40
1   2      1     30           100           0.30
2   3      1     30           100           0.30
3   1      2     60           200           0.30
4   2      2     70           200           0.35
5   3      2     70           200           0.35
6   1      3    150           300           0.50
7   2      3     60           300           0.20
8   3      3     90           300           0.30
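
If only the ID and Value columns from the question's desired result are needed, the same transform idea can be condensed. This is a sketch under the same assumption that every month's rows are all present in the data; the names data, share, and result are illustrative.

```python
import pandas as pd

data = pd.DataFrame({'ID': [1, 2, 3, 1, 2, 3, 1, 2, 3],
                     'Month': [1, 1, 1, 2, 2, 2, 3, 3, 3],
                     'Value': [40, 30, 30, 60, 70, 70, 150, 60, 90]})

# transform('sum') broadcasts each month's total back onto its rows,
# so the division lines up row by row without any index handling.
share = data['Value'] / data.groupby('Month')['Value'].transform('sum')

# Keep ID and replace Value with the per-month share.
result = data[['ID']].assign(Value=share)
print(result)
```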
