python pandas. Sum of a column based on matching a string column against the comma-separated values of another column

Question

I have two sample DataFrames (names changed):

df1 =

Comp_code	DepartmentListA	DepartmentListB
Code_1	"Dept1"	"Dept3"
Code_2	"Dept2"	"Dept4"
Code_3	"Dept4, Dept5"	"Dept1"
Code_4	"Dept1,Dept5, Dept6"	"Dept3, Dept4"

df2 = (only departments and their revenue figures)

DepartmentList	Revenue	Gross Margin
"Dept1"	1000	500
"Dept2"	2000	0
"Dept3"	5000	900
"Dept4"	5000	200
"Dept5"	7000	-100
"Dept6"	8000	2500

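I want my final df to contain the company code together with the total revenue and gross margin, summed over the departments listed in columns A and B. Because the departments are stored as comma-separated strings, I can't simply iterate and join. My final df should look like this: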

Expected df =

Comp_code	GrossRev	Tot Margin
Code_1	6000	1400
Code_2	7000	200
Code_3	13000	600
Code_4	26000	4000

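The DataFrame has a few million rows, and some department lists (the comma-separated values) contain around 100 entries, so an efficient way to do this would be great.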

Answer 1

This code works. It is long, but most of it is repetition.

new_df = df1[['Comp_code']].copy()
# Split each list on commas, strip the spaces left after them, look up every department in df2, then sum per df1 row.
new_df['GrossRev'] = df1['DepartmentListB'].str.split(',').explode().str.strip().map(df2.set_index('DepartmentList')['Revenue']).groupby(level=0).sum() + df1['DepartmentListA'].str.split(',').explode().str.strip().map(df2.set_index('DepartmentList')['Revenue']).groupby(level=0).sum()
# Same pipeline for the margin; note that the df2 column is named 'Gross Margin', with a space.
new_df['Tot Margin'] = df1['DepartmentListB'].str.split(',').explode().str.strip().map(df2.set_index('DepartmentList')['Gross Margin']).groupby(level=0).sum() + df1['DepartmentListA'].str.split(',').explode().str.strip().map(df2.set_index('DepartmentList')['Gross Margin']).groupby(level=0).sum()

Output:

>>> new_df
  Comp_code  GrossRev  Tot Margin
0    Code_1      6000        1400
1    Code_2      7000         200
2    Code_3     13000         600
3    Code_4     26000        4000
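
Since most of the code above is the same pipeline repeated four times, here is a minimal sketch of a less repetitive variant. It stacks both list columns with melt, explodes once, and attaches Revenue and Gross Margin in a single merge. The helper name summarise_by_company is hypothetical, and the sample frames are rebuilt from the question on the assumption that the cell values carry no literal quote characters. It should do less redundant work than rebuilding the lookup index for every column, but I have not benchmarked it on millions of rows.

```python
import pandas as pd

# Sample frames rebuilt from the question (assumption: no literal quotes in the values).
df1 = pd.DataFrame({
    "Comp_code": ["Code_1", "Code_2", "Code_3", "Code_4"],
    "DepartmentListA": ["Dept1", "Dept2", "Dept4, Dept5", "Dept1,Dept5, Dept6"],
    "DepartmentListB": ["Dept3", "Dept4", "Dept1", "Dept3, Dept4"],
})
df2 = pd.DataFrame({
    "DepartmentList": ["Dept1", "Dept2", "Dept3", "Dept4", "Dept5", "Dept6"],
    "Revenue": [1000, 2000, 5000, 5000, 7000, 8000],
    "Gross Margin": [500, 0, 900, 200, -100, 2500],
})


def summarise_by_company(companies, departments):
    """Hypothetical helper: total revenue and margin per Comp_code."""
    # Long format: both list columns stacked into one 'DepartmentList' column.
    long = companies.melt(
        id_vars="Comp_code",
        value_vars=["DepartmentListA", "DepartmentListB"],
        value_name="DepartmentList",
    )
    # One row per (company, department); strip the spaces left after the commas.
    long["DepartmentList"] = long["DepartmentList"].str.split(",")
    long = long.explode("DepartmentList").reset_index(drop=True)
    long["DepartmentList"] = long["DepartmentList"].str.strip()
    # A single left join attaches both numeric columns at once.
    merged = long.merge(departments, on="DepartmentList", how="left")
    # sort=False keeps the companies in their original order of appearance.
    totals = merged.groupby("Comp_code", sort=False)[["Revenue", "Gross Margin"]].sum()
    totals = totals.rename(columns={"Revenue": "GrossRev", "Gross Margin": "Tot Margin"})
    return totals.reset_index()


result = summarise_by_company(df1, df2)
print(result)
```

On the sample data this gives the same totals as the expected df. As in the code above, a department that appears in both list A and list B of the same company is counted twice, and a department missing from df2 contributes nothing to the sums.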

python-2.7

pandas

pandas-groupby