pandas dataframe: sum multiple columns' values based on common columns
I have the following dataframe:
date      id  device  t1    t2     text             y1  y2
2010-1-1  1   pc      yes1  I am1  This is a test1  5   3
2010-1-1  1   smart   yes1  I am1  This is a test1  6   4
2010-1-1  1   table   yes1  I am1  This is a test1  7   5
2010-1-1  2   pc      yes2  I am1  This is a test2  8   2
2010-1-1  2   smart   yes2  I am1  This is a test2  8   3
2010-1-1  2   table   yes2  I am1  This is a test2  9   4
2010-1-1  3   pc      yes3  I am3  This is a test3  10  3
2010-1-1  3   smart   yes3  I am3  This is a test3  11  2
........................
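Now I want to merge these rows into a new dataframe:
(1). When id, date, t1, t2 and text are the same, sum y1 and y2.
(2). When id, date, t1, t2 and text are the same, join the device strings.
(3). Merge the common rows (those with the same id, date, text, t1, t2) into a single row.
The new dataframe should look like this: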
date      id  device          t1    t2     text             y1  y2
2010-1-1  1   pc,smart,table  yes1  I am1  This is a test1  18  12
2010-1-1  2   pc,smart,table  yes2  I am1  This is a test2  25  9
2010-1-1  3   pc,smart        yes3  I am3  This is a test3  21  5
Use groupby with agg:
In [294]: (df.groupby(['date', 'id', 't1', 't2', 'text'], as_index=False)
     ...:    .agg({'device': ','.join, 'y1': 'sum', 'y2': 'sum'}))
Out[294]:
       date  id    t1     t2             text          device  y1  y2
0  2010-1-1   1  yes1  I am1  This is a test1  pc,smart,table  18  12
1  2010-1-1   2  yes2  I am1  This is a test2  pc,smart,table  25   9
2  2010-1-1   3  yes3  I am3  This is a test3        pc,smart  21   5
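For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the eight sample rows from the question and applies the same groupby/agg call. The names `rows`, `keys` and `out` are only illustrative, and the last sample row is assumed to read "This is a test3", which is what the output above implies.

```python
import pandas as pd

# Sample rows copied from the question; the last row is assumed to read
# "This is a test3" so that both id 3 rows fall into the same group.
rows = [
    ("2010-1-1", 1, "pc",    "yes1", "I am1", "This is a test1", 5, 3),
    ("2010-1-1", 1, "smart", "yes1", "I am1", "This is a test1", 6, 4),
    ("2010-1-1", 1, "table", "yes1", "I am1", "This is a test1", 7, 5),
    ("2010-1-1", 2, "pc",    "yes2", "I am1", "This is a test2", 8, 2),
    ("2010-1-1", 2, "smart", "yes2", "I am1", "This is a test2", 8, 3),
    ("2010-1-1", 2, "table", "yes2", "I am1", "This is a test2", 9, 4),
    ("2010-1-1", 3, "pc",    "yes3", "I am3", "This is a test3", 10, 3),
    ("2010-1-1", 3, "smart", "yes3", "I am3", "This is a test3", 11, 2),
]
df = pd.DataFrame(
    rows, columns=["date", "id", "device", "t1", "t2", "text", "y1", "y2"]
)

# Columns that must match for rows to be merged into one.
keys = ["date", "id", "t1", "t2", "text"]

# Join the device strings and sum y1 / y2 within each group.
out = df.groupby(keys, as_index=False).agg(
    {"device": ",".join, "y1": "sum", "y2": "sum"}
)
print(out)
```

Note that a single differing character in any key column (for example a misspelled text value) is enough to split a group, so it is worth checking the key columns if a row fails to merge.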
Use groupby on all the columns that share the same values within each group, aggregate with agg and a dictionary, and finally add reindex so the result keeps the original column order:
df = (df.groupby(['date', 'id', 't1', 't2', 'text'], as_index=False)
        .agg({'y1': 'sum', 'y2': 'sum', 'device': ', '.join})
        .reindex(columns=df.columns))
print(df)
       date  id            device    t1     t2             text  y1  y2
0  2010-1-1   1  pc, smart, table  yes1  I am1  This is a test1  18  12
1  2010-1-1   2  pc, smart, table  yes2  I am1  This is a test2  25   9
2  2010-1-1   3         pc, smart  yes3  I am3  This is a test3  21   5
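As a variation on the same idea, the aggregation can also be written with keyword-style "named aggregation" instead of a dictionary. This is only a sketch, assuming pandas 0.25 or later; the frame is a shortened copy of the question's data, and `original_columns` and `merged` are illustrative names.

```python
import pandas as pd

# Shortened copy of the question's data (ids 1 and 3 only).
df = pd.DataFrame({
    "date":   ["2010-1-1"] * 5,
    "id":     [1, 1, 1, 3, 3],
    "device": ["pc", "smart", "table", "pc", "smart"],
    "t1":     ["yes1", "yes1", "yes1", "yes3", "yes3"],
    "t2":     ["I am1", "I am1", "I am1", "I am3", "I am3"],
    "text":   ["This is a test1"] * 3 + ["This is a test3"] * 2,
    "y1":     [5, 6, 7, 10, 11],
    "y2":     [3, 4, 5, 3, 2],
})

# Remember the original column order before aggregating.
original_columns = list(df.columns)

merged = (
    df.groupby(["date", "id", "t1", "t2", "text"], as_index=False)
      # Each keyword is an output column: (source column, aggregation).
      .agg(
          device=("device", ", ".join),
          y1=("y1", "sum"),
          y2=("y2", "sum"),
      )
      # Put 'device' back between 'id' and 't1', as in the input.
      .reindex(columns=original_columns)
)
print(merged)
```

Saving the column order in a separate variable makes the reordering step explicit and keeps it valid even if the grouping and the reordering are later split into separate statements.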