Groupby of different columns with different aggregation with cumsum for next date

Question

I have a DataFrame sorted by date and time:

 ID     Date     Time      A         B      C
 abc   06/Feb     11       12        12     10 
 abc   06/Feb     12       14        13     5
 xyz   07/Feb      1       16        14     50
 xyz   07/Feb      2       18        15     0
 xyz   07/Feb      3       20        16     10

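I want to group it by ID and Date, taking the sum as the numerator and the count as the denominator. For each later date, the sum should also include the sums of the previous dates (a cumulative sum), and the count should likewise be cumulative. Finally, three more columns should be added holding the last value of columns A, B and C, such as: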

ID    Date     A_Num  A_denom   B_Num   B_Denom  C_Num   C_Denom  A_Last  B_Last  C_Last
abc   06/Feb    26       2        25       2      15        2       14      13      5
xyz   07/Feb    54       3        45       3      60        3       20      16      10

I am not able to do all of this in one go. Can anyone help me with this? Thanks in advance.

Now I want to add df2 to df1 according to ID, as:

ID    Date     A_Num  A_denom   B_Num   B_Denom  C_Num   C_Denom  A_Last  B_Last  C_Last
abc   06/Feb    52       4        50       4      30        4       14      13      5
xyz   07/Feb    108      6        90       6      120       6       20      16      10

Answer 1

You can aggregate sum and size per group with GroupBy.agg, rename them to Num and denom, and take the cumulative sum across the groups. Then build another DataFrame with the last value per group using GroupBy.last and join the two with concat:

import pandas as pd

cols = ['A','B','C']

print (df[cols].dtypes)
A    int64
B    int64
C    int64
dtype: object

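# Map the aggregation names to the wanted column suffixes
d = {'sum':'Num','size':'denom'}
# Sum and row count per (ID, Date), then a running total across the groups
df1 = df.groupby(['ID','Date'])[cols].agg(['sum','size']).rename(columns=d).cumsum()
# Flatten the MultiIndex columns, e.g. ('A', 'Num') -> 'A_Num'
df1.columns = df1.columns.map(lambda x: f'{x[0]}_{x[1]}')

# Last value of each column per group
df2 = df.groupby(['ID','Date'])[cols].last().add_suffix('_Last')
# Join side by side on the shared (ID, Date) index
df3 = pd.concat([df1, df2], axis=1).reset_index()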

print (df3)
    ID    Date  A_Num  A_denom  B_Num  B_denom  C_Num  C_denom  A_Last  \
0  abc  06/Feb     26        2     25        2     15        2      14   
1  xyz  07/Feb     80        5     70        5     75        5      20   

   B_Last  C_Last  
0      13       5  
1      16      10
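
To see what the cumsum step contributes, here is a minimal self-contained sketch that rebuilds the sample data from the question and compares the per-group totals with the running totals. The names grouped, per_group and running are only illustrative, and sort=False is an assumption added here so that groups keep the row order of the already sorted frame:

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "ID":   ["abc", "abc", "xyz", "xyz", "xyz"],
    "Date": ["06/Feb", "06/Feb", "07/Feb", "07/Feb", "07/Feb"],
    "Time": [11, 12, 1, 2, 3],
    "A":    [12, 14, 16, 18, 20],
    "B":    [12, 13, 14, 15, 16],
    "C":    [10, 5, 50, 0, 10],
})
cols = ["A", "B", "C"]

# sort=False keeps groups in order of first appearance, so the running
# total follows the row order of the (already date-sorted) frame.
grouped = df.groupby(["ID", "Date"], sort=False)[cols]

# Per-group sum and row count, with flattened column names such as 'A_Num'.
per_group = grouped.agg(["sum", "size"]).rename(columns={"sum": "Num", "size": "denom"})
per_group.columns = [f"{col}_{stat}" for col, stat in per_group.columns]

# Running total over the groups: each row also includes all earlier groups.
running = per_group.cumsum()

print(per_group)
print(running)
```

Without cumsum the xyz row holds only its own totals (A_Num 54, A_denom 3); with cumsum it also includes the abc rows (A_Num 80, A_denom 5), which is what the output above shows.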

To write to a file without the index:

df3.to_csv('file', index=False)

If .reset_index is left out of the solution, ID and Date stay in the index, so write the file with the index included:

df3.to_csv('file')
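
The two to_csv variants can be compared with a small sketch. The frame indexed below is a hypothetical stand-in for the aggregated result with only two value columns, and to_csv is called without a path so that it returns the CSV text instead of writing a file:

```python
import pandas as pd

# Hypothetical small frame standing in for the aggregated result,
# with ID and Date still in the index (no reset_index yet).
indexed = pd.DataFrame(
    {"A_Num": [26, 80], "A_denom": [2, 5]},
    index=pd.MultiIndex.from_tuples(
        [("abc", "06/Feb"), ("xyz", "07/Feb")], names=["ID", "Date"]
    ),
)

# With no path argument, to_csv returns the CSV text instead of writing a file.
csv_with_index = indexed.to_csv()                      # index written as columns
csv_flat = indexed.reset_index().to_csv(index=False)   # index turned into columns first

print(csv_with_index)
print(csv_flat)
```

Both calls should yield the same text here, because the named index levels are written as the leading ID and Date columns.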

python

pandas

cumsum

pandas-groupby