How to sum values under GroupBy and consecutive date conditions?

Question

Given the table:

ID	LINE	SITE	DATE	UNITS	TOTAL
1	X	AAA	02-May-2017	12	30
2	X	AAA	03-May-2017	10	22
3	X	AAA	04-May-2017	22	40
4	Z	AAA	20-May-2017	15	44
5	Z	AAA	21-May-2017	8	30
6	Z	BBB	22-May-2017	10	32
7	Z	BBB	23-May-2017	25	52
8	K	CCC	02-Jun-2017	6	22
9	K	CCC	03-Jun-2017	4	33
10	K	CCC	12-Aug-2017	11	44
11	K	CCC	13-Aug-2017	19	40
12	K	CCC	14-Aug-2017	30	40

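For each row, if ID, LINE and SITE equal those of the previous row (the previous day), I need to calculate the "Last day" and "Last 3 days" columns as shown below. Note that the dates must be consecutive within each group of the "groupby" on the ID, LINE and SITE columns.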

ID	LINE	SITE	DATE	UNITS	TOTAL	Last day	Last 3 days
1	X	AAA	02-May-2017	12	30	0	0
2	X	AAA	03-May-2017	10	22	12/30	12/30
3	X	AAA	04-May-2017	22	40	10/22	(10+12)/(30+22)
4	Z	AAA	20-May-2017	15	44	0	0
5	Z	AAA	21-May-2017	8	30	15/44	15/44
6	Z	BBB	22-May-2017	10	32	0	0
7	Z	BBB	23-May-2017	25	52	10/32	10/32
8	K	CCC	02-Jun-2017	6	22	0	0
9	K	CCC	03-Jun-2017	4	33	6/22	6/22
10	K	CCC	12-Aug-2017	11	44	4/33	0
11	K	CCC	13-Aug-2017	19	40	11/44	11/44
12	K	CCC	14-Aug-2017	30	40	19/40	(11+19)/(44+40)

Answer 1

In cases like this, I usually loop over the groupby with a for loop:

import pandas as pd
import numpy as np

# Copied your table into a CSV file
table = pd.read_csv('/home/fm/Desktop/stackover.csv')
table.set_index('ID', inplace=True)
table[['Last day', 'Last 3 days']] = np.nan

for _, r in table.groupby(['LINE', 'SITE']):
    # Flag rows whose date is not exactly one day after the previous row's date
    limits_interval = pd.to_datetime(r['DATE']).diff() != pd.Timedelta(days=1)
    # The first row is a false positive (its diff is NaT): there is no earlier day to compare with
    limits_interval.iloc[0] = False

    # IDs at which a new run of consecutive dates starts
    ids_subset = r.index[limits_interval].to_list()
    ids_subset.append(r.index[-1] + 1)  # sentinel so the last run is included
    id_start = 0

    for id_end in ids_subset:
        # Label-based slice; this assumes the IDs are increasing integers
        r_sub = r.loc[id_start:id_end - 1, :].copy()
        id_start = id_end

        # Shift all values down by one row; with one row per day, as in your example, that is the previous day
        r_shifted = r_sub.shift(1)

        r_sub['Last day'] = r_shifted['UNITS'] / r_shifted['TOTAL']

        # Sum over at most the 3 previous days (a plain cumsum would keep growing on longer runs)
        aux_units_sum = r_shifted['UNITS'].rolling(3, min_periods=1).sum()
        aux_total_sum = r_shifted['TOTAL'].rolling(3, min_periods=1).sum()

        r_sub['Last 3 days'] = aux_units_sum / aux_total_sum

        # The first row of each run has no previous day, so its ratios are NaN; report them as 0
        r_sub.fillna(0, inplace=True)

        table.loc[r_sub.index, :] = r_sub

You could also write a function and apply it within the groupby, which would be cleaner and more elegant: see "Apply function to pandas groupby". Hope this helps, good luck.
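A function-based version along those lines might look like the sketch below. It is only one possible reading of the question, not a tested drop-in: the helper name `add_ratios`, the `RUN` column and the `window` parameter are made up for illustration, the table is inlined instead of read from a file, and a gap in the dates is assumed to start a new run (so row 10 gets 0 in both columns).

```python
import io
import pandas as pd

# The question's table, inlined so the sketch runs without an external file.
DATA = """\
ID LINE SITE DATE UNITS TOTAL
1 X AAA 02-May-2017 12 30
2 X AAA 03-May-2017 10 22
3 X AAA 04-May-2017 22 40
4 Z AAA 20-May-2017 15 44
5 Z AAA 21-May-2017 8 30
6 Z BBB 22-May-2017 10 32
7 Z BBB 23-May-2017 25 52
8 K CCC 02-Jun-2017 6 22
9 K CCC 03-Jun-2017 4 33
10 K CCC 12-Aug-2017 11 44
11 K CCC 13-Aug-2017 19 40
12 K CCC 14-Aug-2017 30 40
"""


def add_ratios(df, window=3):
    """Hypothetical helper: add the 'Last day' and 'Last 3 days' columns."""
    out = df.copy()
    out["DATE"] = pd.to_datetime(out["DATE"], format="%d-%b-%Y")
    out = out.sort_values(["LINE", "SITE", "DATE"])

    # A row starts a new run when its date is not exactly one day after
    # the previous row of the same (LINE, SITE) group. The first row of
    # every group has a NaT diff, so it always starts a new run.
    gap = out.groupby(["LINE", "SITE"])["DATE"].diff() != pd.Timedelta(days=1)
    out["RUN"] = gap.cumsum()  # run ids are unique across the whole frame

    # Shift inside each run so a row only sees earlier days of its own run.
    prev = out.groupby("RUN")[["UNITS", "TOTAL"]].shift(1)
    out["Last day"] = (prev["UNITS"] / prev["TOTAL"]).fillna(0)

    # Sum of up to `window` previous days, again restricted to the run.
    sums = prev.groupby(out["RUN"]).transform(
        lambda s: s.rolling(window, min_periods=1).sum()
    )
    out["Last 3 days"] = (sums["UNITS"] / sums["TOTAL"]).fillna(0)

    return out.sort_values("ID").drop(columns="RUN")


table = pd.read_csv(io.StringIO(DATA), sep=r"\s+")
result = add_ratios(table)
print(result[["ID", "Last day", "Last 3 days"]])
```

Labelling each run with a cumulative sum of the "gap" flags avoids slicing by ID, so this version does not depend on the IDs being consecutive integers.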
