How to sum values under GroupBy and consecutive date conditions?

Given the table:

ID  LINE  SITE  DATE         UNITS  TOTAL
1   X     AAA   02-May-2017  12     30
2   X     AAA   03-May-2017  10     22
3   X     AAA   04-May-2017  22     40
4   Z     AAA   20-MAY-2017  15     44
5   Z     AAA   21-May-2017  8      30
6   Z     BBB   22-May-2017  10     32
7   Z     BBB   23-May-2017  25     52
8   K     CCC   02-Jun-2017  6      22
9   K     CCC   03-Jun-2017  4      33
10  K     CCC   12-Aug-2017  11     44
11  K     CCC   13-Aug-2017  19     40
12  K     CCC   14-Aug-2017  30     40

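For each row, if ID, LINE and SITE match the previous row (day), I need to compute "Last day" and "Last 3 days" as shown below. Note that the dates must be consecutive within each group formed by the ID, LINE and SITE columns.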

ID  LINE  SITE  DATE         UNITS  TOTAL  Last day  Last 3 days
1   X     AAA   02-May-2017  12     30     0         0
2   X     AAA   03-May-2017  10     22     12/30     12/30
3   X     AAA   04-May-2017  22     40     10/22     (10+12)/(30+22)
4   Z     AAA   20-MAY-2017  15     44     0         0
5   Z     AAA   21-May-2017  8      30     15/44     15/44
6   Z     BBB   22-May-2017  10     32     0         0
7   Z     BBB   23-May-2017  25     52     10/32     10/32
8   K     CCC   02-Jun-2017  6      22     0         0
9   K     CCC   03-Jun-2017  4      33     6/22      6/22
10  K     CCC   12-Aug-2017  11     44     4/33      0
11  K     CCC   13-Aug-2017  19     40     11/44     11/44
12  K     CCC   14-Aug-2017  30     40     19/40     (11+19)/(44+40)

In cases like this, I usually write a for loop over a groupby:

import pandas as pd
import numpy as np

# Table copied from the question into a CSV file
table = pd.read_csv('/home/fm/Desktop/stackover.csv')
table.set_index('ID', inplace=True)
# Create the two result columns, filled with NaN for now
table['Last day'] = np.nan
table['Last 3 days'] = np.nan

for _, r in table.groupby(['LINE', 'SITE']):
    # Flag rows whose date is not exactly one day after the previous row's date
    limits_interval = pd.to_datetime(r['DATE']).diff() != pd.Timedelta(days=1)
    # The first row is a false positive: it has no previous day to compare with
    limits_interval.iloc[0] = False

    ids_subset = r.index[limits_interval].to_list()
    ids_subset.append(r.index[-1]+1) #to consider all values
    id_start = 0

    for id_end in ids_subset:    
        r_sub = r.loc[id_start:id_end-1, :].copy()
        id_start = id_end 

        # Shift all values down one row; with one row per day (as in your example) this is the previous day
        r_shifted = r_sub.shift(1)

        r_sub['Last day'] = r_shifted['UNITS'] / r_shifted['TOTAL']

        # Rolling sum over at most the 3 previous days in this run
        # (assuming 'Last 3 days' means the 3 days before the current row)
        aux_units_sum = r_shifted['UNITS'].rolling(3, min_periods=1).sum()
        aux_total_sum = r_shifted['TOTAL'].rolling(3, min_periods=1).sum()

        r_sub['Last 3 days'] = aux_units_sum / aux_total_sum

        r_sub.fillna(0, inplace = True)

        table.loc[r_sub.index,:]=r_sub.copy()

You could also write a function and apply it within the groupby, which would be cleaner and more elegant: see "Apply function to pandas groupby". Hope this helps, good luck.
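For what it's worth, here is a rough sketch of what that function-based version might look like. The names `add_ratios`, `rolling_sum` and `run_id` are made up for illustration, the table is inlined instead of read from the CSV, and I'm assuming English month names and that "Last 3 days" means up to three preceding consecutive days. Treat it as a starting point, not a tested solution for your real data.

```python
import io

import pandas as pd

# The question's table, inlined so the sketch runs without the CSV file.
RAW = """\
ID LINE SITE DATE UNITS TOTAL
1 X AAA 02-May-2017 12 30
2 X AAA 03-May-2017 10 22
3 X AAA 04-May-2017 22 40
4 Z AAA 20-MAY-2017 15 44
5 Z AAA 21-May-2017 8 30
6 Z BBB 22-May-2017 10 32
7 Z BBB 23-May-2017 25 52
8 K CCC 02-Jun-2017 6 22
9 K CCC 03-Jun-2017 4 33
10 K CCC 12-Aug-2017 11 44
11 K CCC 13-Aug-2017 19 40
12 K CCC 14-Aug-2017 30 40
"""
table = pd.read_csv(io.StringIO(RAW), sep=r"\s+", index_col="ID")


def add_ratios(df, window=3):
    """Hypothetical helper: add 'Last day' and 'Last 3 days' per run of consecutive dates."""
    out = df.copy()
    # str.title() normalises month casing such as "20-MAY-2017" before parsing.
    out["DATE"] = pd.to_datetime(out["DATE"].str.title(), format="%d-%b-%Y")
    out = out.sort_values(["LINE", "SITE", "DATE"])

    # A new run starts at each group's first row (gap is NaT)
    # or wherever the gap to the previous row is not exactly one day.
    gap = out.groupby(["LINE", "SITE"])["DATE"].diff()
    run_id = (gap != pd.Timedelta(days=1)).cumsum().rename("run")

    # Previous day's values within the same run (NaN on the first day of a run).
    prev_units = out.groupby(run_id)["UNITS"].shift(1)
    prev_total = out.groupby(run_id)["TOTAL"].shift(1)
    out["Last day"] = (prev_units / prev_total).fillna(0)

    def rolling_sum(s):
        # Sum over at most `window` previous days, ignoring the leading NaN.
        return s.rolling(window, min_periods=1).sum()

    units_sum = prev_units.groupby(run_id).transform(rolling_sum)
    total_sum = prev_total.groupby(run_id).transform(rolling_sum)
    out["Last 3 days"] = (units_sum / total_sum).fillna(0)
    return out.sort_index()


result = add_ratios(table)
print(result[["LINE", "SITE", "Last day", "Last 3 days"]])
```

Note that row 10 gets 0 for "Last day" here, same as with the loop, because 03-Jun to 12-Aug is not a consecutive date. That differs from the 4/33 in your expected table, so adjust the run detection if you really want the previous row regardless of the gap.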