Compute difference based on matching observations over time

Suppose we have the following DataFrame:

          Date Type Country  Value
0   2016-04-30    A      NL      1
1   2016-04-30    A      BE      2
2   2016-04-30    B      NL      3
3   2016-04-30    B      BE      4
4   2016-04-30    C      NL      5
5   2016-04-30    C      BE      6
6   2016-04-30    C      FR      7
7   2016-04-30    C      UK      8
8   2016-05-31    A      NL      9
9   2016-05-31    A      BE     10
10  2016-05-31    A      FR     11
11  2016-05-31    B      NL     12
12  2016-05-31    B      BE     13
13  2016-05-31    B      FR     14
14  2016-05-31    C      NL     15
15  2016-05-31    C      BE     16
16  2016-05-31    C      UK     17
17  2016-05-31    C      SL     18
18  2016-06-30    A      NL     19
19  2016-06-30    B      FR     20
20  2016-06-30    B      UK     21
21  2016-06-30    B      SL     22
22  2016-06-30    C      NL     23
23  2016-06-30    C      BE     24

It can be created with the following code:

import pandas as pd

df = pd.DataFrame([['2016-04-30','A','NL',1], ['2016-04-30','A','BE',2], ['2016-04-30','B','NL',3], ['2016-04-30','B','BE',4], ['2016-04-30','C','NL',5], ['2016-04-30','C','BE',6], ['2016-04-30','C','FR',7], ['2016-04-30','C','UK',8], ['2016-05-31','A','NL',9], ['2016-05-31','A','BE',10], ['2016-05-31','A','FR',11], ['2016-05-31','B','NL',12], ['2016-05-31','B','BE',13], ['2016-05-31','B','FR',14], ['2016-05-31','C','NL',15], ['2016-05-31','C','BE',16], ['2016-05-31','C','UK',17], ['2016-05-31','C','SL',18], ['2016-06-30','A','NL',19], ['2016-06-30','B','FR',20], ['2016-06-30','B','UK',21], ['2016-06-30','B','SL',22], ['2016-06-30','C','NL',23], ['2016-06-30','C','BE',24]], columns=['Date','Type','Country','Value'])

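I want to add an extra column, 'ValueDiff', that holds the difference between each observation and the same observation in the previous period. For example, for the observation 'Date: 2016-05-31, Type: B, Country: BE', 'ValueDiff' should be 13 - 4 = 9. If the observation is not available in the previous period, 'ValueDiff' should be NaN.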

Expected df:

          Date Type Country  Value  ValueDiff
0   2016-04-30    A      NL      1        NaN
1   2016-04-30    A      BE      2        NaN
2   2016-04-30    B      NL      3        NaN
3   2016-04-30    B      BE      4        NaN
4   2016-04-30    C      NL      5        NaN
5   2016-04-30    C      BE      6        NaN
6   2016-04-30    C      FR      7        NaN
7   2016-04-30    C      UK      8        NaN
8   2016-05-31    A      NL      9          8
9   2016-05-31    A      BE     10          8
10  2016-05-31    A      FR     11        NaN
11  2016-05-31    B      NL     12          9
12  2016-05-31    B      BE     13          9
13  2016-05-31    B      FR     14        NaN
14  2016-05-31    C      NL     15         10
15  2016-05-31    C      BE     16         10
16  2016-05-31    C      UK     17          9
17  2016-05-31    C      SL     18        NaN
18  2016-06-30    A      NL     19         10
19  2016-06-30    B      FR     20          6
20  2016-06-30    B      UK     21        NaN
21  2016-06-30    B      SL     22        NaN
22  2016-06-30    C      NL     23          8
23  2016-06-30    C      BE     24          8

Is there an efficient way to do this?

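If the Type/Country pairs are unique within each Date group, you can use DataFrameGroupBy.diff, which subtracts the previous row of the same pair from each row (so the rows need to be sorted by Date, as they are here):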

df['ValueDiff'] = df.groupby(['Type','Country'])['Value'].diff()
print(df)
          Date Type Country  Value  ValueDiff
0   2016-04-30    A      NL      1        NaN
1   2016-04-30    A      BE      2        NaN
2   2016-04-30    B      NL      3        NaN
3   2016-04-30    B      BE      4        NaN
4   2016-04-30    C      NL      5        NaN
5   2016-04-30    C      BE      6        NaN
6   2016-04-30    C      FR      7        NaN
7   2016-04-30    C      UK      8        NaN
8   2016-05-31    A      NL      9        8.0
9   2016-05-31    A      BE     10        8.0
10  2016-05-31    A      FR     11        NaN
11  2016-05-31    B      NL     12        9.0
12  2016-05-31    B      BE     13        9.0
13  2016-05-31    B      FR     14        NaN
14  2016-05-31    C      NL     15       10.0
15  2016-05-31    C      BE     16       10.0
16  2016-05-31    C      UK     17        9.0
17  2016-05-31    C      SL     18        NaN
18  2016-06-30    A      NL     19       10.0
19  2016-06-30    B      FR     20        6.0
20  2016-06-30    B      UK     21        NaN
21  2016-06-30    B      SL     22        NaN
22  2016-06-30    C      NL     23        8.0
23  2016-06-30    C      BE     24        8.0
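
One caveat with the groupby approach: diff compares each row with the previous row of the same Type/Country pair, even when that row is from an earlier period than the immediately preceding one. The data above has no such gaps, so the result matches the expected output. If gaps can occur and they should yield NaN, something along the following lines may work. This is only a sketch: it assumes "previous period" means the previous distinct Date present in the data, and the names sample, prev_date, PrevDate and PrevValue are made up for illustration.

```python
import pandas as pd

# Small illustrative frame: the pair (A, BE) is missing in 2016-05-31.
sample = pd.DataFrame(
    [
        ["2016-04-30", "A", "NL", 1],
        ["2016-04-30", "A", "BE", 2],
        ["2016-05-31", "A", "NL", 9],
        ["2016-06-30", "A", "NL", 19],
        ["2016-06-30", "A", "BE", 24],
    ],
    columns=["Date", "Type", "Country", "Value"],
)

# groupby + diff bridges the gap: (A, BE) in June is compared with April.
loose = sample.groupby(["Type", "Country"])["Value"].diff()

# Map every date to the previous distinct date in the data
# (ISO-formatted date strings sort chronologically).
dates = sorted(sample["Date"].unique())
prev_date = dict(zip(dates[1:], dates[:-1]))

# Look up each row's value in the previous period with a left merge.
# validate="many_to_one" raises if a pair occurs twice within one date.
prev = sample.rename(columns={"Date": "PrevDate", "Value": "PrevValue"})
merged = sample.assign(PrevDate=sample["Date"].map(prev_date)).merge(
    prev, on=["PrevDate", "Type", "Country"], how="left", validate="many_to_one"
)

# A left merge keeps the row order of the left frame, so assign by position.
sample["ValueDiff"] = (merged["Value"] - merged["PrevValue"]).to_numpy()
print(sample)
```

Here loose gives 22 for the last row (24 - 2, June against April), whereas ValueDiff is NaN for that row because (A, BE) has no observation in May.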