在 pandas(Python) 中减去按 id 分组的数据框中的连续行

Subtract successive rows in a dataframe grouped by id in pandas(Python)

我有以下数据框:

id        day           total_amount
 1       2015-07-09         1000
 1       2015-10-22          100
 1       2015-11-12          200
 1       2015-11-27         2392
 1       2015-12-16          123
 7       2015-07-09          200
 7       2015-07-09         1000
 7       2015-08-27       100018
 7       2015-11-25         1000
 8       2015-08-27         1000
 8       2015-12-07        10000
 8       2016-01-18          796
 8       2016-03-31        10000
15       2015-09-10         1500
15       2015-09-30         1000

如果它们具有相同的 ID,我需要在日列中每两个连续时间减去直到到达该 ID 的最后一行然后开始减去日列中的时间这次为新 ID,类似于输出中的以下行预计:

 1       2015-08-09         1000 2015-11-22 - 2015-08-09
 1       2015-11-22          100 2015-12-12 - 2015-11-22
 1       2015-12-12          200 2015-12-16 - 2015-12-12
 1       2015-12-16         2392 2015-12-27 - 2015-12-27
 1       2015-12-27          123         NA
 7       2015-08-09          200 2015-09-09 - 2015-08-09
 7       2015-09-09         1000 2015-09-27 - 2015-09-09
 7       2015-09-27       100018 2015-12-25 - 2015-09-27
 7       2015-12-25         1000         NA
 8       2015-08-27         1000  2015-12-07 - 2015-08-27
 8       2015-12-07        10000  2016-02-18 - 2015-12-07
 8       2016-02-18          796   2016-04-31- 2016-02-18     
 8       2016-04-31        10000         NA
15       2015-10-10         1500  2015-10-30 - 2015-10-10
15       2015-10-30         1000         NA

您可以使用 DataFrameGroupBy.diff:

df['dif'] = df.groupby('id')['day'].diff(-1) * (-1)
print (df)
    id        day  total_amount      dif
0    1 2015-07-09          1000 105 days
1    1 2015-10-22           100  21 days
2    1 2015-11-12           200  15 days
3    1 2015-11-27          2392  19 days
4    1 2015-12-16           123      NaT
5    7 2015-07-09           200   0 days
6    7 2015-07-09          1000  49 days
7    7 2015-08-27        100018  90 days
8    7 2015-11-25          1000      NaT
9    8 2015-08-27          1000 102 days
10   8 2015-12-07         10000  42 days
11   8 2016-01-18           796  73 days
12   8 2016-03-31         10000      NaT
13  15 2015-09-10          1500  20 days
14  15 2015-09-30          1000      NaT

apply shift的另一个解决方案:

df['diff'] = df.groupby('id')['day'].apply(lambda x: x.shift(-1) - x)
print (df)
    id        day  total_amount     diff
0    1 2015-07-09          1000 105 days
1    1 2015-10-22           100  21 days
2    1 2015-11-12           200  15 days
3    1 2015-11-27          2392  19 days
4    1 2015-12-16           123      NaT
5    7 2015-07-09           200   0 days
6    7 2015-07-09          1000  49 days
7    7 2015-08-27        100018  90 days
8    7 2015-11-25          1000      NaT
9    8 2015-08-27          1000 102 days
10   8 2015-12-07         10000  42 days
11   8 2016-01-18           796  73 days
12   8 2016-03-31         10000      NaT
13  15 2015-09-10          1500  20 days
14  15 2015-09-30          1000      NaT

通过评论编辑:

如果您需要 hoursint 的区别,请将 timedelta 转换为 hour

df['diff'] = df.groupby('id')['day'].diff(-1) * (-1) / np.timedelta64(1, 'h')
print (df)
    id        day  total_amount    diff
0    1 2015-07-09          1000  2520.0
1    1 2015-10-22           100   504.0
2    1 2015-11-12           200   360.0
3    1 2015-11-27          2392   456.0
4    1 2015-12-16           123     NaN
5    7 2015-07-09           200     0.0
6    7 2015-07-09          1000  1176.0
7    7 2015-08-27        100018  2160.0
8    7 2015-11-25          1000     NaN
9    8 2015-08-27          1000  2448.0
10   8 2015-12-07         10000  1008.0
11   8 2016-01-18           796  1752.0
12   8 2016-03-31         10000     NaN
13  15 2015-09-10          1500   480.0
14  15 2015-09-30          1000     NaN
df['diff'] = df.groupby('id')['day'].apply(lambda x: x.shift(-1) - x) / 
                                     np.timedelta64(1, 'h')
print (df)
    id        day  total_amount    diff
0    1 2015-07-09          1000  2520.0
1    1 2015-10-22           100   504.0
2    1 2015-11-12           200   360.0
3    1 2015-11-27          2392   456.0
4    1 2015-12-16           123     NaN
5    7 2015-07-09           200     0.0
6    7 2015-07-09          1000  1176.0
7    7 2015-08-27        100018  2160.0
8    7 2015-11-25          1000     NaN
9    8 2015-08-27          1000  2448.0
10   8 2015-12-07         10000  1008.0
11   8 2016-01-18           796  1752.0
12   8 2016-03-31         10000     NaN
13  15 2015-09-10          1500   480.0
14  15 2015-09-30          1000     NaN