Subtract successive rows in a dataframe grouped by id in pandas (Python)
I have the following dataframe:
id day total_amount
1 2015-07-09 1000
1 2015-10-22 100
1 2015-11-12 200
1 2015-11-27 2392
1 2015-12-16 123
7 2015-07-09 200
7 2015-07-09 1000
7 2015-08-27 100018
7 2015-11-25 1000
8 2015-08-27 1000
8 2015-12-07 10000
8 2016-01-18 796
8 2016-03-31 10000
15 2015-09-10 1500
15 2015-09-30 1000
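To reproduce the problem, the table above can be loaded into a DataFrame as in the sketch below. It assumes the day values arrive as strings (as they would from a CSV), so they are converted with pd.to_datetime; subtracting dates only works on datetime64 values, not on strings.

```python
import pandas as pd

# Sample data copied from the table above
df = pd.DataFrame({
    'id': [1, 1, 1, 1, 1, 7, 7, 7, 7, 8, 8, 8, 8, 15, 15],
    'day': ['2015-07-09', '2015-10-22', '2015-11-12', '2015-11-27', '2015-12-16',
            '2015-07-09', '2015-07-09', '2015-08-27', '2015-11-25',
            '2015-08-27', '2015-12-07', '2016-01-18', '2016-03-31',
            '2015-09-10', '2015-09-30'],
    'total_amount': [1000, 100, 200, 2392, 123, 200, 1000, 100018, 1000,
                     1000, 10000, 796, 10000, 1500, 1000],
})

# Date arithmetic needs datetime64 values, so convert the strings first
df['day'] = pd.to_datetime(df['day'])
print(df.dtypes)
```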
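Within each id, I need to subtract each pair of consecutive dates in the day column (next date minus current date) until the last row of that id is reached, and then start over for the next id. The expected output looks like this: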
id day total_amount diff
1 2015-07-09 1000 2015-10-22 - 2015-07-09
1 2015-10-22 100 2015-11-12 - 2015-10-22
1 2015-11-12 200 2015-11-27 - 2015-11-12
1 2015-11-27 2392 2015-12-16 - 2015-11-27
1 2015-12-16 123 NA
7 2015-07-09 200 2015-07-09 - 2015-07-09
7 2015-07-09 1000 2015-08-27 - 2015-07-09
7 2015-08-27 100018 2015-11-25 - 2015-08-27
7 2015-11-25 1000 NA
8 2015-08-27 1000 2015-12-07 - 2015-08-27
8 2015-12-07 10000 2016-01-18 - 2015-12-07
8 2016-01-18 796 2016-03-31 - 2016-01-18
8 2016-03-31 10000 NA
15 2015-09-10 1500 2015-09-30 - 2015-09-10
15 2015-09-30 1000 NA
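You can use DataFrameGroupBy.diff (the day column must be datetime64, e.g. after df['day'] = pd.to_datetime(df['day'])). diff(-1) computes current minus next within each id, so multiply by -1 to get next minus current: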
df['diff'] = df.groupby('id')['day'].diff(-1) * (-1)
print(df)
id day total_amount diff
0 1 2015-07-09 1000 105 days
1 1 2015-10-22 100 21 days
2 1 2015-11-12 200 15 days
3 1 2015-11-27 2392 19 days
4 1 2015-12-16 123 NaT
5 7 2015-07-09 200 0 days
6 7 2015-07-09 1000 49 days
7 7 2015-08-27 100018 90 days
8 7 2015-11-25 1000 NaT
9 8 2015-08-27 1000 102 days
10 8 2015-12-07 10000 42 days
11 8 2016-01-18 796 73 days
12 8 2016-03-31 10000 NaT
13 15 2015-09-10 1500 20 days
14 15 2015-09-30 1000 NaT
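To make the sign flip concrete, the sketch below (using a subset of the question's data, ids 1 and 15) shows that diff(-1) on its own yields negative timedeltas, and that multiplying by -1 gives the same result as shifting the next date up within each id with groupby shift(-1). Both diff and shift follow row order, so the rows are sorted by date within each id first.

```python
import pandas as pd

# A subset of the question's data (ids 1 and 15)
df = pd.DataFrame({
    'id': [1, 1, 1, 15, 15],
    'day': pd.to_datetime(['2015-07-09', '2015-10-22', '2015-11-12',
                           '2015-09-10', '2015-09-30']),
})

# diff/shift work on row order, so sort by date within each id
df = df.sort_values(['id', 'day'])

# diff(-1) is "current - next": negative for increasing dates,
# NaT on the last row of each id (groups never leak into each other)
raw = df.groupby('id')['day'].diff(-1)

# Multiplying by -1 flips it to "next - current"
flipped = raw * (-1)

# Same result without diff: pull the next date up within each id
via_shift = df.groupby('id')['day'].shift(-1) - df['day']

print(raw.tolist())
print(flipped.equals(via_shift))
```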
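# transform keeps the original row index; groupby apply may prepend the id to the index in newer pandas versions
df['diff'] = df.groupby('id')['day'].transform(lambda x: x.shift(-1) - x)
print(df)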
id day total_amount diff
0 1 2015-07-09 1000 105 days
1 1 2015-10-22 100 21 days
2 1 2015-11-12 200 15 days
3 1 2015-11-27 2392 19 days
4 1 2015-12-16 123 NaT
5 7 2015-07-09 200 0 days
6 7 2015-07-09 1000 49 days
7 7 2015-08-27 100018 90 days
8 7 2015-11-25 1000 NaT
9 8 2015-08-27 1000 102 days
10 8 2015-12-07 10000 42 days
11 8 2016-01-18 796 73 days
12 8 2016-03-31 10000 NaT
13 15 2015-09-10 1500 20 days
14 15 2015-09-30 1000 NaT
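A small check of why transform suits the lambda version: as far as I know, transform always returns a result indexed like the input, so it can be assigned back as a column, whereas SeriesGroupBy.apply in pandas 2.x may add the group key to the index. The data below is a subset of the question's (ids 8 and 15).

```python
import pandas as pd

df = pd.DataFrame({
    'id': [8, 8, 15, 15],
    'day': pd.to_datetime(['2015-08-27', '2015-12-07', '2015-09-10', '2015-09-30']),
})

# transform returns a Series aligned to df's original index,
# so it can be assigned straight back as a new column
res = df.groupby('id')['day'].transform(lambda x: x.shift(-1) - x)
print(res.index.equals(df.index))

df['diff'] = res
print(df)
```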
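EDIT (from the comments):
If you need the difference in hours as a number instead of a timedelta, divide by a one-hour timedelta. The result is float, because the last row of each id is NaN: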
import numpy as np

df['diff'] = df.groupby('id')['day'].diff(-1) * (-1) / np.timedelta64(1, 'h')
print(df)
id day total_amount diff
0 1 2015-07-09 1000 2520.0
1 1 2015-10-22 100 504.0
2 1 2015-11-12 200 360.0
3 1 2015-11-27 2392 456.0
4 1 2015-12-16 123 NaN
5 7 2015-07-09 200 0.0
6 7 2015-07-09 1000 1176.0
7 7 2015-08-27 100018 2160.0
8 7 2015-11-25 1000 NaN
9 8 2015-08-27 1000 2448.0
10 8 2015-12-07 10000 1008.0
11 8 2016-01-18 796 1752.0
12 8 2016-03-31 10000 NaN
13 15 2015-09-10 1500 480.0
14 15 2015-09-30 1000 NaN
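The division by np.timedelta64(1, 'h') is one way to convert; the sketch below (a subset of the question's data, ids 1 and 7) shows that the Series.dt.total_seconds() accessor gives the same hours, and that .dt.days gives whole days, with NaN on the last row of each id in every case.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'id': [1, 1, 7, 7],
    'day': pd.to_datetime(['2015-11-12', '2015-11-27', '2015-07-09', '2015-08-27']),
})

delta = df.groupby('id')['day'].diff(-1) * (-1)

# Two equivalent ways to get hours as floats (NaN where there is no next row)
hours_div = delta / np.timedelta64(1, 'h')
hours_sec = delta.dt.total_seconds() / 3600

# Whole days via the .dt accessor (float, because of the NaN rows)
days = delta.dt.days

print(pd.DataFrame({'hours': hours_div, 'days': days}))
```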
df['diff'] = (df.groupby('id')['day'].transform(lambda x: x.shift(-1) - x)
              / np.timedelta64(1, 'h'))
print(df)
id day total_amount diff
0 1 2015-07-09 1000 2520.0
1 1 2015-10-22 100 504.0
2 1 2015-11-12 200 360.0
3 1 2015-11-27 2392 456.0
4 1 2015-12-16 123 NaN
5 7 2015-07-09 200 0.0
6 7 2015-07-09 1000 1176.0
7 7 2015-08-27 100018 2160.0
8 7 2015-11-25 1000 NaN
9 8 2015-08-27 1000 2448.0
10 8 2015-12-07 10000 1008.0
11 8 2016-01-18 796 1752.0
12 8 2016-03-31 10000 NaN
13 15 2015-09-10 1500 480.0
14 15 2015-09-30 1000 NaN
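Because the hours come back as float (NaN on the last row of each id), a true integer column needs a dtype that can hold missing values. A sketch assuming pandas' nullable Int64 extension dtype is acceptable for you, using the rows for id 8:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'id': [8, 8, 8],
    'day': pd.to_datetime(['2015-12-07', '2016-01-18', '2016-03-31']),
})

hours = df.groupby('id')['day'].diff(-1) * (-1) / np.timedelta64(1, 'h')

# float64 is used because the last row of each id is NaN; the nullable
# Int64 dtype keeps the whole-hour values as integers and shows <NA>
df['diff'] = hours.astype('Int64')
print(df)
```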