Check if the cumsum of a column is greater than a range value and, if so, move to the next element in a list
I have a list
sample_dates = ["10/07/2021","11/07/2021","12/07/2021","13/07/2021",
"14/07/2021","15/07/2021","16/07/2021","17/07/2021",
"18/07/2021","19/07/2021","20/07/2021","21/07/2021",
"22/07/2021","23/07/2021","24/07/2021"]
and a DataFrame like the one below
Truckid Tripid kms
1 1 700.3
1 1 608.9
1 1 400.2
1 2 100.2
1 2 140.8
1 3 1580.0
1 3 357.3
1 3 541.5
1 4 421.2
1 4 1694.4
1 4 1585.9
1 5 173.3
1 5 237.4
1 5 83.3
2 1 846.1
2 1 1167.6
2 2 388.8
2 2 70.5
2 2 127.1
2 3 126.7
2 3 262.4
I want to add a date column driven by the cumulative sum of kms: while the running total is between 0 and 2000 the rows keep the same date; once it crosses 2000 the date changes; while it is between 2000 and 3000 it stays the same again, and when it passes 3000 the date changes once more, and so on.
Likewise, whenever Tripid changes, the running total restarts from 0.
I want something like this:
Truckid Tripid kms Date
1 1 700.3 10/07/2021
1 1 608.9 10/07/2021
1 1 400.2 10/07/2021
1 2 100.2 11/07/2021
1 2 140.8 11/07/2021
1 3 1580.0 12/07/2021
1 3 357.3 12/07/2021
1 3 541.5 13/07/2021
1 4 421.2 14/07/2021
1 4 1694.4 15/07/2021
1 4 1585.9 16/07/2021
1 5 173.3 17/07/2021
1 5 237.4 17/07/2021
1 5 83.3 17/07/2021
2 1 846.1 18/07/2021
2 1 1167.6 19/07/2021
2 2 388.8 20/07/2021
2 2 70.5 20/07/2021
2 2 127.1 20/07/2021
2 3 126.7 21/07/2021
2 3 262.4 21/07/2021
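For reference, the sample data above can be reconstructed like this (a minimal sketch assuming pandas; the values are copied from the table shown, and sample_dates is the list given earlier):

import pandas as pd

# rebuild the sample DataFrame shown above
df = pd.DataFrame({
    'Truckid': [1]*14 + [2]*7,
    'Tripid':  [1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5,
                1, 1, 2, 2, 2, 3, 3],
    'kms':     [700.3, 608.9, 400.2, 100.2, 140.8, 1580.0, 357.3, 541.5,
                421.2, 1694.4, 1585.9, 173.3, 237.4, 83.3,
                846.1, 1167.6, 388.8, 70.5, 127.1, 126.7, 262.4],
})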
You can compute the cumulative sum per group and bin it, either manually with cut or with a math trick, then map your dates:
# round to thousands, clip to get a minimum of 1000 km
kms = df.groupby(['Truckid', 'Tripid'])['kms'].cumsum().floordiv(1000).clip(1)

# OR use manual bins
kms = pd.cut(df.groupby(['Truckid', 'Tripid'])['kms'].cumsum(),
             bins=[0, 2000, 3000, 4000])  # etc. up to the max wanted value

df['Date'] = (df
   .groupby(['Truckid', 'Tripid', kms]).ngroup()   # get group ID
   .map(dict(enumerate(sample_dates)))             # match to items in order
)
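To see why the floordiv/clip trick matches the requested thresholds, here is a quick sanity check on Truckid 1, Tripid 4 (values taken from the sample data above):

trip = pd.Series([421.2, 1694.4, 1585.9])     # kms for Truckid 1, Tripid 4
total = trip.cumsum()                         # 421.2, 2115.6, 3701.5
print(total.floordiv(1000).clip(1).tolist())  # [1.0, 2.0, 3.0] -> three distinct bins,
                                              # hence three consecutive dates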
Alternative using consecutive days from a starting point:
df['Date'] = pd.to_datetime(df.groupby(['Truckid', 'Tripid', kms]).ngroup(),
                            unit='d', origin='2021-07-10')
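Note that this alternative produces real datetime values rather than the dd/mm/yyyy strings shown in the expected output; if you need the same string format, one option (assuming the datetime column produced above) is:

df['Date'] = df['Date'].dt.strftime('%d/%m/%Y')  # format back to dd/mm/yyyy strings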
Output:
Truckid Tripid kms Date
0 1 1 700.3 10/07/2021
1 1 1 608.9 10/07/2021
2 1 1 400.2 10/07/2021
3 1 2 100.2 11/07/2021
4 1 2 140.8 11/07/2021
5 1 3 1580.0 12/07/2021
6 1 3 357.3 12/07/2021
7 1 3 541.5 13/07/2021
8 1 4 421.2 14/07/2021
9 1 4 1694.4 15/07/2021
10 1 4 1585.9 16/07/2021
11 1 5 173.3 17/07/2021
12 1 5 237.4 17/07/2021
13 1 5 83.3 17/07/2021
14 2 1 846.1 18/07/2021
15 2 1 1167.6 19/07/2021
16 2 2 388.8 20/07/2021
17 2 2 70.5 20/07/2021
18 2 2 127.1 20/07/2021
19 2 3 126.7 21/07/2021
20 2 3 262.4 21/07/2021