Check whether the cumsum of a column exceeds a range value, then advance to the next element in a list

I have a list

sample_dates = ["10/07/2021","11/07/2021","12/07/2021","13/07/2021",
                "14/07/2021","15/07/2021","16/07/2021","17/07/2021",
                "18/07/2021","19/07/2021","20/07/2021","21/07/2021",
                "22/07/2021","23/07/2021","24/07/2021"]

and a DataFrame like the one below

Truckid   Tripid   kms    
  1          1     700.3  
  1          1     608.9        
  1          1     400.2  
  1          2     100.2  
  1          2     140.8        
  1          3     1580.0 
  1          3     357.3        
  1          3     541.5  
  1          4     421.2   
  1          4     1694.4 
  1          4     1585.9 
  1          5     173.3  
  1          5     237.4   
  1          5     83.3   
  2          1     846.1  
  2          1     1167.6  
  2          2     388.8  
  2          2     70.5   
  2          2     127.1  
  2          3     126.7  
  2          3     262.4  

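I want a Date column driven by the cumulative sum of kms. While the cumulative kms is > 0 and < 2000, the rows should share the same date; once it reaches 2000, the date changes. While it is > 2000 and < 3000 the date stays the same, and once it passes 3000 the date changes again, and so on.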

Also, whenever Tripid changes, the count restarts from 0.

I want something like this

Truckid   Tripid   kms        Date
  1          1     700.3      10/07/2021
  1          1     608.9      10/07/2021      
  1          1     400.2      10/07/2021
  1          2     100.2      11/07/2021
  1          2     140.8      11/07/2021      
  1          3     1580.0     12/07/2021
  1          3     357.3      12/07/2021      
  1          3     541.5      13/07/2021
  1          4     421.2      14/07/2021 
  1          4     1694.4     15/07/2021
  1          4     1585.9     16/07/2021
  1          5     173.3      17/07/2021
  1          5     237.4      17/07/2021 
  1          5     83.3       17/07/2021
  2          1     846.1      18/07/2021
  2          1     1167.6     19/07/2021 
  2          2     388.8      20/07/2021
  2          2     70.5       20/07/2021
  2          2     127.1      20/07/2021
  2          3     126.7      21/07/2021
  2          3     262.4      21/07/2021

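You can compute the cumulative sum per group. The binning can then be done either manually with pd.cut or with a math trick that derives the bucket number directly.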
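As a small illustration of the two binning options, the sketch below runs both on a single made-up trip. The data and the names `trip`, `running`, `by_math`, `edges` and `by_cut` are hypothetical and only serve this example. It also assumes `right=False` in `pd.cut`, so that a running total of exactly 2000 lands in the next bucket, the same way the floor-division variant treats it.

```python
import numpy as np
import pandas as pd

# Hypothetical single trip, chosen so the running total crosses 2000 and 3000
trip = pd.DataFrame({"Truckid": 1, "Tripid": 1,
                     "kms": [900.0, 800.0, 500.0, 900.0]})

# Running total within each (Truckid, Tripid) group
running = trip.groupby(["Truckid", "Tripid"])["kms"].cumsum()

# Math trick: count whole thousands, folding everything below 2000 into bucket 1
by_math = running.floordiv(1000).clip(lower=1).astype(int)

# Manual bins [0, 2000), [2000, 3000), ... extended past the largest total
edges = [0, *np.arange(2000, running.max() + 2000, 1000)]
by_cut = pd.cut(running, bins=edges, right=False, labels=False)

print(running.tolist())  # [900.0, 1700.0, 2200.0, 3100.0]
print(by_math.tolist())  # [1, 1, 2, 3]
print(by_cut.tolist())   # [0, 0, 1, 2]
```

The bucket numbers differ by a constant offset, but both split the rows into the same groups, which is all the later grouping step needs.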

Then map the resulting groups to your dates:

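import pandas as pd

# floor the per-trip running total to whole thousands;
# clip so that everything below 2000 km falls in bucket 1
kms = df.groupby(['Truckid', 'Tripid'])['kms'].cumsum().floordiv(1000).clip(lower=1)

# OR use manual bins (labels=False returns plain integer bin numbers)
kms = pd.cut(df.groupby(['Truckid', 'Tripid'])['kms'].cumsum(),
             bins=[0,2000,3000,4000], labels=False) # etc. up to max wanted value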


df['Date'] = (df
              .groupby(['Truckid', 'Tripid', kms]).ngroup() # get group ID
              .map(dict(enumerate(sample_dates)))      # match to items in order
             )
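For anyone who wants to reproduce this, here is a self-contained sketch that rebuilds the sample data from the question and applies the floor-division variant end to end. The names `keys`, `bucket` and `group_id` are introduced here only for readability. It assumes the rows are already sorted by Truckid and Tripid, so that the sorted group numbers from `ngroup()` follow the row order.

```python
import pandas as pd

# 10/07/2021 .. 24/07/2021, same values as the list in the question
sample_dates = [f"{day}/07/2021" for day in range(10, 25)]

df = pd.DataFrame({
    "Truckid": [1] * 14 + [2] * 7,
    "Tripid": [1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5,
               1, 1, 2, 2, 2, 3, 3],
    "kms": [700.3, 608.9, 400.2, 100.2, 140.8, 1580.0, 357.3, 541.5,
            421.2, 1694.4, 1585.9, 173.3, 237.4, 83.3,
            846.1, 1167.6, 388.8, 70.5, 127.1, 126.7, 262.4],
})

keys = ["Truckid", "Tripid"]

# Bucket of the running total within each trip (below 2000 km -> bucket 1)
bucket = (df.groupby(keys)["kms"].cumsum()
            .floordiv(1000).clip(lower=1).rename("bucket"))

# One consecutive id per (truck, trip, bucket), then look the id up in the list
group_id = df.groupby(keys + [bucket]).ngroup()
df["Date"] = group_id.map(dict(enumerate(sample_dates)))

print(df)
```

If there are more groups than entries in `sample_dates`, the extra rows get NaN from `map`, so the list needs at least as many dates as there are groups.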

Alternative that uses consecutive days counted from a starting date:

# group ID is used as a day offset from the origin date
df['Date'] = pd.to_datetime(df.groupby(['Truckid', 'Tripid', kms]).ngroup(),
                            unit='D', origin='2021-07-10')
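Note that this alternative produces real datetime values rather than the dd/mm/yyyy strings from the list. If the text format is wanted, one possible follow-up is `dt.strftime`. The sketch below uses a hypothetical `group_id` series standing in for the result of `ngroup()`.

```python
import pandas as pd

# Hypothetical group ids, standing in for the output of .ngroup()
group_id = pd.Series([0, 0, 1, 2, 2])

# Day offsets from the origin become datetime64 values
dates = pd.to_datetime(group_id, unit="D", origin="2021-07-10")

# Format back to day/month/year text, matching the style of sample_dates
as_text = dates.dt.strftime("%d/%m/%Y")

print(as_text.tolist())
# ['10/07/2021', '10/07/2021', '11/07/2021', '12/07/2021', '12/07/2021']
```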

Output:

    Truckid  Tripid     kms        Date
0         1       1   700.3  10/07/2021
1         1       1   608.9  10/07/2021
2         1       1   400.2  10/07/2021
3         1       2   100.2  11/07/2021
4         1       2   140.8  11/07/2021
5         1       3  1580.0  12/07/2021
6         1       3   357.3  12/07/2021
7         1       3   541.5  13/07/2021
8         1       4   421.2  14/07/2021
9         1       4  1694.4  15/07/2021
10        1       4  1585.9  16/07/2021
11        1       5   173.3  17/07/2021
12        1       5   237.4  17/07/2021
13        1       5    83.3  17/07/2021
14        2       1   846.1  18/07/2021
15        2       1  1167.6  19/07/2021
16        2       2   388.8  20/07/2021
17        2       2    70.5  20/07/2021
18        2       2   127.1  20/07/2021
19        2       3   126.7  21/07/2021
20        2       3   262.4  21/07/2021