Shifting depending on different columns and beginning the shift depending on changes in columns

Question

I have a dataframe (here is an example):

Date	UnitId	ServiceDomineId	Interval	ServiceTime
01/01/2021	1	1	8:00	30
01/01/2021	1	1	8:30	20
01/01/2021	1	1	9:00	10
01/01/2021	2	1	8:00	50
01/01/2021	2	1	9:00	10
01/01/2021	1	2	8:30	25
01/01/2021	1	2	9:00	15
01/01/2021	1	2	9:30	30
01/01/2021	2	2	8:00	45
01/01/2021	2	2	8:30	10
02/01/2021	1	1	8:00	30
02/01/2021	1	1	8:30	45
02/01/2021	1	1	9:00	10
02/01/2021	2	1	8:00	30
02/01/2021	2	1	8:30	55
02/01/2021	2	1	9:00	60
02/01/2021	1	2	8:00	35
02/01/2021	1	2	8:30	15
02/01/2021	1	2	9:00	10
02/01/2021	2	2	8:00	20
02/01/2021	2	2	8:30	35
02/01/2021	2	2	9:00	10

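I need a new column that holds the ServiceTime of the previous Interval. (Note: I actually store the interval as a number, where each interval is 1/48 more than the previous one. For example, 00:00 is 0 and 8:00 is 16/48, that is 1/3, which I have stored as 0.333333.) Also, if the previous interval does not exist in my data, I want the column to show 0. Here is an example of how it should look: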

Date	UnitId	ServiceDomineId	Interval	ServiceTime	ServiceTimePreviousInterval
01/01/2021	1	1	8:00	30	0
01/01/2021	1	1	8:30	20	30
01/01/2021	1	1	9:00	10	20
01/01/2021	2	1	8:00	50	0
01/01/2021	2	1	9:00	10	0
01/01/2021	1	2	8:30	25	0
01/01/2021	1	2	9:00	15	25
01/01/2021	1	2	9:30	30	15
01/01/2021	2	2	8:00	45	0
01/01/2021	2	2	8:30	10	45
02/01/2021	1	1	8:00	30	0
02/01/2021	1	1	8:30	45	30
02/01/2021	1	1	9:00	10	45
02/01/2021	2	1	8:00	30	0
02/01/2021	2	1	8:30	55	30
02/01/2021	2	1	9:00	60	55
02/01/2021	1	2	8:00	35	0
02/01/2021	1	2	8:30	15	35
02/01/2021	1	2	9:00	10	15
02/01/2021	2	2	8:00	20	0
02/01/2021	2	2	8:30	35	20
02/01/2021	2	2	9:00	10	35

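I thought about using the pandas transform function, but it does not help when an interval is missing. I also considered nested for loops that split the data into separate dataframes and decide with if statements, but because the full dataframe is really large, that takes too long. Do you know how I could do this?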

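Here is an image of the full dataframe, together with a comparison of how I see my intervals (floats) against the times they represent, to make this easier to understand.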
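To make the float encoding described above concrete, here is a minimal sketch of the mapping between a day fraction and a clock label. The helper names `slot_to_clock` and `clock_to_slot` and the constant `SLOTS_PER_DAY` are illustrative assumptions, not part of the original data pipeline.

```python
SLOTS_PER_DAY = 48  # one slot per half hour, as described in the question


def slot_to_clock(value: float) -> str:
    """Convert a day fraction such as 0.333333 to an 'H:MM' label."""
    # Rounding to the nearest slot absorbs float noise like 0.333333 vs 1/3.
    slot = round(value * SLOTS_PER_DAY)
    hours, half = divmod(slot, 2)
    return f"{hours}:{30 * half:02d}"


def clock_to_slot(label: str) -> float:
    """Convert an 'H:MM' label on a half-hour boundary to a day fraction."""
    hours, minutes = map(int, label.split(":"))
    return (2 * hours + minutes // 30) / SLOTS_PER_DAY
```

For example, `slot_to_clock(0.333333)` gives `'8:00'`, and `clock_to_slot('8:30')` gives `17/48`.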

Another logical approach I found is:

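# Reset the index after sorting so that position i really is the i-th sorted row.
df = df.sort_values(['Date', 'UnitId', 'ServiceProfileId', 'Intervals']).reset_index(drop=True)
# Start every row at 0; the first sorted row never has a previous interval.
df['ServiceTimePI'] = np.zeros(len(df))
for i in range(len(df) - 1):
    same_group = (df.loc[i, 'Date'] == df.loc[i + 1, 'Date']
                  and df.loc[i, 'UnitId'] == df.loc[i + 1, 'UnitId']
                  and df.loc[i, 'ServiceProfileId'] == df.loc[i + 1, 'ServiceProfileId'])
    # Consecutive intervals differ by 1/48 of a day; allow for float rounding.
    gap = df.loc[i + 1, 'Intervals'] - df.loc[i, 'Intervals']
    if same_group and abs(gap - 1/48) <= 0.00000001:
        df.loc[i + 1, 'ServiceTimePI'] = df.loc[i, 'Promedio_ServiceTime']
    else:
        df.loc[i + 1, 'ServiceTimePI'] = 0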

Answer 1

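It looks like what you really want to check is whether the previous Interval for the same UnitId and ServiceDomineId is exactly 30 minutes earlier. If it is, the ServiceTime is shifted down one row within that group; otherwise the value is zero.

We can use .ne and cumsum to group the rows into runs of consecutive UnitId values and compute the time difference, in minutes, between the previous row and the current row. If it equals 30, the previous ServiceTime is shifted down one row; otherwise the value is filled with zero. This relies on the rows being ordered as in the example, where UnitId changes whenever a new Date or ServiceDomineId block starts.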

import pandas as pd
import numpy as np

df = pd.DataFrame({'Date': ['01/01/2021','01/01/2021','01/01/2021','01/01/2021','01/01/2021','01/01/2021','01/01/2021',
  '01/01/2021','01/01/2021','01/01/2021','02/01/2021','02/01/2021','02/01/2021','02/01/2021','02/01/2021','02/01/2021',
  '02/01/2021','02/01/2021','02/01/2021','02/01/2021','02/01/2021','02/01/2021'],
 'UnitId': [1, 1, 1, 2, 2, 1, 1, 1, 2, 2, 1, 1, 1, 2, 2, 2, 1, 1, 1, 2, 2, 2],
 'ServiceDomineId': [1,1,1,1,1,2,2,2,2,2,1,1,1,1,1,1,2,2,2,2,2,2],
 'Interval': ['8:00','8:30','9:00','8:00','9:00','8:30','9:00','9:30','8:00','8:30','8:00','8:30','9:00',
  '8:00','8:30','9:00','8:00','8:30','9:00','8:00','8:30','9:00'],
 'ServiceTime': [30,20,10,50,10,25,15,30,45,10,30,45,10,30,55,60,35,15,10,20,35,10]})

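# Label each run of consecutive rows that share the same UnitId.
run_id = df['UnitId'].ne(df['UnitId'].shift()).cumsum()
# Minutes since midnight for every row.
times = pd.to_datetime(df['Interval'], format='%H:%M')
minutes = times.dt.hour * 60 + times.dt.minute
# Gap to the previous row of the same run (NaN on the first row of a run).
gap = minutes.groupby(run_id).diff()
# Use the previous row's ServiceTime when the gap is exactly 30 minutes, otherwise 0.
df = df.assign(ServiceTimePreviousInterval=np.where(gap == 30, df['ServiceTime'].shift(), 0))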
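A possible variant, offered as a sketch rather than as the answerer's method: group by all three key columns (Date, UnitId, ServiceDomineId) instead of by runs of UnitId, so the result does not depend on the order of the rows. The helper name `add_previous_interval` and the temporary `_minutes` column are illustrative assumptions; the other column names follow the example frame.

```python
import pandas as pd


def add_previous_interval(df: pd.DataFrame) -> pd.DataFrame:
    """Add ServiceTimePreviousInterval per (Date, UnitId, ServiceDomineId) group."""
    keys = ['Date', 'UnitId', 'ServiceDomineId']
    out = df.copy()

    # Minutes since midnight, parsed from labels such as '8:30'.
    times = pd.to_datetime(out['Interval'], format='%H:%M')
    out['_minutes'] = times.dt.hour * 60 + times.dt.minute

    # Order each group chronologically so "previous row" means "previous interval present".
    out = out.sort_values(keys + ['_minutes'])
    grouped = out.groupby(keys, sort=False)

    gap = grouped['_minutes'].diff()       # NaN on the first row of each group
    previous = grouped['ServiceTime'].shift()

    # Keep the previous value only when the gap is exactly 30 minutes, else 0.
    out['ServiceTimePreviousInterval'] = (
        previous.where(gap.eq(30), 0).astype(out['ServiceTime'].dtype)
    )

    # Drop the helper column and restore the caller's original row order.
    return out.drop(columns='_minutes').sort_index()
```

Calling `add_previous_interval(df)` on the example frame above returns a new frame and leaves `df` unchanged.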

Answer 2

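First, it sorts the dataframe by the required columns and then creates the new column filled with 0, since the first value is always 0. The for loop then checks whether any of the key columns changed and whether the gap between intervals is half an hour. If nothing changed and the gap is half an hour, the value I need is the one from the previous interval; otherwise it is 0.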

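# Reset the index after sorting so that position i really is the i-th sorted row.
df = df.sort_values(['Date', 'UnitId', 'ServiceProfileId', 'Intervals']).reset_index(drop=True)
# Start every row at 0; the first sorted row never has a previous interval.
df['ServiceTimePI'] = np.zeros(len(df))
for i in range(len(df) - 1):
    same_group = (df.loc[i, 'Date'] == df.loc[i + 1, 'Date']
                  and df.loc[i, 'UnitId'] == df.loc[i + 1, 'UnitId']
                  and df.loc[i, 'ServiceProfileId'] == df.loc[i + 1, 'ServiceProfileId'])
    # Consecutive intervals differ by 1/48 of a day; allow for float rounding.
    gap = df.loc[i + 1, 'Intervals'] - df.loc[i, 'Intervals']
    if same_group and abs(gap - 1/48) <= 0.00000001:
        df.loc[i + 1, 'ServiceTimePI'] = df.loc[i, 'Promedio_ServiceTime']
    else:
        df.loc[i + 1, 'ServiceTimePI'] = 0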
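Since the row-by-row loop is slow on a large frame, the same rule might be expressed without a loop as a self-merge. This is only a sketch under stated assumptions: the helper name `previous_slot_value` and the temporary `_slot` column are hypothetical, there is at most one row per combination of key columns and interval, and the float intervals are turned into integer slot numbers (interval times 48, rounded) so that the match does not depend on a float tolerance.

```python
import pandas as pd


def previous_slot_value(df, keys, interval_col, value_col, new_col):
    """Attach the value found one half-hour slot earlier in the same group, else 0."""
    out = df.copy()

    # Integer slot number: 0 for 00:00, 16 for 8:00, 17 for 8:30, ...
    out['_slot'] = (out[interval_col] * 48).round().astype('int64')

    # Each row offers its value to the following slot of the same group.
    lookup = (
        out[keys + ['_slot', value_col]]
        .rename(columns={value_col: new_col})
        .assign(_slot=lambda d: d['_slot'] + 1)
    )

    # A left merge keeps every original row; rows without a predecessor get NaN.
    merged = out.merge(lookup, on=keys + ['_slot'], how='left')
    merged[new_col] = merged[new_col].fillna(0)
    return merged.drop(columns='_slot')
```

With the column names used in the loop, a call could look like `previous_slot_value(df, ['Date', 'UnitId', 'ServiceProfileId'], 'Intervals', 'Promedio_ServiceTime', 'ServiceTimePI')`. The merge returns a frame with a fresh integer index, and no prior sorting is needed.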

python

shift

multiple-columns

pandas