Shifting depending on different columns and beginning the shift depending on changes in columns
I have a dataframe (here is an example):
Date | UnitId | ServiceDomineId | Interval | ServiceTime |
---|---|---|---|---|
01/01/2021 | 1 | 1 | 8:00 | 30 |
01/01/2021 | 1 | 1 | 8:30 | 20 |
01/01/2021 | 1 | 1 | 9:00 | 10 |
01/01/2021 | 2 | 1 | 8:00 | 50 |
01/01/2021 | 2 | 1 | 9:00 | 10 |
01/01/2021 | 1 | 2 | 8:30 | 25 |
01/01/2021 | 1 | 2 | 9:00 | 15 |
01/01/2021 | 1 | 2 | 9:30 | 30 |
01/01/2021 | 2 | 2 | 8:00 | 45 |
01/01/2021 | 2 | 2 | 8:30 | 10 |
02/01/2021 | 1 | 1 | 8:00 | 30 |
02/01/2021 | 1 | 1 | 8:30 | 45 |
02/01/2021 | 1 | 1 | 9:00 | 10 |
02/01/2021 | 2 | 1 | 8:00 | 30 |
02/01/2021 | 2 | 1 | 8:30 | 55 |
02/01/2021 | 2 | 1 | 9:00 | 60 |
02/01/2021 | 1 | 2 | 8:00 | 35 |
02/01/2021 | 1 | 2 | 8:30 | 15 |
02/01/2021 | 1 | 2 | 9:00 | 10 |
02/01/2021 | 2 | 2 | 8:00 | 20 |
02/01/2021 | 2 | 2 | 8:30 | 35 |
02/01/2021 | 2 | 2 | 9:00 | 10 |
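I need a new column holding the value from the previous Interval. (Note: I actually store the intervals as numbers; each interval is 1/48 more than the previous one. For example, 00:00 is 0, so 8:00 is 16/48 = 1/3, which I have stored as 0.333333.) In addition, if the previous interval does not exist in my data, I want that column to show 0. Here is an example of how it should look: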
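A small illustration of the encoding described above, assuming intervals are stored as fractions of a day: multiplying by 48 and rounding gives an integer half-hour slot, which avoids comparing floats such as 0.333333 directly. The helper names `to_slot` and `slot_to_label` are hypothetical, not from the question.

```python
# Convert between the "fraction of a day" encoding and integer half-hour slots.
# Helper names are illustrative only.

def to_slot(day_fraction: float) -> int:
    """Half-hour slot number: 0 for 00:00, 16 for 8:00, 47 for 23:30."""
    # round() absorbs the error in stored values such as 0.333333
    return round(day_fraction * 48)


def slot_to_label(slot: int) -> str:
    """Format a slot number back as H:MM."""
    hours, half = divmod(slot, 2)
    return f"{hours}:{'30' if half else '00'}"


print(to_slot(0.333333), slot_to_label(to_slot(0.333333)))  # 16 8:00
print(to_slot(17 / 48), slot_to_label(17))                  # 17 8:30
```

With integer slots, "the previous interval" is simply `slot - 1`, and equality checks are exact.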
Date | UnitId | ServiceDomineId | Interval | ServiceTime | ServiceTimePreviousInterval |
---|---|---|---|---|---|
01/01/2021 | 1 | 1 | 8:00 | 30 | 0 |
01/01/2021 | 1 | 1 | 8:30 | 20 | 30 |
01/01/2021 | 1 | 1 | 9:00 | 10 | 20 |
01/01/2021 | 2 | 1 | 8:00 | 50 | 0 |
01/01/2021 | 2 | 1 | 9:00 | 10 | 0 |
01/01/2021 | 1 | 2 | 8:30 | 25 | 0 |
01/01/2021 | 1 | 2 | 9:00 | 15 | 25 |
01/01/2021 | 1 | 2 | 9:30 | 30 | 15 |
01/01/2021 | 2 | 2 | 8:00 | 45 | 0 |
01/01/2021 | 2 | 2 | 8:30 | 10 | 45 |
02/01/2021 | 1 | 1 | 8:00 | 30 | 0 |
02/01/2021 | 1 | 1 | 8:30 | 45 | 30 |
02/01/2021 | 1 | 1 | 9:00 | 10 | 45 |
02/01/2021 | 2 | 1 | 8:00 | 30 | 0 |
02/01/2021 | 2 | 1 | 8:30 | 55 | 30 |
02/01/2021 | 2 | 1 | 9:00 | 60 | 55 |
02/01/2021 | 1 | 2 | 8:00 | 35 | 0 |
02/01/2021 | 1 | 2 | 8:30 | 15 | 35 |
02/01/2021 | 1 | 2 | 9:00 | 10 | 15 |
02/01/2021 | 2 | 2 | 8:00 | 20 | 0 |
02/01/2021 | 2 | 2 | 8:30 | 35 | 20 |
02/01/2021 | 2 | 2 | 9:00 | 10 | 35 |
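I thought about using the pandas function shift, but it doesn't help when intervals are missing. I also considered using nested for loops to split out the different sub-dataframes and if statements to decide, but since the full dataframe is really large, that takes far too long. Do you know how I can do this?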
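One vectorized way to cope with missing intervals, sketched here as an alternative to the loops (it is neither the asker's nor the answerer's method): convert each interval to an integer slot, then left-merge every row with the row of the same `Date`, `UnitId` and `ServiceDomineId` whose slot is exactly one lower. Because the lookup is by key rather than by row position, a missing previous interval simply finds no match and becomes 0. This assumes the example's column names, float day-fraction intervals, and that each (`Date`, `UnitId`, `ServiceDomineId`, interval) combination occurs only once; duplicates would multiply rows in the merge.

```python
import pandas as pd

keys = ['Date', 'UnitId', 'ServiceDomineId']

# Small sample in the question's layout; Interval is a fraction of a day (8:00 = 16/48)
df = pd.DataFrame({
    'Date': ['01/01/2021'] * 5,
    'UnitId': [1, 1, 1, 2, 2],
    'ServiceDomineId': [1, 1, 1, 1, 1],
    'Interval': [16 / 48, 17 / 48, 18 / 48, 16 / 48, 18 / 48],
    'ServiceTime': [30, 20, 10, 50, 10],
})

# Integer half-hour slot; rounding absorbs float error in the stored intervals
df['Slot'] = (df['Interval'] * 48).round().astype(int)

# Lookup table of "previous" values: each row re-keyed to the slot right after it
prev = (df[keys + ['Slot', 'ServiceTime']]
        .assign(Slot=lambda d: d['Slot'] + 1)
        .rename(columns={'ServiceTime': 'ServiceTimePreviousInterval'}))

# Left merge keeps every original row; rows with no previous slot get NaN, then 0
out = df.merge(prev, on=keys + ['Slot'], how='left')
out['ServiceTimePreviousInterval'] = out['ServiceTimePreviousInterval'].fillna(0)
out = out.drop(columns='Slot')
print(out)
```

The sample is the first five rows of the example table; the 9:00 row for `UnitId` 2 gets 0 because its 8:30 interval is missing.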
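Here is an image of the full dataframe, and of how my intervals (stored as floats) compare with the times they represent, to make this easier to understand.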
Another logical approach I found is:
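```python
# Sort so that rows of the same Date/UnitId/ServiceProfileId are adjacent and ordered by interval.
# reset_index makes the label-based lookups below match row positions after sorting.
df = df.sort_values(['Date', 'UnitId', 'ServiceProfileId', 'Intervals']).reset_index(drop=True)
df['ServiceTimePI'] = 0.0  # the first row never has a previous interval
for i in range(len(df) - 1):
    same_group = (df.at[i, 'Date'] == df.at[i + 1, 'Date']
                  and df.at[i, 'UnitId'] == df.at[i + 1, 'UnitId']
                  and df.at[i, 'ServiceProfileId'] == df.at[i + 1, 'ServiceProfileId'])
    # Consecutive half-hour intervals differ by 1/48 of a day (small tolerance for float error)
    consecutive = abs(df.at[i + 1, 'Intervals'] - df.at[i, 'Intervals'] - 1 / 48) <= 1e-8
    df.at[i + 1, 'ServiceTimePI'] = df.at[i, 'Promedio_ServiceTime'] if (same_group and consecutive) else 0
```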
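It looks like what you really want to check is whether the previous `Interval` for the same `UnitId` and `ServiceDomineId` is 30 minutes earlier. If it is, shift `ServiceTime` down one row within that group; otherwise the value is zero.

We can use `.ne` and `cumsum` to group by runs of consecutive `UnitId` values and compute the time difference in minutes between the previous row and the current row. If it equals 30, take the previous row's `ServiceTime` (shifted down by one); otherwise fill with zero.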
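To see what `.ne` plus `cumsum` produces, here is the run labeling on its own for the example's `UnitId` column. It also shows a caveat: a new run starts only when `UnitId` itself changes, so this grouping matches the intended `Date`/`UnitId`/`ServiceDomineId` groups in the sample only because `UnitId` happens to change at every group boundary. On the full data, the other key columns may need to be included as well.

```python
import pandas as pd

# UnitId column from the example table, in its original row order
unit = pd.Series([1, 1, 1, 2, 2, 1, 1, 1, 2, 2, 1, 1, 1, 2, 2, 2, 1, 1, 1, 2, 2, 2])

# ne() is True wherever a value differs from the row above (always True for the
# first row, since shift() puts NaN there); cumsum() then increments at each new run
run_id = unit.ne(unit.shift()).cumsum()
print(run_id.tolist())
# [1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 5, 5, 5, 6, 6, 6, 7, 7, 7, 8, 8, 8]
```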
```python
import pandas as pd
import numpy as np

df = pd.DataFrame({'Date': ['01/01/2021','01/01/2021','01/01/2021','01/01/2021','01/01/2021','01/01/2021','01/01/2021',
                            '01/01/2021','01/01/2021','01/01/2021','02/01/2021','02/01/2021','02/01/2021','02/01/2021','02/01/2021','02/01/2021',
                            '02/01/2021','02/01/2021','02/01/2021','02/01/2021','02/01/2021','02/01/2021'],
                   'UnitId': [1, 1, 1, 2, 2, 1, 1, 1, 2, 2, 1, 1, 1, 2, 2, 2, 1, 1, 1, 2, 2, 2],
                   'ServiceDomineId': [1,1,1,1,1,2,2,2,2,2,1,1,1,1,1,1,2,2,2,2,2,2],
                   'Interval': ['8:00','8:30','9:00','8:00','9:00','8:30','9:00','9:30','8:00','8:30','8:00','8:30','9:00',
                                '8:00','8:30','9:00','8:00','8:30','9:00','8:00','8:30','9:00'],
                   'ServiceTime': [30,20,10,50,10,25,15,30,45,10,30,45,10,30,55,60,35,15,10,20,35,10]})

# Number each run of consecutive rows that share the same UnitId
run_id = df['UnitId'].ne(df['UnitId'].shift()).cumsum()
# Parse the "H:MM" strings as offsets from midnight
interval = pd.to_timedelta(df['Interval'] + ':00')
# Minutes since the previous row's Interval within the same run (NaN for a run's first row)
minutes_since_prev = interval.groupby(run_id).diff().dt.total_seconds() / 60
# Keep the previous row's ServiceTime only if it was exactly 30 minutes earlier, else 0
df = df.assign(ServiceTimePreviousInterval=np.where(minutes_since_prev == 30,
                                                     df['ServiceTime'].shift(), 0))
```
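The loop above works as follows: it first sorts the dataframe by the required columns, then creates the new column filled with 0, since the first value is always 0. The loop then checks whether any of the key columns changes between consecutive rows and whether the intervals are half an hour apart. If nothing changed and the gap is half an hour, the value I need is the previous interval's value; otherwise it is 0.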
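The loop's row-by-row comparison can also be written with whole-column operations: `shift()` aligns each row with the one above it, so the group check and the 1/48 check become boolean Series instead of Python-level iteration, which is usually much faster on a large frame. A minimal sketch, assuming the example's column names (the loop uses `ServiceProfileId`, `Intervals` and `Promedio_ServiceTime` instead) and float day-fraction intervals; `previous_interval_time` is a hypothetical helper name.

```python
import numpy as np
import pandas as pd


def previous_interval_time(df: pd.DataFrame) -> pd.DataFrame:
    keys = ['Date', 'UnitId', 'ServiceDomineId']
    # Same ordering as the loop; reset_index keeps labels aligned with positions
    df = df.sort_values(keys + ['Interval']).reset_index(drop=True)
    prev = df.shift()  # row i of prev holds row i-1 of df (NaN for the first row)
    # Same group: every key column equals the value in the row above
    same_group = (df[keys] == prev[keys]).all(axis=1)
    # Consecutive: intervals differ by 1/48 of a day, within a small float tolerance
    consecutive = np.isclose(df['Interval'] - prev['Interval'], 1 / 48, rtol=0, atol=1e-8)
    df['ServiceTimePreviousInterval'] = np.where(same_group & consecutive,
                                                 prev['ServiceTime'], 0)
    return df


# First date of the example table, with intervals as fractions of a day
sample = pd.DataFrame({
    'Date': ['01/01/2021'] * 10,
    'UnitId': [1, 1, 1, 2, 2, 1, 1, 1, 2, 2],
    'ServiceDomineId': [1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
    'Interval': [s / 48 for s in [16, 17, 18, 16, 18, 17, 18, 19, 16, 17]],
    'ServiceTime': [30, 20, 10, 50, 10, 25, 15, 30, 45, 10],
})
out = previous_interval_time(sample)
print(out)
```

Note that the result comes back in sorted order (by `Date`, `UnitId`, `ServiceDomineId`, `Interval`), just as the loop leaves it.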