Differentiating between consequential stages in a dataset
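I am trying to label the stages a machine goes through while it is shutting down. The machine has to pass through several stages during its shutdown cycle. The problem is that it can go back to some of the earlier stages in the sequence. From the data alone you cannot tell all stages apart, because some of them show the same readings, but the timeline tells you where the machine is in the cycle.
I created a sample dataset to show what the data looks like: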
import pandas as pd
data = {
"Date and Time": ["2020-06-07 00:00", "2020-06-07 00:01", "2020-06-07 00:02", "2020-06-07 00:03", "2020-06-07 00:04", "2020-06-07 00:05", "2020-06-07 00:06", "2020-06-07 00:07", "2020-06-07 00:08", "2020-06-07 00:09", "2020-06-07 00:10", "2020-06-07 00:11", "2020-06-07 00:12", "2020-06-07 00:13", "2020-06-07 00:14", "2020-06-07 00:15", "2020-06-07 00:16", "2020-06-07 00:17", "2020-06-07 00:18", "2020-06-07 00:19", "2020-06-07 00:20", "2020-06-07 00:21", "2020-06-07 00:22", "2020-06-07 00:23", "2020-06-07 00:24", "2020-06-07 00:25", "2020-06-07 00:26", "2020-06-07 00:27", "2020-06-07 00:28", "2020-06-07 00:29"],
"Current": [16.2, 15.1, 13.8, 12.0, 11.9, 12.1, 10.8, 9.8, 8.3, 6.2, 4.3, 4.2, 4.2, 3.3, 1.8, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
"Flow": [39.8, 40.3, 40.2, 40.1, 40.3, 39.8, 40.1, 40.2, 40.4, 39.6, 40, 39.3, 40.7, 38.9, 39.3, 0, 0, 39.3, 39.2, 0, 0, 38.9, 38.7, 0, 39.3, 39.2, 40.3, 0, 0, 0]
}
df = pd.DataFrame(data)
I have already tried to distinguish the stages with the following code:
# Calculate the difference between two datapoints regarding the current change
df['Current_ddt'] = ((df["Current"]) - (df["Current"].shift(1)))
# Determine which part of the shutdown the machine is in based on current and flow data
df.loc[(df["Current"] > 4.5) & (df["Current_ddt"] > -1), 'progress in shutdown cycle'] = 'Running'
df.loc[(df["Current"] > 4.5) & (df["Current_ddt"] <= -1), 'progress in shutdown cycle'] = 'Ramping down'
df.loc[(df["Current"] > 4) & (df["Current"] < 4.5) & (df["Current_ddt"] > -1), 'progress in shutdown cycle'] = 'Ramp down complete between 4-4.5'
df.loc[(df["Current"] < 4.5) & (df["Current"] != 0) & (df["Current_ddt"] < -1), 'progress in shutdown cycle'] = 'Shutdown' # Not possible to go back to an earlier stage
df.loc[(df["Current"] == 0) & (df["Flow"] == 0), 'progress in shutdown cycle'] = 'de-energized' # Not possible to go back to an earlier stage
df.loc[(df["Current"] == 0) & (df["Flow"] != 0), 'progress in shutdown cycle'] = 'flushing' #Ideally this could distinguish first, second and third flush
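This part works fine up to the de-energized stage. Eventually I would like to be able to distinguish a regular ramp down (that is, lowering the production level) from the ramp down to 4.5, because I am only interested in the true shutdown of the machine, as that is when the most damage is done to it. An imperfect approach to this would also be acceptable.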
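One way to approach the ramp-down distinction is sketched below. It rests on an assumption that is not stated in the question: a ramp counts as a "true shutdown" ramp when its unbroken run of ramping-down rows is immediately followed by a reading below 4.5. The names `ramping`, `run_id`, `after_run` and `final_ramp` are made up for this illustration, and the series is the first twelve `Current` values of the sample data.

```python
import pandas as pd

# First twelve "Current" readings from the sample data.
current = pd.Series([16.2, 15.1, 13.8, 12.0, 11.9, 12.1,
                     10.8, 9.8, 8.3, 6.2, 4.3, 4.2])

# Same rule as the 'Ramping down' label: above 4.5 and dropping by 1 or more.
ramping = (current > 4.5) & (current.diff() <= -1)

# Give every unbroken run of equal values in `ramping` its own id.
run_id = ramping.ne(ramping.shift(fill_value=False)).cumsum()

# For each run, the reading that follows the last row of that run.
after_run = current.shift(-1).groupby(run_id).transform("last")

# Assumption: a ramp that ends below 4.5 is the real shutdown ramp.
final_ramp = ramping & (after_run < 4.5)

print(final_ramp.tolist())
```

With this rule the first ramp (00:01 to 00:03) is treated as a regular ramp down, while the second one (00:06 to 00:09) is flagged because the next reading is 4.3. Note that the label can only be assigned after the run has ended, so this works on recorded data rather than on a live stream.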
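However, the part after de-energizing gives me the most trouble. There are 3 flushing cycles: the first is a general flush to empty the machine. The second and the (optional) third flush make sure the machine is clean and ready for maintenance. In the data, however, they look identical, so I am thinking of a sequence-based way to tell them apart, but I don't know how to do it.
The desired output would look like this: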
Date and Time | Current | Flow | Current_ddt | Progress in shutdown cycle |
---|---|---|---|---|
2020-06-07 00:00 | 16.2 | 39.8 | | |
2020-06-07 00:01 | 15.1 | 40.3 | -1.1 | Ramping down |
2020-06-07 00:02 | 13.8 | 40.2 | -1.3 | Ramping down |
2020-06-07 00:03 | 12 | 40.1 | -1.8 | Ramping down |
2020-06-07 00:04 | 11.9 | 40.3 | -0.0999999999999996 | Running |
2020-06-07 00:05 | 12.1 | 39.8 | 0.199999999999999 | Running |
2020-06-07 00:06 | 10.8 | 40.1 | -1.3 | Ramping down |
2020-06-07 00:07 | 9.8 | 40.2 | -1 | Ramping down |
2020-06-07 00:08 | 8.3 | 40.4 | -1.5 | Ramping down |
2020-06-07 00:09 | 6.2 | 39.6 | -2.1 | Ramping down |
2020-06-07 00:10 | 4.3 | 40 | -1.9 | Shutdown |
2020-06-07 00:11 | 4.2 | 39.3 | -0.0999999999999996 | Ramp down complete between 4-4.5 |
2020-06-07 00:12 | 4.2 | 40.7 | 0 | Ramp down complete between 4-4.5 |
2020-06-07 00:13 | 3.3 | 38.9 | -0.9 | Shutdown |
2020-06-07 00:14 | 1.8 | 39.3 | -1.5 | Shutdown |
2020-06-07 00:15 | 0 | 0 | -1.8 | de-energized |
2020-06-07 00:16 | 0 | 0 | 0 | de-energized |
2020-06-07 00:17 | 0 | 39.3 | 0 | purging |
2020-06-07 00:18 | 0 | 39.2 | 0 | purging |
2020-06-07 00:19 | 0 | 0 | 0 | purged |
2020-06-07 00:20 | 0 | 0 | 0 | purged |
2020-06-07 00:21 | 0 | 38.9 | 0 | second flush |
2020-06-07 00:22 | 0 | 38.7 | 0 | second flush |
2020-06-07 00:23 | 0 | 0 | 0 | flushed |
2020-06-07 00:24 | 0 | 39.3 | 0 | third flush |
2020-06-07 00:25 | 0 | 39.2 | 0 | third flush |
2020-06-07 00:26 | 0 | 40.3 | 0 | third flush |
2020-06-07 00:27 | 0 | 0 | 0 | flushed and stopped |
2020-06-07 00:28 | 0 | 0 | 0 | flushed and stopped |
2020-06-07 00:29 | 0 | 0 | 0 | flushed and stopped |
Any suggestions?
I have implemented a simple state machine based on the "Current" and "Flow" columns:
def state_machine():
    current_state = None
    current, flow = yield

    while True:
        c, flow = yield current_state
        current_ddt = c - current
        current = c

        if current > 4.5:
            if current_ddt <= -1:
                current_state = "Ramping down"
            else:
                current_state = "Running"
        elif current > 4:
            if current_ddt < -1:
                current_state = "Shutdown"
            else:
                current_state = "Ramp down complete between 4-4.5"
        elif current > 0:
            current_state = "Shutdown"
        else:
            states = iter(
                [
                    "Purging",
                    "Purged",
                    "Second Flush",
                    "Flushed",
                    "Third Flush",
                    "Flushed and stopped",
                ]
            )

            # current is == 0, check the flow:
            if flow == 0:
                current_state = "De-energized"
                waiting_for_zero = False
            else:
                current_state = next(states)  # Purging
                waiting_for_zero = True

            while True:
                current, flow = yield current_state
                if flow > 0 and waiting_for_zero is False:
                    current_state = next(states)
                    waiting_for_zero = True
                elif flow == 0 and waiting_for_zero is True:
                    current_state = next(states)
                    waiting_for_zero = False

                if current_state == "Flushed and stopped":
                    # We are stopped completely, don't react to changes of "current" and/or "flow"
                    while True:
                        yield current_state


s = state_machine()
next(s)

df["Progress in shutdown cycle"] = df.apply(
    lambda x: s.send((x["Current"], x["Flow"])), axis=1
)
print(df)
Prints:
Date and Time Current Flow Progress in shutdown cycle
0 2020-06-07 00:00 16.2 39.8 None
1 2020-06-07 00:01 15.1 40.3 Ramping down
2 2020-06-07 00:02 13.8 40.2 Ramping down
3 2020-06-07 00:03 12.0 40.1 Ramping down
4 2020-06-07 00:04 11.9 40.3 Running
5 2020-06-07 00:05 12.1 39.8 Running
6 2020-06-07 00:06 10.8 40.1 Ramping down
7 2020-06-07 00:07 9.8 40.2 Ramping down
8 2020-06-07 00:08 8.3 40.4 Ramping down
9 2020-06-07 00:09 6.2 39.6 Ramping down
10 2020-06-07 00:10 4.3 40.0 Shutdown
11 2020-06-07 00:11 4.2 39.3 Ramp down complete between 4-4.5
12 2020-06-07 00:12 4.2 40.7 Ramp down complete between 4-4.5
13 2020-06-07 00:13 3.3 38.9 Shutdown
14 2020-06-07 00:14 1.8 39.3 Shutdown
15 2020-06-07 00:15 0.0 0.0 De-energized
16 2020-06-07 00:16 0.0 0.0 De-energized
17 2020-06-07 00:17 0.0 39.3 Purging
18 2020-06-07 00:18 0.0 39.2 Purging
19 2020-06-07 00:19 0.0 0.0 Purged
20 2020-06-07 00:20 0.0 0.0 Purged
21 2020-06-07 00:21 0.0 38.9 Second Flush
22 2020-06-07 00:22 0.0 38.7 Second Flush
23 2020-06-07 00:23 0.0 0.0 Flushed
24 2020-06-07 00:24 0.0 39.3 Third Flush
25 2020-06-07 00:25 0.0 39.2 Third Flush
26 2020-06-07 00:26 0.0 40.3 Third Flush
27 2020-06-07 00:27 0.0 0.0 Flushed and stopped
28 2020-06-07 00:28 0.0 0.0 Flushed and stopped
29 2020-06-07 00:29 0.0 0.0 Flushed and stopped
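As a possible alternative to the generator, the flush numbering can also be sketched with vectorized pandas operations. This assumes that a flush is any unbroken run of rows with zero current and non-zero flow. The names `flushing`, `flush_start` and `flush_number` are made up for this illustration, and the two short series are an excerpt of the sample data from 00:14 onward.

```python
import pandas as pd

# Excerpt of the sample data, starting at 2020-06-07 00:14.
current = pd.Series([1.8] + [0.0] * 15)
flow = pd.Series([39.3, 0, 0, 39.3, 39.2, 0, 0, 38.9, 38.7,
                  0, 39.3, 39.2, 40.3, 0, 0, 0])

# Assumption: flushing means the machine is de-energized but flow is present.
flushing = current.eq(0) & flow.gt(0)

# A flush starts on a flushing row whose previous row was not flushing.
flush_start = flushing & ~flushing.shift(fill_value=False)

# Count the starts so far; rows that are not flushing get 0.
flush_number = flush_start.cumsum().where(flushing, 0)

print(flush_number.tolist())
```

Mapping 1, 2 and 3 to "Purging", "Second Flush" and "Third Flush" then becomes a dictionary lookup. Unlike the state machine, this sketch does not label the idle gaps between flushes ("Purged", "Flushed"); those would need a similar count on the non-flushing runs.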