Differentiating between consecutive stages in a dataset

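I am trying to label the stages a machine goes through while it is shutting down. During the shutdown cycle the machine has to pass through several stages, and it can return to some of them along the way. Not every stage can be told apart from the data alone, because some stages show the same readings, but the timeline makes it possible to determine where in the cycle the machine is.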

I created a sample dataset to illustrate the data:

import pandas as pd

data = {
  "Date and Time": ["2020-06-07 00:00", "2020-06-07 00:01", "2020-06-07 00:02", "2020-06-07 00:03", "2020-06-07 00:04", "2020-06-07 00:05", "2020-06-07 00:06", "2020-06-07 00:07", "2020-06-07 00:08", "2020-06-07 00:09", "2020-06-07 00:10", "2020-06-07 00:11", "2020-06-07 00:12", "2020-06-07 00:13", "2020-06-07 00:14", "2020-06-07 00:15", "2020-06-07 00:16", "2020-06-07 00:17", "2020-06-07 00:18", "2020-06-07 00:19", "2020-06-07 00:20", "2020-06-07 00:21", "2020-06-07 00:22", "2020-06-07 00:23", "2020-06-07 00:24", "2020-06-07 00:25", "2020-06-07 00:26", "2020-06-07 00:27", "2020-06-07 00:28", "2020-06-07 00:29"],
  "Current": [16.2, 15.1, 13.8, 12.0, 11.9, 12.1, 10.8, 9.8, 8.3, 6.2, 4.3, 4.2, 4.2, 3.3, 1.8, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
  "Flow": [39.8, 40.3, 40.2, 40.1, 40.3, 39.8, 40.1, 40.2, 40.4, 39.6, 40, 39.3, 40.7, 38.9, 39.3, 0, 0, 39.3, 39.2, 0, 0, 38.9, 38.7, 0, 39.3, 39.2, 40.3, 0, 0, 0]
}

df = pd.DataFrame(data)

I have already tried to distinguish the stages with the following code:

# Calculate the difference between two datapoints regarding the current change
df['Current_ddt'] = ((df["Current"]) - (df["Current"].shift(1)))

# Determine which part of the shutdown the machine is in based on current and flow data
df.loc[(df["Current"] > 4.5) & (df["Current_ddt"] > -1), 'progress in shutdown cycle'] = 'Running' 
df.loc[(df["Current"] > 4.5) & (df["Current_ddt"] <= -1), 'progress in shutdown cycle'] = 'Ramping down'
df.loc[(df["Current"] > 4) & (df["Current"] < 4.5) & (df["Current_ddt"] > -1), 'progress in shutdown cycle'] = 'Ramp down complete between 4-4.5'
df.loc[(df["Current"] < 4.5) & (df["Current"] != 0) & (df["Current_ddt"] < -1), 'progress in shutdown cycle'] = 'Shutdown' # Not possible to go back to an earlier stage
df.loc[(df["Current"] == 0) & (df["Flow"] == 0), 'progress in shutdown cycle'] = 'de-energized' # Not possible to go back to an earlier stage
df.loc[(df["Current"] == 0) & (df["Flow"] != 0), 'progress in shutdown cycle'] = 'flushing' #Ideally this could distinguish first, second and third flush
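With the chained `df.loc` assignments above, a row that matches none of the conditions is left as `NaN`. In the sample data this happens at 00:13 (Current 3.3, Current_ddt -0.9): `Shutdown` requires `Current_ddt < -1`, and `Ramp down complete between 4-4.5` requires `Current > 4`. A current of exactly 4.5, or a `Current_ddt` of exactly -1 below 4.5, is not covered either. A minimal sketch that closes these gaps with `numpy.select`, which returns the choice of the first condition that is true. It keeps the thresholds from the question, but treating every remaining non-zero current as `Shutdown` is an assumption, not a rule from the question. For the sample data it gives the current-based labels of the ideal output, with `flushing` standing in for the individual flush stages.

```python
import numpy as np
import pandas as pd

# Current/Flow readings from the sample dataset (one row per minute)
df = pd.DataFrame({
    "Current": [16.2, 15.1, 13.8, 12.0, 11.9, 12.1, 10.8, 9.8, 8.3, 6.2,
                4.3, 4.2, 4.2, 3.3, 1.8] + [0.0] * 15,
    "Flow": [39.8, 40.3, 40.2, 40.1, 40.3, 39.8, 40.1, 40.2, 40.4, 39.6,
             40, 39.3, 40.7, 38.9, 39.3, 0, 0, 39.3, 39.2, 0,
             0, 38.9, 38.7, 0, 39.3, 39.2, 40.3, 0, 0, 0],
})
df["Current_ddt"] = df["Current"].diff()  # same as Current - Current.shift(1)

c, f, d = df["Current"], df["Flow"], df["Current_ddt"]
# np.select picks the choice of the FIRST condition that is True, so the rules
# read as a priority list and no row can fall through a gap
conditions = [
    d.isna(),               # first row: no previous reading to compare with
    (c > 4.5) & (d <= -1),
    c > 4.5,
    (c > 4) & (d > -1),     # 4 < Current <= 4.5 and not dropping fast
    c > 0,                  # assumption: any other non-zero current is Shutdown
    f == 0,                 # current is zero from here on
]
choices = ["", "Ramping down", "Running", "Ramp down complete between 4-4.5",
           "Shutdown", "de-energized"]
# Anything left has zero current and non-zero flow
df["progress in shutdown cycle"] = np.select(conditions, choices, default="flushing")
print(df)
```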

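This part works fine up to the point where the machine is de-energized. Eventually I would like to distinguish a normal ramp-down (i.e. lowering the production level) from the ramp-down to 4.5, since I am only interested in the actual shutdown of the machine: that is when the machine takes the most damage if it is done incorrectly.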
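One way to separate a normal ramp-down from the one that ends in the shutdown is to use the timeline: a ramp-down after which the machine settles back into `Running` is a change of production level, while the rows between the last `Running` row and the first row with zero current belong to the real shutdown. A minimal sketch of that idea, assuming the `Running` rule from the question (`Current > 4.5` and `Current_ddt > -1`); the rule itself and the column name `real shutdown` are hypothetical and would need checking against how the machine is actually operated. Under this rule, the drop from 16.2 to 12.0 at 00:01 to 00:03 counts as a normal ramp-down.

```python
import numpy as np
import pandas as pd

# Current readings from the sample dataset (Flow is not needed here)
df = pd.DataFrame({
    "Current": [16.2, 15.1, 13.8, 12.0, 11.9, 12.1, 10.8, 9.8, 8.3, 6.2,
                4.3, 4.2, 4.2, 3.3, 1.8] + [0.0] * 15,
})
ddt = df["Current"].diff()
running = ((df["Current"] > 4.5) & (ddt > -1)).to_numpy()  # "Running" rule from the question
is_zero = (df["Current"] == 0).to_numpy()
pos = np.arange(len(df))

# Hypothetical rule: the real shutdown runs from just after the last "Running" row
# up to (not including) the first zero-current row; earlier ramp-downs that end in
# "Running" again are treated as normal changes of production level.
real_shutdown = np.zeros(len(df), dtype=bool)
if is_zero.any():
    first_zero = is_zero.argmax()                       # position of first zero-current row
    running_before = pos[running & (pos < first_zero)]  # "Running" rows before it
    start = running_before.max() + 1 if running_before.size else 0
    real_shutdown = (pos >= start) & (pos < first_zero)
df["real shutdown"] = real_shutdown
print(df[df["real shutdown"]])
```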

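However, the part after de-energizing causes me the most problems. There are 3 flush cycles: the first is a general flush to empty the machine, and the second and (optional) third flushes make sure the machine is clean and ready for maintenance. The data shows no difference between them, so I was thinking of a sequential approach to tell them apart, but I don't know how to do that.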
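The sequential idea can be written without an explicit loop: once the current is zero, each switch of `Flow` from zero to non-zero starts the next flush, so a cumulative count of those switches numbers the flushes, and whether the flow is on or off picks between the "running" and "finished" label of that flush. A minimal sketch, assuming a flow of exactly 0 means the pump is off and that there are at most three flushes (a fourth would be labelled `third flush` again); the column name `after power-off` and the lists `active` and `idle` are hypothetical. For the sample data it gives the post-power-off labels of the ideal output.

```python
import pandas as pd

# Current/Flow readings from the sample dataset
df = pd.DataFrame({
    "Current": [16.2, 15.1, 13.8, 12.0, 11.9, 12.1, 10.8, 9.8, 8.3, 6.2,
                4.3, 4.2, 4.2, 3.3, 1.8] + [0.0] * 15,
    "Flow": [39.8, 40.3, 40.2, 40.1, 40.3, 39.8, 40.1, 40.2, 40.4, 39.6,
             40, 39.3, 40.7, 38.9, 39.3, 0, 0, 39.3, 39.2, 0,
             0, 38.9, 38.7, 0, 39.3, 39.2, 40.3, 0, 0, 0],
})

flow_on = df["Flow"] > 0
# True from the first zero-current row onwards (machine de-energized)
powered_off = (df["Current"] == 0).cumsum() > 0
# Flow only counts as "on in the previous row" if that row was already after
# power-off, so a flush running at the moment of power-off still counts as a start
prev_on = (flow_on & powered_off).shift(1, fill_value=False)
flush_start = powered_off & flow_on & ~prev_on
# 0 = no flush yet, 1 = first flush (purge), 2 = second, 3 = third
flush_no = flush_start.cumsum().clip(upper=3)  # assumption: at most three flushes

active = ["de-energized", "purging", "second flush", "third flush"]  # flow on
idle = ["de-energized", "purged", "flushed", "flushed and stopped"]  # flow off
df["after power-off"] = [
    (active if on else idle)[n] if off else None
    for on, n, off in zip(flow_on, flush_no, powered_off)
]
print(df)
```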

The ideal output would look like this:

Date and Time     Current  Flow  Current_ddt  Progress in shutdown cycle
2020-06-07 00:00     16.2  39.8
2020-06-07 00:01     15.1  40.3         -1.1  Ramping down
2020-06-07 00:02     13.8  40.2         -1.3  Ramping down
2020-06-07 00:03     12.0  40.1         -1.8  Ramping down
2020-06-07 00:04     11.9  40.3         -0.1  Running
2020-06-07 00:05     12.1  39.8          0.2  Running
2020-06-07 00:06     10.8  40.1         -1.3  Ramping down
2020-06-07 00:07      9.8  40.2         -1.0  Ramping down
2020-06-07 00:08      8.3  40.4         -1.5  Ramping down
2020-06-07 00:09      6.2  39.6         -2.1  Ramping down
2020-06-07 00:10      4.3  40.0         -1.9  Shutdown
2020-06-07 00:11      4.2  39.3         -0.1  Ramp down complete between 4-4.5
2020-06-07 00:12      4.2  40.7          0.0  Ramp down complete between 4-4.5
2020-06-07 00:13      3.3  38.9         -0.9  Shutdown
2020-06-07 00:14      1.8  39.3         -1.5  Shutdown
2020-06-07 00:15      0.0   0.0         -1.8  de-energized
2020-06-07 00:16      0.0   0.0          0.0  de-energized
2020-06-07 00:17      0.0  39.3          0.0  purging
2020-06-07 00:18      0.0  39.2          0.0  purging
2020-06-07 00:19      0.0   0.0          0.0  purged
2020-06-07 00:20      0.0   0.0          0.0  purged
2020-06-07 00:21      0.0  38.9          0.0  second flush
2020-06-07 00:22      0.0  38.7          0.0  second flush
2020-06-07 00:23      0.0   0.0          0.0  flushed
2020-06-07 00:24      0.0  39.3          0.0  third flush
2020-06-07 00:25      0.0  39.2          0.0  third flush
2020-06-07 00:26      0.0  40.3          0.0  third flush
2020-06-07 00:27      0.0   0.0          0.0  flushed and stopped
2020-06-07 00:28      0.0   0.0          0.0  flushed and stopped
2020-06-07 00:29      0.0   0.0          0.0  flushed and stopped

Any suggestions?

I have implemented a simple state machine based on the "Current" and "Flow" columns:

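def state_machine():
    """Coroutine: send (current, flow) for each row in time order; it yields that row's stage."""
    current_state = None
    # After priming with next(), the first send() only stores the initial reading,
    # so the first row gets no stage (there is no previous current to compare with)
    current, flow = yield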

    while True:
        c, flow = yield current_state

        current_ddt = c - current
        current = c

        if current > 4.5:
            if current_ddt <= -1:
                current_state = "Ramping down"
            else:
                current_state = "Running"
        elif current > 4:
            if current_ddt < -1:
                current_state = "Shutdown"
            else:
                current_state = "Ramp down complete between 4-4.5"
        elif current > 0:
            current_state = "Shutdown"
        else:
            states = iter(
                [
                    "Purging",
                    "Purged",
                    "Second Flush",
                    "Flushed",
                    "Third Flush",
                    "Flushed and stopped",
                ]
            )

            # current == 0: the machine is de-energized, so only the flow matters from here on
            if flow == 0:
                current_state = "De-energized"
                waiting_for_zero = False
            else:
                current_state = next(states)  # Purging
                waiting_for_zero = True

            while True:
                current, flow = yield current_state

                # Each switch of the flow (zero -> non-zero or non-zero -> zero) advances one stage
                if flow > 0 and not waiting_for_zero:
                    current_state = next(states)
                    waiting_for_zero = True
                elif flow == 0 and waiting_for_zero:
                    current_state = next(states)
                    waiting_for_zero = False

                if current_state == "Flushed and stopped":
                    # We are stopped completely, don't react to changes of "current" and/or "flow"
                    while True:
                        yield current_state


s = state_machine()
next(s)

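# Feed the rows to the coroutine strictly in order and exactly once each;
# a plain comprehension avoids depending on how DataFrame.apply calls the function
df["Progress in shutdown cycle"] = [
    s.send((c, f)) for c, f in zip(df["Current"], df["Flow"])
]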

print(df)

This prints:

       Date and Time  Current  Flow        Progress in shutdown cycle
0   2020-06-07 00:00     16.2  39.8                              None
1   2020-06-07 00:01     15.1  40.3                      Ramping down
2   2020-06-07 00:02     13.8  40.2                      Ramping down
3   2020-06-07 00:03     12.0  40.1                      Ramping down
4   2020-06-07 00:04     11.9  40.3                           Running
5   2020-06-07 00:05     12.1  39.8                           Running
6   2020-06-07 00:06     10.8  40.1                      Ramping down
7   2020-06-07 00:07      9.8  40.2                      Ramping down
8   2020-06-07 00:08      8.3  40.4                      Ramping down
9   2020-06-07 00:09      6.2  39.6                      Ramping down
10  2020-06-07 00:10      4.3  40.0                          Shutdown
11  2020-06-07 00:11      4.2  39.3  Ramp down complete between 4-4.5
12  2020-06-07 00:12      4.2  40.7  Ramp down complete between 4-4.5
13  2020-06-07 00:13      3.3  38.9                          Shutdown
14  2020-06-07 00:14      1.8  39.3                          Shutdown
15  2020-06-07 00:15      0.0   0.0                      De-energized
16  2020-06-07 00:16      0.0   0.0                      De-energized
17  2020-06-07 00:17      0.0  39.3                           Purging
18  2020-06-07 00:18      0.0  39.2                           Purging
19  2020-06-07 00:19      0.0   0.0                            Purged
20  2020-06-07 00:20      0.0   0.0                            Purged
21  2020-06-07 00:21      0.0  38.9                      Second Flush
22  2020-06-07 00:22      0.0  38.7                      Second Flush
23  2020-06-07 00:23      0.0   0.0                           Flushed
24  2020-06-07 00:24      0.0  39.3                       Third Flush
25  2020-06-07 00:25      0.0  39.2                       Third Flush
26  2020-06-07 00:26      0.0  40.3                       Third Flush
27  2020-06-07 00:27      0.0   0.0               Flushed and stopped
28  2020-06-07 00:28      0.0   0.0               Flushed and stopped
29  2020-06-07 00:29      0.0   0.0               Flushed and stopped