Label rows when a value changes in pandas

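I need a solution to the following problem. I have a timestamp and a value. The value can change positively, change negatively, or stay the same. As soon as it changes positively from one row to the next, or stays the same, I want to add a label in a new column. If the value keeps increasing, the same label should be added to that row as well. As soon as the value changes negatively, zero should be entered as the label. Can anyone help me?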

Input data

import pandas as pd

df_raw = pd.DataFrame(
    {
        "timestamp": [
            "2017-06-16 05:19:18.993",
            "2017-06-16 05:19:28.993",
            "2017-06-16 05:19:38.993",
            "2017-06-16 05:19:48.993",
            "2017-06-16 05:19:58.993",
            "2017-06-16 05:25:08.993",
            "2017-06-16 05:25:18.993",
            "2017-06-16 07:44:28.993",
            "2017-06-16 07:45:38.993",
        ],
        "signalvalue": [0.0, 12.0, 22.0, 13.0, 0.0, 30.0, 0.0, 3.0, 6.0],
    }
)

    timestamp                signalvalue
0   2017-06-16 05:19:18.993  0.0
1   2017-06-16 05:19:28.993  12.0
2   2017-06-16 05:19:38.993  22.0
3   2017-06-16 05:19:48.993  13.0
4   2017-06-16 05:19:58.993  0.0
5   2017-06-16 05:25:08.993  30.0
6   2017-06-16 05:25:18.993  0.0
7   2017-06-16 07:44:28.993  3.0
8   2017-06-16 07:45:38.993  6.0

Expected output

    timestamp                signalvalue    label
0   2017-06-16 05:19:18.993  0.0            0
1   2017-06-16 05:19:28.993  12.0           1
2   2017-06-16 05:19:38.993  22.0           1
3   2017-06-16 05:19:48.993  13.0           0
4   2017-06-16 05:19:58.993  0.0            0
5   2017-06-16 05:25:08.993  30.0           2
6   2017-06-16 05:25:18.993  0.0            0
7   2017-06-16 07:44:28.993  3.0            3
8   2017-06-16 07:45:38.993  6.0            3

You can do this with the following function:

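def increment_method_1(df, name):
    """Return one label per row of df[name]: 0, or the number of the rising run."""
    results = []
    last_result = 0   # highest run number handed out so far
    prev_val = None   # the first row has no previous value
    prev_label = 0
    for val in df[name].values:
        if prev_val is not None and val > prev_val:
            # The value rose; open a new run unless one is already in progress
            if prev_label == 0:
                last_result += 1
            label = last_result
        else:
            # First row, a drop, or no change: label 0
            label = 0
        results.append(label)
        prev_val, prev_label = val, label
    return results

df_raw["label"] = increment_method_1(df_raw, "signalvalue")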

I assume you expect output like what the following snippet produces.

import pandas as pd
import numpy as np

df_raw = pd.DataFrame(
    {
        "timestamp": [
            "2017-06-16 05:19:18.993",
            "2017-06-16 05:19:28.993",
            "2017-06-16 05:19:38.993",
            "2017-06-16 05:19:48.993",
            "2017-06-16 05:19:58.993",
            "2017-06-16 05:25:08.993",
            "2017-06-16 05:25:18.993",
            "2017-06-16 07:44:28.993",
            "2017-06-16 07:45:38.993",
        ],
        "signalvalue": [0.0, 12.0, 22.0, 13.0, 0.0, 30.0, 0.0, 3.0, 6.0],
    }
)

modified = np.zeros(len(df_raw), dtype=int)  # label stays 0 unless the value rises
positive = 0  # number of rising runs found so far
values = df_raw["signalvalue"].to_numpy()

for i in range(1, len(df_raw)):
    if values[i] > values[i - 1]:
        # A rise right after a row labelled 0 starts a new run
        if modified[i - 1] == 0:
            positive += 1
        modified[i] = positive

df_raw["label"] = modified

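You can compute a mask from the diff of consecutive values (True where the diff is greater than zero). Then keep only the first item of each stretch of True values and take the cumsum of those; multiplying by the mask sets the non-rising rows back to zero: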

m1 = df_raw['signalvalue'].diff().gt(0)

df_raw['label'] = (m1 & m1.ne(m1.shift())).cumsum() * m1.astype(int)

Output:

                 timestamp  signalvalue  label
0  2017-06-16 05:19:18.993          0.0      0
1  2017-06-16 05:19:28.993         12.0      1
2  2017-06-16 05:19:38.993         22.0      1
3  2017-06-16 05:19:48.993         13.0      0
4  2017-06-16 05:19:58.993          0.0      0
5  2017-06-16 05:25:08.993         30.0      2
6  2017-06-16 05:25:18.993          0.0      0
7  2017-06-16 07:44:28.993          3.0      3
8  2017-06-16 07:45:38.993          6.0      3
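
To see why the one-liner works, it may help to break it into its intermediate steps. The following is only a sketch of the same idea on the sample values: the names `rising`, `run_start`, `run_id` and `steps` are made up for illustration, and it uses `shift(fill_value=False)` plus `where` in place of the `ne`/multiplication form above.

```python
import pandas as pd

signal = pd.Series(
    [0.0, 12.0, 22.0, 13.0, 0.0, 30.0, 0.0, 3.0, 6.0], name="signalvalue"
)

# True where the value is strictly greater than in the previous row.
# The first row has no predecessor: diff() gives NaN there, and NaN > 0 is False.
rising = signal.diff().gt(0)

# True only on the first row of each rising stretch
# (rising now, but not rising on the row before).
run_start = rising & ~rising.shift(fill_value=False)

# Running count of stretches started so far; it keeps its value on falling rows.
run_id = run_start.cumsum()

# Keep the run number on rising rows and put 0 everywhere else.
label = run_id.where(rising, 0)

# Hypothetical helper frame, just to look at the steps side by side.
steps = pd.DataFrame(
    {
        "signalvalue": signal,
        "rising": rising,
        "run_start": run_start,
        "run_id": run_id,
        "label": label,
    }
)
print(steps)
```

The question also mentions values that stay the same. If such rows should keep the label, replacing `gt(0)` with `ge(0)` is one possible reading, but note that a flat row would then also be able to start a new run; the sample data has no ties, so the expected output does not settle this.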