Label rows when a value changes in pandas
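I need a solution to the following problem. I have a timestamp and a value. From one row to the next, the value can increase, decrease, or stay the same. As soon as it increases or stays the same, I want to add a label in a new column. If the value keeps increasing, the same label should be added to that row. As soon as the value decreases, zero should be entered as the label. Can anyone help me?
Input data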
import pandas as pd

df_raw = pd.DataFrame(
    {
        "timestamp": [
            "2017-06-16 05:19:18.993",
            "2017-06-16 05:19:28.993",
            "2017-06-16 05:19:38.993",
            "2017-06-16 05:19:48.993",
            "2017-06-16 05:19:58.993",
            "2017-06-16 05:25:08.993",
            "2017-06-16 05:25:18.993",
            "2017-06-16 07:44:28.993",
            "2017-06-16 07:45:38.993",
        ],
        "signalvalue": [0.0, 12.0, 22.0, 13.0, 0.0, 30.0, 0.0, 3.0, 6.0],
    }
)
                 timestamp  signalvalue
0  2017-06-16 05:19:18.993          0.0
1  2017-06-16 05:19:28.993         12.0
2  2017-06-16 05:19:38.993         22.0
3  2017-06-16 05:19:48.993         13.0
4  2017-06-16 05:19:58.993          0.0
5  2017-06-16 05:25:08.993         30.0
6  2017-06-16 05:25:18.993          0.0
7  2017-06-16 07:44:28.993          3.0
8  2017-06-16 07:45:38.993          6.0
Desired output
                 timestamp  signalvalue  label
0  2017-06-16 05:19:18.993          0.0      0
1  2017-06-16 05:19:28.993         12.0      1
2  2017-06-16 05:19:38.993         22.0      1
3  2017-06-16 05:19:48.993         13.0      0
4  2017-06-16 05:19:58.993          0.0      0
5  2017-06-16 05:25:08.993         30.0      2
6  2017-06-16 05:25:18.993          0.0      0
7  2017-06-16 07:44:28.993          3.0      3
8  2017-06-16 07:45:38.993          6.0      3
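You can do it with the following function:
def increment_method_1(df, name):
    # Note: a new label is only started when the previous value is exactly 0.
    Results = []
    last_result = 0  # most recently issued label
    prev_val = 0     # value of the previous row (0 before the first row)
    for val in df[name].values:
        if val == 0 or (val > 0 and prev_val >= val):
            # zero, flat, or falling: no label
            Results.append(0)
        elif prev_val < val and prev_val != 0:
            # still rising: reuse the current label
            Results.append(last_result)
        elif prev_val < val and prev_val == 0:
            # rising from zero: start a new label
            last_result += 1
            Results.append(last_result)
        else:
            print(prev_val, val, last_result)
            print("Unexpected condition")
        prev_val = val
    return Results

# Usage: df_raw["label"] = increment_method_1(df_raw, "signalvalue")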
I assume you are expecting output like what the following snippet produces.
import pandas as pd
import numpy as np

df_raw = pd.DataFrame(
    {
        "timestamp": [
            "2017-06-16 05:19:18.993",
            "2017-06-16 05:19:28.993",
            "2017-06-16 05:19:38.993",
            "2017-06-16 05:19:48.993",
            "2017-06-16 05:19:58.993",
            "2017-06-16 05:25:08.993",
            "2017-06-16 05:25:18.993",
            "2017-06-16 07:44:28.993",
            "2017-06-16 07:45:38.993",
        ],
        "signalvalue": [0.0, 12.0, 22.0, 13.0, 0.0, 30.0, 0.0, 3.0, 6.0],
    }
)

# One label per row, all zero to start with
modified = np.zeros(len(df_raw), dtype=int)
positive = 0  # number of rising stretches seen so far
for i in range(1, len(df_raw)):
    if df_raw["signalvalue"][i] > df_raw["signalvalue"][i - 1]:
        # A rise after an unlabeled row opens a new stretch
        if modified[i - 1] == 0:
            positive += 1
        modified[i] = positive
df_raw['label'] = modified
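The loop above can also be packaged as a small function so that it works on any column. This is only a sketch: the name label_rising_runs is hypothetical and does not come from the answer, and it assumes that a new label should start whenever a rise follows a row that was not itself rising.

```python
import numpy as np
import pandas as pd


def label_rising_runs(values):
    """Label each run of strictly increasing values 1, 2, 3, ...; other rows get 0.

    Hypothetical helper, not part of the original answer.
    """
    values = np.asarray(values, dtype=float)
    labels = np.zeros(len(values), dtype=int)
    run_id = 0
    for i in range(1, len(values)):
        if values[i] > values[i - 1]:
            # A rise right after a non-rising row opens a new run.
            if labels[i - 1] == 0:
                run_id += 1
            labels[i] = run_id
    return labels


df = pd.DataFrame(
    {"signalvalue": [0.0, 12.0, 22.0, 13.0, 0.0, 30.0, 0.0, 3.0, 6.0]}
)
df["label"] = label_rising_runs(df["signalvalue"])
print(df["label"].tolist())  # [0, 1, 1, 0, 0, 2, 0, 3, 3]
```

Working on a plain NumPy array avoids the per-row df_raw["signalvalue"][i] lookups, which are likely to be slow on long frames.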
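You can compute a mask from the diff of consecutive values (True where it is greater than zero), then keep only the first item of each stretch and take the cumsum:
m1 = df_raw['signalvalue'].diff().gt(0)
df_raw['label'] = (m1 & m1.ne(m1.shift())).cumsum() * m1.astype(int)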
Output:
                 timestamp  signalvalue  label
0  2017-06-16 05:19:18.993          0.0      0
1  2017-06-16 05:19:28.993         12.0      1
2  2017-06-16 05:19:38.993         22.0      1
3  2017-06-16 05:19:48.993         13.0      0
4  2017-06-16 05:19:58.993          0.0      0
5  2017-06-16 05:25:08.993         30.0      2
6  2017-06-16 05:25:18.993          0.0      0
7  2017-06-16 07:44:28.993          3.0      3
8  2017-06-16 07:45:38.993          6.0      3
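To see why the mask approach works, it may help to split it into named steps and look at the intermediate columns. The sketch below is an illustration rather than the answer's exact code: the names rising, run_start and steps are made up here, and it uses shift(fill_value=False) and where in place of the answer's ne(shift()) and multiplication, which should give the same labels on this data.

```python
import pandas as pd

signal = pd.Series([0.0, 12.0, 22.0, 13.0, 0.0, 30.0, 0.0, 3.0, 6.0])

# True where the value rose compared with the previous row.
# The first row has no predecessor (diff is NaN), so it is False.
rising = signal.diff().gt(0)

# True only on the first row of each rising stretch.
prev_rising = rising.shift(fill_value=False)
run_start = rising & ~prev_rising

# Running count of stretches, reset to 0 on rows that are not rising.
label = run_start.cumsum().where(rising, 0)

steps = pd.DataFrame(
    {
        "signalvalue": signal,
        "rising": rising,
        "run_start": run_start,
        "label": label,
    }
)
print(steps)
```

If rows where the value stays the same should also be labeled, as the wording of the question suggests, swapping gt(0) for ge(0) is one option. The sample data contains no flat rows, so the desired output does not settle which behavior is wanted.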