Python DataFrame time series: check whether a value changed by more than x over the last n rows and the next n rows
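I have sunlight data from the field. I am checking whether the sunlight changed by more than a certain value over the past 1 minute and over the next 1 minute. Below is an example in which I check whether the data value changed by more than 4 over the past 10 seconds.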
Code:

```python
import numpy as np
import pandas as pd

xdf = pd.DataFrame({'data': np.random.randint(10, size=10)},
                   index=pd.date_range('2022-06-03 00:00:00', '2022-06-03 00:00:45', freq='5s'))
# The data frequency is 5 s, so to check the last 10 s
# I have to consider the present row and the last 2 rows.
# Perform a rolling max and min over 3 rows
nrows = 3
# Allowable change
ac = 4
xdf['back_max'] = xdf['data'].rolling(nrows).max()
xdf['back_min'] = xdf['data'].rolling(nrows).min()
xdf['back_min_max_dif'] = xdf['back_max'] - xdf['back_min']  # max - min is never negative
xdf['back_<4'] = xdf['back_min_max_dif'].le(ac)
print(xdf)
# Now repeat the above for the next nrows rows.
# How do I do that?
```
Expected output (for one particular random draw of `data`):

```
                     data  back_max  back_min  back_min_max_dif  back_<4
2022-06-03 00:00:00     7       NaN       NaN               NaN    False
2022-06-03 00:00:05     7       NaN       NaN               NaN    False
2022-06-03 00:00:10     5       7.0       5.0               2.0     True
2022-06-03 00:00:15     8       8.0       5.0               3.0     True
2022-06-03 00:00:20     6       8.0       5.0               3.0     True
2022-06-03 00:00:25     2       8.0       2.0               6.0    False
2022-06-03 00:00:30     3       6.0       2.0               4.0     True
2022-06-03 00:00:35     1       3.0       1.0               2.0     True
2022-06-03 00:00:40     5       5.0       1.0               4.0     True
2022-06-03 00:00:45     5       5.0       1.0               4.0     True
```
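Is there any way to simplify the program above? Also, I need to compute the same rolling max and min over the next nrows rows. How can I do that?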
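For the future (forward) roll, you can roll over the reversed data. This trick is meant for row-count windows and may not work with a time-based (offset) window:

```python
rolling = xdf['data'].rolling(nrows)
xdf['pass_<'] = (rolling.max() - rolling.min()).le(ac)
# Reverse the rows, roll, and let pandas realign the result on the index
future_roll = xdf['data'][::-1].rolling(nrows)
xdf['future_<'] = future_roll.max().sub(future_roll.min()).le(ac)
```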
Output:

```
                     data  pass_<  future_<
2022-06-03 00:00:00     7   False      True
2022-06-03 00:00:05     7   False      True
2022-06-03 00:00:10     5    True      True
2022-06-03 00:00:15     8    True     False
2022-06-03 00:00:20     6    True      True
2022-06-03 00:00:25     2   False      True
2022-06-03 00:00:30     3    True      True
2022-06-03 00:00:35     1    True      True
2022-06-03 00:00:40     5    True     False
2022-06-03 00:00:45     5    True     False
```
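On pandas 1.1 or later, the reversal can probably be avoided with the built-in `pandas.api.indexers.FixedForwardWindowIndexer`, whose windows cover the current row and the following `window_size - 1` rows. The sketch below assumes such a pandas version; it uses the data values from the output above instead of random ones so the result is reproducible, and `window_range` is a hypothetical helper name, not part of pandas.

```python
import pandas as pd
from pandas.api.indexers import FixedForwardWindowIndexer

# Same 5 s index as the question; fixed values (taken from the output above)
# instead of random ones, so the result is reproducible
xdf = pd.DataFrame(
    {"data": [7, 7, 5, 8, 6, 2, 3, 1, 5, 5]},
    index=pd.date_range("2022-06-03 00:00:00", "2022-06-03 00:00:45", freq="5s"),
)
nrows = 3  # current row plus 2 more rows = a 10 s span at 5 s spacing
ac = 4     # allowable change


def window_range(series, window, min_periods):
    """Hypothetical helper: max - min over each rolling window (NaN if incomplete)."""
    r = series.rolling(window, min_periods=min_periods)
    return r.max() - r.min()


# Backward window: the current row and the nrows - 1 rows before it
past = window_range(xdf["data"], nrows, nrows)
# Forward window: the current row and the nrows - 1 rows after it
future = window_range(xdf["data"], FixedForwardWindowIndexer(window_size=nrows), nrows)

xdf["past_ok"] = past.le(ac)      # NaN (incomplete window) compares as False
xdf["future_ok"] = future.le(ac)

# For evenly spaced rows this equals shifting the backward result up by nrows - 1
assert future.equals(past.shift(-(nrows - 1)))
print(xdf)
```

`future_ok` should match the `future_<` column above. The last `nrows - 1` rows have no complete future window, so they come out `False`; lowering `min_periods` would let partial windows count instead.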
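For the real use case (the past and next 1 minute), time-based windows may be more robust than row counts, because a row window only spans 10 s when the samples are evenly spaced. Offset windows such as `rolling('10s')` look backward in time, so one workaround, sketched here as an assumption rather than a built-in pandas feature, is to mirror the timestamps around the last one (t → 2T − t), roll backward on the mirrored series, and map the result back. `backward_time_range` and `forward_time_range` are hypothetical helper names; `closed="both"` makes each window include both ends, i.e. 3 samples at 5 s spacing.

```python
import pandas as pd


def backward_time_range(series, offset):
    """max - min over [t - offset, t] for each timestamp t (hypothetical helper)."""
    r = series.rolling(offset, closed="both")
    return r.max() - r.min()


def forward_time_range(series, offset):
    """max - min over [t, t + offset] for each timestamp t (hypothetical helper).

    Offset windows only look backward, so the timestamps are mirrored around
    the last one (t -> 2*T - t), the backward window is applied to the mirrored
    series, and the result is mapped back to the original row order.
    """
    last = series.index.max()
    mirrored_index = last + (last - series.index)  # decreasing in original order
    mirrored = pd.Series(series.to_numpy()[::-1], index=mirrored_index[::-1])
    rng = backward_time_range(mirrored, offset).to_numpy()[::-1]
    return pd.Series(rng, index=series.index, name=series.name)


idx = pd.date_range("2022-06-03 00:00:00", "2022-06-03 00:00:45", freq="5s")
s = pd.Series([7, 7, 5, 8, 6, 2, 3, 1, 5, 5], index=idx, name="data")
ac = 4  # allowable change

past = backward_time_range(s, "10s")   # e.g. "1min" for the real data
future = forward_time_range(s, "10s")
flags = pd.DataFrame({"data": s, "past_ok": past.le(ac), "future_ok": future.le(ac)})
print(flags)
```

Unlike the row-based versions, offset windows default to `min_periods=1`, so the first and last rows get a value from a partial window instead of NaN; pass a larger `min_periods` to `rolling` if that is not wanted.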