Pandas: get the max delta in a time series for a specified period
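Given a DataFrame indexed by an irregular time series, I want to find the maximum delta between values within a 10-second period. Here is some code that does this: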
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(0)
xs = np.cumsum(np.random.rand(200))
# This function creates a general situation where the max is not always at the end or the beginning
ys = xs**1.2 + 10 * np.sin(xs)
plt.plot(xs, ys, '+-')
threshold = 10
xs_thresh_ind = np.zeros_like(xs, dtype=int)
deltas = np.zeros_like(ys)
for i, x in enumerate(xs):
    # Find indices that lie within the time threshold
    period_end_ind = np.argmax(xs > x + threshold)
    # Only operate when the window is wide enough (this can be treated differently)
    if period_end_ind > 0:
        xs_thresh_ind[i] = period_end_ind
        # Find extrema in the period
        period_min = np.min(ys[i:period_end_ind + 1])
        period_max = np.max(ys[i:period_end_ind + 1])
        deltas[i] = period_max - period_min
max_ind_low = np.argmax(deltas)
max_ind_high = xs_thresh_ind[max_ind_low]
max_delta = deltas[max_ind_low]
print(
    'Max delta {:.2f} is in period x[{}]={:.2f},{:.2f} and x[{}]={:.2f},{:.2f}'
    .format(max_delta, max_ind_low, xs[max_ind_low], ys[max_ind_low],
            max_ind_high, xs[max_ind_high], ys[max_ind_high]))
df = pd.DataFrame(ys, index=xs)
OUTPUT:
Max delta 48.76 is in period x[167]=86.10,200.32 and x[189]=96.14,249.09
Is there an efficient, pandas-idiomatic way to achieve something similar?
Create a Series from the ys values, indexed by xs, but convert xs to actual timedelta elements rather than their float equivalents.
ts = pd.Series(ys, index=pd.to_timedelta(xs, unit="s"))
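We want to apply a leading (forward-looking) 10-second window in which we compute the difference between the maximum and the minimum. Because rolling windows in pandas are trailing, we sort the series in descending order and apply a trailing window to the reversed series.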
deltas = ts.sort_index(ascending=False).rolling("10s").agg(lambda s: s.max() - s.min())
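To see why the descending sort gives a forward-looking window, here is a minimal sketch on a toy series. The timestamps and values are made up for illustration (they are not from the question), and the names `toy`, `rev` and `toy_deltas` are hypothetical. It assumes a pandas version whose time-based rolling accepts a monotonically decreasing index.

```python
import pandas as pd

# Hypothetical toy data: irregular timestamps (in seconds) and values.
times = pd.to_timedelta([0, 4, 9, 12, 20], unit="s")
toy = pd.Series([1.0, 5.0, 2.0, 10.0, 3.0], index=times)

# Sorting in descending order means the rows "before" each row in the
# reversed series are the ones that come later in time, so the trailing
# 10-second window effectively covers [t, t + 10s).
rev = toy.sort_index(ascending=False)
toy_deltas = rev.rolling("10s").agg(lambda s: s.max() - s.min())

# Put the result back in chronological order for reading.
print(toy_deltas.sort_index())
```

For example, the row at 4 s sees the samples at 4 s, 9 s and 12 s (values 5, 2, 10), so its delta is 8, while the last row at 20 s only sees itself and gets 0.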
Looking up the maximum delta with deltas[deltas == deltas.max()] gives
0 days 00:01:26.104797298 48.354851
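meaning that a delta of 48.35 was found in the interval [86.1, 96.1). This is slightly smaller than the 48.76 reported in the question, apparently because the loop there also includes the first sample past the 10-second threshold (x[189] = 96.14).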
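If the Python-level lambda turns out to be slow on a long series, one possible variation (a sketch, not part of the approach above) is to subtract the built-in rolling minimum from the rolling maximum, and then use idxmax to recover the start of the best window. The names `roller`, `fast_deltas`, `start` and `in_window` are illustrative, and the half-open interval [start, start + 10s) is assumed to match the window pandas uses on a descending index.

```python
import numpy as np
import pandas as pd

# Same data as in the question.
np.random.seed(0)
xs = np.cumsum(np.random.rand(200))
ys = xs**1.2 + 10 * np.sin(xs)
ts = pd.Series(ys, index=pd.to_timedelta(xs, unit="s"))

# Built-in rolling max/min avoid calling a Python function per window.
roller = ts.sort_index(ascending=False).rolling("10s")
fast_deltas = roller.max() - roller.min()

# Start of the window with the largest delta, and the samples inside it
# (assumption: the window is the half-open interval [start, start + 10s)).
start = fast_deltas.idxmax()
stop = start + pd.Timedelta("10s")
in_window = ts[(ts.index >= start) & (ts.index < stop)]

print("window start:", start, "max delta:", fast_deltas.max())
print("min at:", in_window.idxmin(), "max at:", in_window.idxmax())
```

The last line shows where inside the window the extrema sit, which the original loop reported only indirectly through the window endpoints.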