How to find highest peaks in time series data between points of 0 in Python?
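I'm trying to take my time series data and isolate all the data between points of 0, then identify which of those intervals have the highest peaks. I'm working in Python.
Refer to this graph:
time series data with peaks and valleys identified
Source: https://tcoil.info/find-peaks-and-valleys-in-dataset-with-python/
Noticing that the first and last red valley points are at 0, I want to find a way to take time series data, identify all points at 0 on the y-axis, and then isolate the data in between. For the graph I linked to here, I would want to isolate all data between the first and last red valley point. I want to do this throughout the time series dataset, where the data between points of 0 on the y-axis is isolated. Now that those intervals are isolated (representing different events/cycles throughout the data), I want to record the highest point within each of these intervals. Then I want to find the intervals with the 5 highest peaks (one peak per each interval). Lastly, I want to output the interval (or range) that contains these top 5 peaks. For context, each of these intervals represents an event/cycle, and I want to find the most extreme ones. So I'd want an output that basically tells me that the most extreme event/cycle happened between March 5, 2020 and March 24, 2020.
How can I accomplish this in Python? Do I need to smooth the data first? How would I isolate the data between points of 0 on the y-axis? I'm still working out which direction to take first and don't have any code yet.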
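Let's use the data you referenced. I'll add detailed explanations in the comments.
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(-1, 3, 1000)
y = -0.1 * np.cos(12*x) + np.exp(-(1-x)**2)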
I want to find a way to take time series data, identify all points at 0 on the y-axis, and then isolate the data in between
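So basically you want to separate the contiguous runs of points that lie above y = 0 from those that lie below it. Using the contiguous_regions helper from this answer, you can do it like this: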
fig, ax = plt.subplots(figsize=(10, 5))
ax.plot(x, y, color='black')
# Find all consecutive chunks that are above y=0
for start, stop in contiguous_regions(y > 0):
    ax.plot(x[start:stop], y[start:stop], color='red')
# Find all consecutive chunks that are below y=0
for start, stop in contiguous_regions(y < 0):
    ax.plot(x[start:stop], y[start:stop], color='blue')
ax.axhline(0, color='grey')
plt.show()
plt.close()
As you can see, the blue points are below y = 0 and the red points are above it. They are definitely isolated.
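The contiguous_regions helper used above comes from the linked answer and is not reproduced in this post. Here is a minimal sketch of what it could look like, assuming it takes a boolean sequence and returns half-open (start, stop) index pairs, one per run of True values. That contract is inferred from how the loops above use it, so treat this as a stand-in, not the original implementation. It is written in plain Python so it accepts lists and NumPy boolean arrays alike.

```python
def contiguous_regions(condition):
    """Return a list of (start, stop) pairs, one per run of truthy values.

    `stop` is exclusive, so `data[start:stop]` selects exactly that run.
    This is an assumed stand-in for the helper from the linked answer.
    """
    regions = []
    start = None  # index where the current run began, or None if outside a run
    for i, flag in enumerate(condition):
        if flag and start is None:
            start = i                    # a new run begins here
        elif not flag and start is not None:
            regions.append((start, i))   # the run ended just before i
            start = None
    if start is not None:
        # The sequence ended while still inside a run
        regions.append((start, len(condition)))
    return regions


print(contiguous_regions([True, True, False, True]))  # [(0, 2), (3, 4)]
```

For very long series a vectorized NumPy version would be faster, but the loop is easier to check by eye.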
For the graph I linked to here, I would want to isolate all data between the first and last red valley point.
You can do that too. For each chunk we need to find its valleys, which you can do with the technique from the link you sent us!
fig, ax = plt.subplots(figsize=(10, 5))
ax.plot(x, y, color='black')
for start, stop in contiguous_regions(y > 0):
    x_chunk, y_chunk = x[start:stop], y[start:stop]
    ax.plot(x_chunk, y_chunk, color='red')
    # Find all the valleys
    valleys = (np.diff(np.sign(np.diff(y_chunk))) > 0).nonzero()[0] + 1
    # If there's more than two valleys (the first and the last)
    if valleys.size > 2:
        # Get'em!
        iv0, *_, iv1 = valleys
        # Plot'em!
        ax.plot(x_chunk[iv0:iv1], y_chunk[iv0:iv1], color='red', linewidth=4)
for start, stop in contiguous_regions(y < 0):
    x_chunk, y_chunk = x[start:stop], y[start:stop]
    ax.plot(x_chunk, y_chunk, color='blue')
    # The same.
    valleys = (np.diff(np.sign(np.diff(y_chunk))) > 0).nonzero()[0] + 1
    if valleys.size > 2:
        iv0, *_, iv1 = valleys
        ax.plot(x_chunk[iv0:iv1], y_chunk[iv0:iv1], color='blue', linewidth=4)
ax.axhline(0, color='grey')
plt.show()
plt.close()
See that thick red line? That's ours, alright.
We're starting to repeat ourselves here. Let's make a function:
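The valley expression packs several steps into one line, so it may help to see it unrolled. The sketch below runs the same calls on a small made-up array (the values and the intermediate variable names are hypothetical, chosen only for illustration).

```python
import numpy as np

# Hypothetical toy signal with valleys at indices 1 and 3
y_chunk = np.array([3, 1, 2, 0, 4])

slope = np.diff(y_chunk)     # [-2, 1, -2, 4]: change between neighbouring samples
direction = np.sign(slope)   # [-1, 1, -1, 1]: falling (-1) or rising (+1)
turns = np.diff(direction)   # [2, -2, 2]: positive where falling turns into rising

# Each np.diff shortens the array by one, hence the + 1 to realign the indices
valleys = (turns > 0).nonzero()[0] + 1
print(valleys)  # [1 3]

# First and last valley, as in the snippet above
iv0, *_, iv1 = valleys
print(y_chunk[iv0:iv1])  # [1 2]
```

Note that the slice iv0:iv1 stops just before the last valley, because Python slices exclude their end index.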
def do_chunk(x_chunk, y_chunk, color):
    ax.plot(x_chunk, y_chunk, color=color)
    valleys = (np.diff(np.sign(np.diff(y_chunk))) > 0).nonzero()[0] + 1
    if valleys.size > 2:
        iv0, *_, iv1 = valleys
        ax.plot(x_chunk[iv0:iv1], y_chunk[iv0:iv1], color=color, linewidth=4)

fig, ax = plt.subplots(figsize=(10, 5))
ax.plot(x, y, color='black')
for start, stop in contiguous_regions(y > 0):
    do_chunk(x[start:stop], y[start:stop], 'red')
for start, stop in contiguous_regions(y < 0):
    do_chunk(x[start:stop], y[start:stop], 'blue')
ax.axhline(0, color='grey')
plt.show()
plt.close()
Better, and better still: the same plot. What's next?
Now that those intervals are isolated (representing different events/cycles throughout the data), I want to record the highest point within each of these intervals.
But that's too easy. Let's be like Jude, who probably made it.
def do_chunk(x_chunk, y_chunk, color):
    ax.plot(x_chunk, y_chunk, color=color)
    valleys = (np.diff(np.sign(np.diff(y_chunk))) > 0).nonzero()[0] + 1
    if valleys.size > 2:
        iv0, *_, iv1 = valleys
        x_trim, y_trim = x_chunk[iv0:iv1], y_chunk[iv0:iv1]
        ax.plot(x_trim, y_trim, color=color, linewidth=4)
        # Get the index of the maximum value in this trim
        ip = np.argmax(y_trim)
        ax.scatter(x_trim[ip], y_trim[ip], color='blue')

fig, ax = plt.subplots(figsize=(10, 5))
ax.plot(x, y, color='black')
for start, stop in contiguous_regions(y > 0):
    do_chunk(x[start:stop], y[start:stop], 'red')
for start, stop in contiguous_regions(y < 0):
    do_chunk(x[start:stop], y[start:stop], 'blue')
ax.axhline(0, color='grey')
plt.show()
plt.close()
See that little dot over there? That's ours too. The maximum peak.
Then I want to find the intervals with the 5 highest peaks (one peak per each interval)
Okay, that's harder. Let's create a list so we can store them!
def do_chunk(x_chunk, y_chunk, color):
    ax.plot(x_chunk, y_chunk, color=color)
    valleys = (np.diff(np.sign(np.diff(y_chunk))) > 0).nonzero()[0] + 1
    if valleys.size > 2:
        iv0, *_, iv1 = valleys
        x_trim, y_trim = x_chunk[iv0:iv1], y_chunk[iv0:iv1]
        ax.plot(x_trim, y_trim, color=color, linewidth=4)
        ip = np.argmax(y_trim)
        ax.scatter(x_trim[ip], y_trim[ip], color='blue')
        # Return the x, y of the peak
        return x_trim[ip], y_trim[ip]
    # Not enough valleys in this chunk to trim it
    return None

fig, ax = plt.subplots(figsize=(10, 5))
ax.plot(x, y, color='black')
intervals = []
for start, stop in contiguous_regions(y > 0):
    # Receive it here
    peak = do_chunk(x[start:stop], y[start:stop], 'red')
    # If this chunk had enough valleys to yield a peak
    if peak is not None:
        # Store each interval as a dict (a JSON-like record)
        intervals.append({
            'start': start,
            'stop': stop,
            'peak': peak,
        })
for start, stop in contiguous_regions(y < 0):
    peak = do_chunk(x[start:stop], y[start:stop], 'blue')
    if peak is not None:
        intervals.append({
            'start': start,
            'stop': stop,
            'peak': peak,
        })
ax.axhline(0, color='grey')
plt.show()
plt.close()
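So, what's inside intervals? Let's take a look! Oh, I've already checked. It gives
[{'start': 121, 'stop': 892, 'peak': (0.8098098098098099, 1.0602140027371494)}]
What does this mean? It means that, from index 121 up to (but not including) index 892, the highest peak found is at x ≈ 0.810 and y ≈ 1.060. Great, huh? Since the data used has only one peak, that's the one.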
To find the intervals with the highest y peaks, sort them by peak height and keep the last five:
# High five!
# Sort ascending by the y-value of each peak, so the highest peaks end up at the tail
ranked = sorted(intervals, key=lambda interval: interval["peak"][1])
# Keep the (start, stop) of the last five, i.e. the five highest peaks
high_five = [(interval["start"], interval["stop"]) for interval in ranked[-5:]]
Lastly, I want to output the interval (or range) that contains these top 5 peaks.
That's easy now, so I'll leave it to you. Trust me, the worst part is already done.
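If you want a nudge for that last step, here is one way it could go. This is a sketch under assumptions: the timestamps list and the interval records below are made up for illustration, and top_intervals and describe are hypothetical helper names. The only thing carried over from above is the shape of each record (start, stop, and an (x, y) peak).

```python
from datetime import date, timedelta

# Hypothetical stand-ins: one date per sample, plus interval records in the
# same shape as the `intervals` list built above (made-up values).
timestamps = [date(2020, 3, 1) + timedelta(days=i) for i in range(30)]
intervals = [
    {"start": 0, "stop": 4, "peak": (date(2020, 3, 2), 0.7)},
    {"start": 4, "stop": 24, "peak": (date(2020, 3, 12), 2.5)},
    {"start": 24, "stop": 30, "peak": (date(2020, 3, 27), 1.1)},
]


def top_intervals(intervals, n=5):
    """Return the n intervals with the highest peaks, highest first."""
    return sorted(intervals, key=lambda iv: iv["peak"][1], reverse=True)[:n]


def describe(interval, timestamps):
    # `stop` is exclusive, so the last sample of the interval sits at stop - 1
    first = timestamps[interval["start"]]
    last = timestamps[interval["stop"] - 1]
    return f"{first} to {last} (peak {interval['peak'][1]:.3f})"


for iv in top_intervals(intervals, n=2):
    print(describe(iv, timestamps))
# 2020-03-05 to 2020-03-24 (peak 2.500)
# 2020-03-25 to 2020-03-30 (peak 1.100)
```

With real data you would index your own date column instead of the generated timestamps list.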