使用 SciPy 中的 find_peaks 绘制峰值以检测剧烈的 up/down 转弯或全局异常值的问题

Problem with plotting peaks using find_peaks from SciPy to detect drastic up/down turns or global outliers

假设我有以下数据框包含 value 随着时间的推移或 date:

import pandas as pd

df = pd.DataFrame(data={'date':['2020-10-16','2020-10-17','2020-10-18','2020-10-19','2020-10-20','2020-10-21','2020-10-22','2020-10-23','2020-10-24','2020-10-25','2020-10-26','2020-10-27','2020-10-28','2020-10-29','2020-10-30','2020-10-31','2020-11-01','2020-11-02','2020-11-03','2020-11-04','2020-11-05','2020-11-06','2020-11-07','2020-11-08','2020-11-09','2020-11-10','2020-11-11','2020-11-12','2020-11-13','2020-11-14','2020-11-15'],
                        'value':[161967, 161270, 148508, 152442, 157504, 157118, 155674, 134522, 213384, 163242, 217415, 221502, 146267, 143621, 145875, 139488, 104466, 94825, 143686, 151952, 161074, 161417, 135042, 148768, 131428, 127816, 151905, 180498, 177899, 193950, 12]})
df

我受此启发 通过以下代码检测波峰和波谷:

from scipy.signal import find_peaks
import numpy as np
import matplotlib.pyplot as plt

# Input signal
t = df.date
x = df.value

# Threshold value (for height of peaks and valleys)
thresh = 0.95

# Find indices of peaks
peak_idx, _ = find_peaks(x, height=thresh, distance=10)

# Find indices of valleys (from inverting the signal)
valley_idx, _ = find_peaks(-x, height=thresh, distance=10 )

# Plot signal
plt.figure(figsize=(14,12))
plt.plot(t, x   , color='b', label='data')
plt.scatter(t, x, s=10,c='b',label='value')

# Plot threshold
plt.plot([min(t), max(t)], [thresh, thresh],   '--',  color='r', label='peaks-threshold')
plt.plot([min(t), max(t)], [-thresh, -thresh], '--',  color='g', label='valleys-threshold')

# Plot peaks (red) and valleys (blue)
plt.plot(t[peak_idx], x[peak_idx],     "x", color='r', label='peaks')
plt.plot(t[valley_idx], x[valley_idx], "x", color='g', label='valleys')

plt.xticks(rotation=45)
plt.ylabel('value')
plt.xlabel('timestamp')
plt.title(f'data over time for username=target')
plt.legend( loc='upper left')
plt.gcf().autofmt_xdate()
plt.show()
plt.show()

这是输出:

问题:

  1. 您需要在与数据相同的域中指定高度
  2. 上阈值没有丢失,它在图上,只是所有这些线都接近 0 并且底部杂乱无章。
thresh_top = np.median(x) + 1 * np.std(x)
thresh_bottom = np.median(x) - 1 * np.std(x)
# (you may want to use std calculated on 10-90 percentile data, without outliers)

# Find indices of peaks
peak_idx, _ = find_peaks(x, height=thresh_top)

# Find indices of valleys (from inverting the signal)
valley_idx, _ = find_peaks(-x, height=-thresh_bottom)

# Plot signal
plt.figure(figsize=(14,12))
plt.plot(t, x   , color='b', label='data')
plt.scatter(t, x, s=10,c='b',label='value')

# Plot threshold
plt.plot([min(t), max(t)], [thresh_top, thresh_top],   '--',  color='r', label='peaks-threshold')
plt.plot([min(t), max(t)], [thresh_bottom, thresh_bottom], '--',  color='g', label='valleys-threshold')

# Plot peaks (red) and valleys (blue)
plt.plot(t[peak_idx], x[peak_idx],     "x", color='r', label='peaks')
plt.plot(t[valley_idx], x[valley_idx], "x", color='g', label='valleys')

plt.xticks(rotation=45)
plt.ylabel('value')
plt.xlabel('timestamp')
plt.title(f'data over time for username=target')
plt.legend( loc='upper left')
plt.gcf().autofmt_xdate()
plt.show()