Detecting a pattern in a real-time data array in Python

Question

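I am trying to detect a specific pattern in real-time data (a time series). To visualize it, I will show the data here in two parts.

Pattern: the shape I am searching for in the time series,

DataWindow: the data buffer (window) that slides in real time and keeps the recorded history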
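
As an illustration of what such a sliding DataWindow could look like, here is a minimal sketch using `collections.deque`. The names `WINDOW_SIZE`, `data_window`, and `push_sample` are hypothetical and not part of the original setup; the actual buffer may be implemented differently.

```python
from collections import deque

WINDOW_SIZE = 5  # hypothetical buffer length, in samples

# A deque with maxlen drops the oldest sample automatically when full
data_window = deque(maxlen=WINDOW_SIZE)

def push_sample(value):
    """Append the newest sample and return a snapshot of the current window."""
    data_window.append(value)
    return list(data_window)

# Feed a few values from SampleTarget as if they arrived one at a time
for sample in [4.74, 4.693, 4.599, 4.444, 3.448, 2.631, 1.845]:
    snapshot = push_sample(sample)

print(snapshot)  # the last WINDOW_SIZE samples, oldest first
```

Any detection routine can then be run on `snapshot` each time a new sample arrives.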

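Here is my recorded data (the red boxes show the pattern I want to detect), although it may vary because it is real-time:

The data above does not have much noise (at least for this set). At the resolution I can see, the peaks (I would perhaps call them sinusoidal peaks) are distinguishable at first glance. That is why applying a moving-average filter did not help me at all.

The image below shows some samples of the real-time data; in the saved data, however, the plotter applies extrapolation to draw a continuous plot. In general, the data samples look like the image below, or may have a higher resolution than this image.

As a first attempt, I tried Spike Detection in a Time-Series using a moving average, but it did not work as I expected. I also tried some solutions from this thread, but the results were not good enough for me to raise a flag on the pattern at run time (there were many false positives).

In addition, as you may notice from the saved real-time data, the pattern can have a different scale and, most importantly, a different offset. This is the problem I ran into when applying the solutions above to my data to get distinguishable results.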

Here are some examples to try; these can be used for Pattern and DataWindow:

Pattern = [5.9, 5.6, 4.08, 2.57, 2.78, 4.78, 7.3, 7.98, 4.81, 5.57, 4.7]

SampleTarget = [4.74, 4.693, 4.599, 4.444, 3.448, 2.631, 1.845, 2.032, 2.415, 3.714, 5.184, 5.82, 5.61, 4.841, 3.802, 3.11]

SampleTarget2 = [5.898, 5.91, 5.62, 5.25, 4.72, 4.09, 3.445, 2.91, 2.7, 2.44, 2.515, 2.79, 3.25, 3.915, 4.72, 5.65, 6.28, 7.15, 7.81, 8.2, 7.9, 7.71, 7.32, 6.88, 6.44, 6.0, 5.58, 5.185, 4.88, 4.72, 4.69, 4.82]

I am trying to solve this in Python for a PoC. Update: I added a dataset that includes the first two red boxes, with slightly wider margins, as shown in the saved real-time data. dataset
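
To make the scale and offset issue concrete, here is a minimal sketch of z-normalization, a common way to remove both before comparing shapes. It is not something the question itself uses; the helper `znorm` and the scale and offset values are assumptions chosen only for illustration.

```python
import numpy as np

def znorm(x):
    """Shift x to zero mean and scale it to unit standard deviation."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()

pattern = np.array([5.9, 5.6, 4.08, 2.57, 2.78, 4.78, 7.3, 7.98, 4.81, 5.57, 4.7])

# Hypothetical copy of the pattern with a different scale and offset
rescaled = 0.5 * pattern + 3.0

# After z-normalization the two shapes coincide
same_shape = np.allclose(znorm(pattern), znorm(rescaled))
print(same_shape)
```

This only handles amplitude scale and vertical offset. The sample targets above also differ from Pattern in length (16 and 32 points versus 11), so stretching along the time axis would need separate handling.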

Answer 1

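You can compute the gradient of the data and use a threshold to identify the features. Here I use three masks to get the down/up/down feature.

I commented the code to show the main steps; hopefully it is clear.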

import pandas as pd
import matplotlib.pyplot as plt

# read data
s = pd.read_csv('sin_peaks.txt', header=None)[0]
# 0    5.574537
# 1    5.736071
# 2    5.965132
# 3    6.164344
# 4    6.172413

thresh = 0.5 # threshold of derivative
span = 10    # max span of the feature (in number of points)

# calculate gradient
# if the points are not evenly spaced
# you should also divide by the spacing
s2 = s.diff()

# get points outside of threshold
m1 = s2.lt(-thresh)
m2 = s2.gt(thresh)

# extend masks
m1_fw = m1.where(m1).ffill(limit=span)
m1_bw = m1.where(m1).bfill(limit=span)
m2_fbw = m2.where(m2).ffill(limit=span).bfill(limit=span)

# slice data where all conditions are met
# up peak & down peak in the "span" before and down peak in the "span" after
peaks = s[m1_fw & m1_bw & m2_fbw]

# group peaks
groups = peaks.index.to_series().diff().ne(1).cumsum()

# plot identified features
ax = s.plot(label='data')
s.diff().plot(ax=ax, label='gradient')
ax.legend()

ax.axhline(thresh, ls=':', c='k')
ax.axhline(-thresh, ls=':', c='k')

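# shade each group of consecutive indices that met all conditions
for _, group in peaks.groupby(groups):
    start = group.index[0]
    stop = group.index[-1]
    ax.axvspan(start, stop, color='k', alpha=0.1)

# display the figure when running as a script
plt.show()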
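
To try the same idea without the CSV file, the following sketch wraps the masking steps in a function and runs it on SampleTarget2 from the question. The function name `find_down_up_down` and the values `thresh=0.4` and `span=10` are assumptions picked for this sample, not settings from the answer. The masks are extended through float columns and `notna()` rather than object-dtype booleans, which should give the same result as the `ffill`/`bfill` calls above while keeping the masks strictly boolean.

```python
import pandas as pd

def find_down_up_down(values, thresh, span):
    """Return (start, stop) index ranges where the down/up/down masks overlap."""
    s = pd.Series(values, dtype=float)
    grad = s.diff()              # assumes evenly spaced samples
    down = grad.lt(-thresh)      # steep falling steps
    up = grad.gt(thresh)         # steep rising steps

    def extend(mask, forward=True, backward=True):
        # 1.0 where the mask is True, NaN elsewhere, then fill up to `span` points
        marked = mask.astype(float).where(mask)
        out = pd.Series(False, index=mask.index)
        if forward:
            out = out | marked.ffill(limit=span).notna()
        if backward:
            out = out | marked.bfill(limit=span).notna()
        return out

    keep = (extend(down, backward=False)   # a fall at or within `span` before
            & extend(down, forward=False)  # a fall at or within `span` after
            & extend(up))                  # a rise within `span` on either side

    # Group consecutive indices into ranges
    idx = pd.Series(s.index[keep])
    group_id = idx.diff().ne(1).cumsum()
    return [(int(g.iloc[0]), int(g.iloc[-1])) for _, g in idx.groupby(group_id)]

sample_target2 = [5.898, 5.91, 5.62, 5.25, 4.72, 4.09, 3.445, 2.91, 2.7, 2.44,
                  2.515, 2.79, 3.25, 3.915, 4.72, 5.65, 6.28, 7.15, 7.81, 8.2,
                  7.9, 7.71, 7.32, 6.88, 6.44, 6.0, 5.58, 5.185, 4.88, 4.72,
                  4.69, 4.82]

ranges = find_down_up_down(sample_target2, thresh=0.4, span=10)
print(ranges)
```

With these assumed settings the function reports three index ranges, 4 to 7, 13 to 17 and 23 to 26, which correspond to the falling, rising and falling flanks of the feature. The threshold matters: the second fall in this sample never exceeds 0.44 per step, so a threshold of 0.5 would not register it.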

python

signal-processing

numpy

python-3.x

pandas