Why is find_peaks() not working correctly for determining highest peaks of a waveform?

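I am trying to replicate the MATLAB function findpeaks() in Python using find_peaks() from scipy.signal.

Basically, I am trying to translate the MATLAB example "Finding Periodicity Using Autocorrelation" into Python.

I wrote the following Python code for this. Everything seems to work fine except for the last part: the indices of the 'long period' peaks, i.e. the highest peaks, are not determined correctly.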

#Loading Libraries

import numpy as np
import pandas as pd
import pickle
import scipy
from scipy.signal import find_peaks, square
import scipy.signal as signal
import matplotlib.pyplot as plt
import math

#Loading Dataset from a local copy of the dataset (from the MATLAB link I've shared)
dataset = pd.read_csv('officetemp_matlab_dataset.csv')

#Preprocessing
temp = dataset.to_numpy()
tempC = (temp-32)*5/9
tempnorm = tempC-np.mean(tempC)
fs = 2*24
t = [i/fs for i in range(len(tempnorm))]  # range() already starts at 0, so no -1 offset is needed

#Plotting the waveform
plt.plot(t, tempnorm)

#Determining Autocorrelation & Lags
autocorr = signal.correlate(tempnorm, tempnorm, mode='same')
lags = signal.correlation_lags(len(tempnorm), len(tempnorm), mode="same")

#Plotting the Autocorrelation & Lags
f = plt.figure()
f.set_figwidth(40)
f.set_figheight(10)
plt.plot(lags/fs, autocorr)

#A) FINDING ALL PEAKS

#1) Finding peak indices
indices = find_peaks(autocorr.flatten())[0]

#2) Finding peak values
peak_values_short = [autocorr.flatten()[j] for j in indices]

#3) Finding corresponding lags of the peak values
peak_values_lags_short = [lags.flatten()[j] for j in indices]

#4) Determining Period (short)
diff = [(indices[i - 1] - x) for i, x in enumerate(indices)][1:]
short_period = abs(np.mean(diff))/fs
short_period 

#B) FINDING THE HIGHEST PEAKS (of 2nd period)

#1) Finding peak indices
indices = find_peaks(autocorr.flatten(), height = 0.3, distance = math.ceil(short_period)*fs)[0]

#2) Finding peak values
peak_values_long = [autocorr.flatten()[j] for j in indices]

#3) Finding corresponding lags of the peak values
peak_values_lags_long = [lags.flatten()[j] for j in indices]

#4) Determining Period (long)
diff = [(indices[i - 1] - x) for i, x in enumerate(indices)][1:]
long_period = abs(np.mean(diff))/fs
long_period 


###DOING A SCATTER PLOT OF THE PEAK POINTS OVERLAPPING ON THE PREVIOUS PLOT OF AUTOCORR VS LAGS

f = plt.figure()
f.set_figwidth(40)
f.set_figheight(10)
plt.plot(lags/fs, autocorr)

shrt = [i/fs for i in peak_values_lags_short]
lng = [i/fs for i in peak_values_lags_long]

plt.scatter(shrt, peak_values_short, marker='o')
plt.scatter(lng, peak_values_long, marker='*')

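As you can see, my Python output differs from the MATLAB example in two ways:

  1. The resulting 'long period' value (and the corresponding peak indices) are different.
  2. The autocorrelation and lag values at the 'long period' peak positions are different (as shown in the plot above).

I don't understand why find_peaks() works correctly the first time (when finding all peaks) but fails to give the correct result the second time, when I pass additional arguments to find only the highest peaks.

How can I correctly detect the highest peaks of the second period?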

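I am answering my own question.

I realized that the only mistake in my Python code was that I did not normalize the autocorrelation values the way the MATLAB example does. I simply added the following to my code, after computing autocorr and before calling find_peaks():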

autocorr = (autocorr-min(autocorr))/(max(autocorr)-min(autocorr))

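Once I did this, I got the desired result, the same as in the example:

So, in summary, find_peaks() does in fact do what it is expected to do. The height = 0.3 threshold is an absolute value, so it only separates the highest peaks once the autocorrelation has been scaled to the range [0, 1].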
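
To illustrate the point on data anyone can reproduce, here is a minimal sketch using a synthetic signal rather than the office temperature dataset. The signal, its length (`days`), and the variable names `raw_peaks`, `minmax`, `coeff`, and `norm_peaks` are made up for this illustration. As far as I know, the MATLAB example calls xcorr with the 'coeff' option, which scales the autocorrelation so that the zero-lag value is 1, so the sketch also shows that scaling next to the min-max scaling used above. The two are not identical, and which peaks pass the 0.3 threshold can differ between them.

```python
import numpy as np
from scipy.signal import correlate, correlation_lags, find_peaks

fs = 2 * 24                      # samples per day, as in the question
days = 28                        # hypothetical record length
t = np.arange(days * fs) / fs    # time in days, starting at 0

# Synthetic stand-in for the temperature data (an assumption, not the real
# dataset): a daily cycle plus a stronger weekly cycle, with the mean removed.
x = np.sin(2 * np.pi * t) + 2.0 * np.sin(2 * np.pi * t / 7)
x = x - x.mean()

autocorr = correlate(x, x, mode="same")
lags = correlation_lags(len(x), len(x), mode="same")

# Raw autocorrelation: the values are sums of products over many samples,
# so an absolute threshold of 0.3 barely filters anything.
raw_peaks, _ = find_peaks(autocorr, height=0.3, distance=fs)

# Min-max scaling to [0, 1], as in the answer above.
minmax = (autocorr - autocorr.min()) / (autocorr.max() - autocorr.min())

# Scaling by the zero-lag value, which is my understanding of what
# MATLAB's xcorr(..., 'coeff') does for an autocorrelation.
coeff = autocorr / autocorr[lags == 0][0]

# With either scaling, height=0.3 is a threshold relative to the main peak.
norm_peaks, _ = find_peaks(minmax, height=0.3, distance=fs)

print("largest raw autocorrelation value:", autocorr.max())
print("peaks found on raw values:", len(raw_peaks))
print("peaks found after min-max scaling:", len(norm_peaks))
print("lags (in days) of the scaled peaks:", lags[norm_peaks] / fs)
```

Printing `autocorr.max()` on your own data is a quick way to check whether a threshold such as 0.3 is on the right scale before blaming find_peaks().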