Why is find_peaks() not working correctly for determining the highest peaks of a waveform?
I am trying to replicate the MATLAB function findpeaks() in Python using find_peaks() from scipy.signal.
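For reference, the name-value options of MATLAB's findpeaks() correspond to keyword arguments of scipy.signal.find_peaks(): 'MinPeakHeight' maps to height, and 'MinPeakDistance' maps to distance, which SciPy measures in samples (matching MATLAB when no x-vector is passed). In both, the distance rule keeps the taller of two peaks that are too close together. The wrapper below is a hypothetical helper, not part of SciPy, sketched under those assumptions; unlike MATLAB, it returns 0-based locations.

```python
import numpy as np
from scipy.signal import find_peaks


def matlab_style_findpeaks(y, min_peak_height=None, min_peak_distance=None):
    """Hypothetical helper: roughly mimic [pks, locs] = findpeaks(y, ...) from MATLAB.

    min_peak_height   -> SciPy's `height`   (same units as y)
    min_peak_distance -> SciPy's `distance` (in samples)
    The returned locations are 0-based, unlike MATLAB's 1-based indices.
    """
    y = np.asarray(y, dtype=float).ravel()  # find_peaks() requires a 1-D array
    kwargs = {}
    if min_peak_height is not None:
        kwargs["height"] = min_peak_height
    if min_peak_distance is not None:
        kwargs["distance"] = min_peak_distance
    locs, _ = find_peaks(y, **kwargs)
    return y[locs], locs


y = [0, 3, 1, 5, 1, 2, 0, 4, 0]
pks, locs = matlab_style_findpeaks(y)
print(locs)  # [1 3 5 7]: every local maximum
# The height filter drops the peak of 2 at index 5; the distance filter then keeps
# the taller peak at index 3 and drops index 1, which is only 2 samples away.
pks, locs = matlab_style_findpeaks(y, min_peak_height=2.5, min_peak_distance=3)
print(locs)  # [3 7]
```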
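Basically, I am trying to translate the MATLAB example for Finding Periodicity Using Autocorrelation into Python.
I have written the following Python code for this.
Everything seems to work fine, except for the last part: the indices of the 'long period', i.e. the indices of the highest peaks, are not determined correctly.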
```python
# Loading libraries
import math

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import scipy.signal as signal
from scipy.signal import find_peaks

# Loading dataset from a local copy of the dataset (from the MATLAB link I've shared)
dataset = pd.read_csv('officetemp_matlab_dataset.csv')

# Preprocessing
temp = dataset.to_numpy().ravel()   # 1-D array: find_peaks() only accepts 1-D input
tempC = (temp - 32) * 5 / 9         # Fahrenheit -> Celsius
tempnorm = tempC - np.mean(tempC)   # remove the mean
fs = 2 * 24                         # one sample every 30 minutes -> 48 samples per day
t = np.arange(len(tempnorm)) / fs   # time in days, starting at 0

# Plotting the waveform
plt.plot(t, tempnorm)

# Determining autocorrelation & lags
autocorr = signal.correlate(tempnorm, tempnorm, mode='same')
lags = signal.correlation_lags(len(tempnorm), len(tempnorm), mode='same')

# Plotting the autocorrelation & lags
plt.figure(figsize=(40, 10))
plt.plot(lags / fs, autocorr)

# A) FINDING ALL PEAKS
# 1) Finding peak indices
indices = find_peaks(autocorr)[0]
# 2) Finding peak values
peak_values_short = autocorr[indices]
# 3) Finding corresponding lags of the peak values
peak_values_lags_short = lags[indices]
# 4) Determining period (short): mean spacing between peaks, in days
short_period = np.mean(np.diff(indices)) / fs
print(short_period)

# B) FINDING THE HIGHEST PEAKS (of 2nd period)
# 1) Finding peak indices ('MinPeakHeight' -> height, 'MinPeakDistance' -> distance in samples)
indices = find_peaks(autocorr, height=0.3, distance=math.ceil(short_period) * fs)[0]
# 2) Finding peak values
peak_values_long = autocorr[indices]
# 3) Finding corresponding lags of the peak values
peak_values_lags_long = lags[indices]
# 4) Determining period (long), in days
long_period = np.mean(np.diff(indices)) / fs
print(long_period)

# Scatter plot of the peak points overlapping the previous plot of autocorr vs lags
plt.figure(figsize=(40, 10))
plt.plot(lags / fs, autocorr)
plt.scatter(peak_values_lags_short / fs, peak_values_short, marker='o')
plt.scatter(peak_values_lags_long / fs, peak_values_long, marker='*')
plt.show()
```
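As you can see, compared with the MATLAB example, my Python output is wrong in two ways:
- the resulting 'long period' value (and the peak indices it is computed from) is different
- the autocorrelation and lag values at the 'long period' peak locations are different (as shown in the plot)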
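I don't understand why find_peaks() works correctly the first time (when finding all peaks) but fails to give the correct result the second time, when more parameters are passed to find the highest peaks.
How can I correctly detect the highest peaks of the second period?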
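I'm answering my own question.
I realized that the only mistake in my Python code was that I did not normalize the autocorrelation values, as the MATLAB example does. Without normalization, height=0.3 is compared against raw autocorrelation values, whose scale depends on the amplitude and length of the signal, so it does not select the same peaks as the 0.3 threshold in the MATLAB example, which is applied to a normalized autocorrelation. I simply added the following line to my code, right after computing autocorr:
```python
# Rescale the autocorrelation to the range [0, 1] before calling find_peaks()
autocorr = (autocorr - autocorr.min()) / (autocorr.max() - autocorr.min())
```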
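The min-max rescaling above is one way to put the autocorrelation on a fixed scale. MATLAB's xcorr() with the 'coeff' option (which, as far as I can tell, is what the MATLAB example uses) instead divides by the zero-lag value, so that the autocorrelation at lag 0 equals 1. The sketch below compares both on hypothetical synthetic data (a noisy daily sine, not the office-temperature dataset): after either normalization, height=0.3 selects exactly the same peaks whether the signal is scaled by 1 or by 64, whereas on the raw autocorrelation the meaning of 0.3 changes with the amplitude.

```python
import numpy as np
from scipy import signal
from scipy.signal import find_peaks


def minmax_normalize(r):
    """Rescale to [0, 1], as in the line added above."""
    return (r - r.min()) / (r.max() - r.min())


def coeff_normalize(r, x):
    """Divide by the zero-lag autocorrelation, sum(x**2), so that lag 0 maps to 1.

    This mirrors how MATLAB's xcorr(..., 'coeff') scales an autocorrelation.
    """
    return r / np.dot(x, x)


rng = np.random.default_rng(0)
fs = 48                                            # samples per day, as in the question
t = np.arange(14 * fs) / fs                        # two weeks of samples, in days
base = np.sin(2 * np.pi * t) + 0.3 * rng.standard_normal(t.size)  # hypothetical data
base -= base.mean()

results = {}
# Both scales are powers of two, so scaling is exact in floating point and any
# difference between the two runs comes from the threshold, not from rounding.
for scale in (1.0, 64.0):
    x = scale * base
    raw = signal.correlate(x, x, mode="same")
    results[scale] = {
        "raw": find_peaks(raw, height=0.3)[0],
        "coeff": find_peaks(coeff_normalize(raw, x), height=0.3)[0],
        "minmax": find_peaks(minmax_normalize(raw), height=0.3)[0],
    }
    print(scale, {name: len(idx) for name, idx in results[scale].items()})
```

The two normalizations are not identical: min-max maps the most negative autocorrelation value to 0, so a height of 0.3 selects a slightly different set of peaks under each.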
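With this change, I got the desired result, identical to the one in the MATLAB example.
So, in summary, find_peaks() actually does what it is supposed to do: the discrepancy came from the scale of the autocorrelation passed to it, not from the peak detection itself.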
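Putting the pieces together, here is a minimal sketch of the whole procedure with the normalization included. It is not the MATLAB code itself: estimate_periods is a hypothetical helper, it normalizes by the zero-lag value (like MATLAB's 'coeff' option) rather than by min-max, and it is exercised on a synthetic pure daily sine instead of the office-temperature data.

```python
import math

import numpy as np
from scipy import signal
from scipy.signal import find_peaks


def estimate_periods(x, fs, min_height=0.3):
    """Hypothetical helper: steps A and B of the question, on a normalized autocorrelation.

    Returns (short_period, long_period) in the time unit implied by fs
    (days here, because fs is given in samples per day).
    """
    x = np.asarray(x, dtype=float).ravel()
    x = x - x.mean()                          # remove the mean, as in the question
    r = signal.correlate(x, x, mode="same")
    r = r / np.dot(x, x)                      # zero-lag value becomes 1

    # A) all local maxima -> short period
    idx_short, _ = find_peaks(r)
    short_period = np.mean(np.diff(idx_short)) / fs

    # B) only peaks above min_height, at least ceil(short_period) * fs samples apart
    idx_long, _ = find_peaks(r, height=min_height,
                             distance=math.ceil(short_period) * fs)
    long_period = np.mean(np.diff(idx_long)) / fs
    return short_period, long_period


fs = 48                                       # samples per day, as in the question
t = np.arange(14 * fs) / fs                   # two weeks, in days
x = np.sin(2 * np.pi * t)                     # hypothetical pure daily cycle
short_period, long_period = estimate_periods(x, fs)
print(short_period, long_period)
```

For a pure daily sine every daily peak of the normalized autocorrelation clears the 0.3 threshold, so both periods come out as one day; on data with an additional slower pattern, only the taller peaks at multiples of the longer period should survive step B.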