如何提取图中所有包含峰值的曲线的斜率?

How to extract slope of all the peak containing curves in a graph?

我有一个数据集,我从中生成了图表。我能够使用 scipy 从这些图中提取高于阈值的峰值。我正在尝试创建一个数据框,其中包含峰值特征,如峰值、峰宽、峰高、包含峰值的曲线斜率、包含峰值的曲线中的点数等。我正在努力寻找一种方法提取曲线中包含峰值的斜率和点数。

c_dict["L-04"][3][0] data 存在于粘贴箱中 link。

这是我尝试提取一些峰值特征的代码。

def extract_peak_features(c_dict,households):
    peak_list=[]
    width_list=[]
    half_width_list=[]
    smoke_list=[]
    house_list=[]
    for key,value in c_dict.items():
        if not key.startswith("L-01") and not key.startswith("H"):
            for k,v in value.items():
                if k==3:
                    if len(v) > 0:
                        if key in households:
                            smoking = 1
                        else:
                            smoking = 0
                        peaks, _ = find_peaks(v[0],prominence=50)
                        half_widths = peak_widths(v[0], peaks, rel_height=0.5)[0]
                        widths = peak_widths(v[0], peaks, rel_height=1)[0]
                        if len(peaks) > 0:
                            peak_list.extend(np.array(v[0])[peaks])
                            width_list.extend(widths)
                            half_width_list.extend(half_widths)
                            smoke_list.extend([smoking] * len(peaks))
                            house_list.extend([key] * len(peaks))
                        print(key,len(peaks),len(widths),len(half_widths))

    data = {"ID":house_list,"peaks":peak_list,"width":width_list,"half_width":half_width_list,"smoke":smoke_list}
    df_peak_stats = pd.DataFrame(data=data)
    return df_peak_stats
df_peak_stats = extract_peak_features(c_dict,households)

使用 scipymatplotlib 绘制 c_dict["L-04"][3][0] 数据的代码。

peaks, _ = find_peaks(c_dict["L-04"][3][0],prominence=50)
results_half = peak_widths(c_dict["L-04"][3][0], peaks, rel_height=0.5)
results_half[0]  # widths
results_full = peak_widths(c_dict["L-04"][3][0], peaks, rel_height=1)
plt.plot(c_dict["L-04"][3][0])
plt.plot(peaks, np.array(c_dict["L-04"][3][0])[peaks], "x")
#plt.hlines(*results_half[1:], color="C2")
plt.hlines(*results_full[1:], color="C3")
plt.show()

综上所述,我想知道如何提取上面 4 条曲线中包含峰值的斜率和点数。

因为你的数据中的峰是局部的,我为四个峰中的每一个创建了 4 个子图。

from scipy.signal import find_peaks,peak_widths

test = np.array(test)
test_inds = np.arange(len(test))
peaks, _ = find_peaks(test,prominence=50)
prominences, left_bases, right_bases = peak_prominences(test,peaks)

offset = np.ones_like(prominences)
# Calculate widths at x[peaks] - offset * rel_height
widths, h_eval, left_ips, right_ips = peak_widths(
    test, peaks, 
    rel_height=1,
    prominence_data=(offset, left_bases, right_bases)
)

其中test是你post中的数组。上面的代码基本上定位了数组中的峰,以便找到你想要的两个关联点:

  1. 峰值左侧向上曲线开始的点
  2. 峰值右侧的点及其值接近左侧的点

基于,您可以使用kneed

fig,ax = plt.subplots(nrows=2,ncols=2,figsize=(18,10))

for ind,item in enumerate(zip(left_ips,right_ips)):
    
    left_ip,right_ip = item
    row_idx,col_idx = ind // 2,ind % 2
    
    # This is where the peak locates 
    pc = np.array([int(left_ip)+1,test[int(left_ip)+1]])

    # find the point where the curve starts to increase
    # based on what your data look like, such a critical point can be found within the range 
    # test_inds[int(pc[0])-200: int(pc[0])], note that test_inds is an array of the inds of the points in your data
    kn_l = KneeLocator(test_inds[int(pc[0])-200:int(pc[0])],test[int(pc[0])-200:int(pc[0])],curve='convex',direction='increasing')
    kn_l = kn_l.knee
    pl = np.array([kn_l,test[kn_l]])
    # find the point to the right of the peak, the point is almost on the same level as the point on the left 
    # in this example, the threshold is set to 1
    mask_zero = np.abs(test - pl[1]*np.ones(len(test))) < 1
    mask_greater = test_inds > pc[0]
    pr_idx = np.argmax(np.logical_and(mask_zero,mask_greater))
    pr = np.array([pr_idx,test[pr_idx]])
    
    ax[row_idx][col_idx].set_xlim(int(pl[0])-20,int(pr[0])+20)
    ax[row_idx][col_idx].scatter(int(pl[0]),test[int(pl[0])],s=100,color='aquamarine',zorder=500)
    ax[row_idx][col_idx].scatter(int(pr[0]),test[int(pr[0])],s=100,color='aquamarine',zorder=500)
    
    get_angle = lambda v1, v2:\
        np.rad2deg(np.arccos(np.clip(np.dot(v1, v2) / np.linalg.norm(v1) / np.linalg.norm(v2),-1,1)))
    angle_l = get_angle(pr-pl,pc-pl)
    angle_r = get_angle(pl-pr,pc-pr)
    
    ax[row_idx][col_idx].annotate('%.2f deg' % angle_l,xy=pl+np.array([5,20]),xycoords='data',
                                  fontsize=15,horizontalalignment='right',verticalalignment='bottom',zorder=600)
    ax[row_idx][col_idx].annotate('%.2f deg' % angle_r,xy=pr+np.array([-1,20]),xycoords='data',
                                  fontsize=15,horizontalalignment='right',verticalalignment='bottom',zorder=600)
    ax[row_idx][col_idx].plot([pl[0],pc[0]],[pl[1],pc[1]],'-',lw=2,color='navy')
    ax[row_idx][col_idx].plot([pc[0],pr[0]],[pc[1],pr[1]],'-',lw=2,color='navy')
    
    ax[row_idx][col_idx].hlines(pl[1],pl[0],pc[0],linestyle='--',lw=.8,color='k')
    ax[row_idx][col_idx].hlines(pr[1],pc[0],pr[0],linestyle='--',lw=.8,color='k')
    ax[row_idx][col_idx].vlines(pc[0],pl[1],pc[1],linestyle='--',lw=.8,color='k')
    ax[row_idx][col_idx].vlines(pc[0],pr[1],pc[1],linestyle='--',lw=.8,color='k')
    
    rto_1 = (pc[1]-pl[1])/(pc[0]-pl[0])
    rto_2 = (pc[1]-pr[1])/(pc[0]-pr[0])
    ax[row_idx][col_idx].annotate('ratio1=%.3f' % rto_1,xy=pr+np.array([15,100]),xycoords='data',
                                  fontsize=15,horizontalalignment='right',verticalalignment='bottom',zorder=600)
    
    ax[row_idx][col_idx].annotate('ratio2=%.3f' % rto_2,xy=pr+np.array([15,60]),xycoords='data',
                                  fontsize=15,horizontalalignment='right',verticalalignment='bottom',zorder=600)
    
    pl_idx,pc_idx,pr_idx = pl[0].astype(np.int),pc[0].astype(np.int),pr[0].astype(np.int)
    ax[row_idx][col_idx].plot(range(int(pl[0])-20,pl_idx+1),test[int(pl[0])-20:pl_idx+1],'ko-',lw=1,markersize=1.5)
    ax[row_idx][col_idx].plot(range(pl_idx,pr_idx+1),test[pl_idx:pr_idx+1],'ro-',lw=1,zorder=200,markersize=1.5)
    ax[row_idx][col_idx].plot(range(pr_idx,int(pr[0])+20),test[pr_idx:int(pr[0])+20],'ko-',lw=1,markersize=1.5)
    ax[row_idx][col_idx].scatter(peaks[ind],test[peaks[ind]],marker='x',s=30,c='red',zorder=100)