将曲线拟合到数据,获取分析形式,并检查曲线何时超过阈值

Fit curve to data, get analytical form, & check when curve crosses threshold

每条曲线有 40 个点,我想平滑函数并估计曲线在 y 轴上越过阈值的时间。 有没有我可以轻松应用它的拟合函数,我可以使用插值来绘制新函数,但我不知道如何请求 y = threshold 的 x 值。

不幸的是,曲线的形状并不完全相同,所以我不能使用 scipy.optimize.curve_fit。

谢谢!

更新:添加两条曲线:

曲线 1

[942.153,353.081,53.088,125.110,140.851,188.170,70.536,-122.473,-369.061,-407.945,88.734,484.334,267.762,65.831,74.010,-55.781,-260.024,-466.830,-524.511,-76.833,-36.779,-117.366,218.578,175.662,185.653,299.285,215.276,546.048,1210.132,3087.326,7052.849,13867.824,27156.939,51379.664,91908.266,148874.563,215825.031,290073.219,369567.781,437031.688]

曲线 2

[-39034.039,-34637.941,-24945.094,-16697.996,-9247.398,-2002.051,3409.047,3658.145,7542.242,11781.340,11227.688,10089.035,9155.883,8413.980,5289.578,3150.676,4590.023,6342.871,3294.719,580.567,-938.586,-3919.738,-5580.390,-3141.793,-2785.945,-2683.597,-4287.750,-4947.902,-7347.554,-8919.457,-6403.359,-6722.011,-8181.414,-6807.566,-7603.218,-6298.371,-6909.523,-5878.675,-5193.578,-7193.980]

x 值为

[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40]

为了拟合平滑曲线,可以拟合Legendre polynomials using numpy.polynomial.legendre.Legendre's fit方法。


# import packages we need later
import matplotlib.pyplot as plt
import numpy as np

拟合勒让德多项式

正在准备数据as numpy arrays:

curve1 = \
np.asarray([942.153,353.081,53.088,125.110,140.851,188.170,70.536,-122.473,-369.061,-407.945,88.734,484.334,267.762,65.831,74.010,-55.781,-260.024,-466.830,-524.511,-76.833,-36.779,-117.366,218.578,175.662,185.653,299.285,215.276,546.048,1210.132,3087.326,7052.849,13867.824,27156.939,51379.664,91908.266,148874.563,215825.031,290073.219,369567.781,437031.688])
curve2 = \
np.asarray([-39034.039,-34637.941,-24945.094,-16697.996,-9247.398,-2002.051,3409.047,3658.145,7542.242,11781.340,11227.688,10089.035,9155.883,8413.980,5289.578,3150.676,4590.023,6342.871,3294.719,580.567,-938.586,-3919.738,-5580.390,-3141.793,-2785.945,-2683.597,-4287.750,-4947.902,-7347.554,-8919.457,-6403.359,-6722.011,-8181.414,-6807.566,-7603.218,-6298.371,-6909.523,-5878.675,-5193.578,-7193.980])
xvals = \
np.asarray([1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40])

让我们拟合勒让德多项式,degree 是使用的最高阶多项式,the first few is here for example

degree=10
legendrefit_curve1 = np.polynomial.legendre.Legendre.fit(xvals, curve1, deg=degree)
legendrefit_curve2 = np.polynomial.legendre.Legendre.fit(xvals, curve2, deg=degree)

使用 linspace method 在均匀分布的点处计算这些拟合曲线。 n 是我们想要的点对的数量。

n=100
fitted_vals_curve1 = legendrefit_curve1.linspace(n=n)
fitted_vals_curve2 = legendrefit_curve2.linspace(n=n)

让我们绘制结果,以及 threshold(使用 axvline):

plt.scatter(xvals, curve1)
plt.scatter(xvals, curve2)

plt.plot(fitted_vals_curve1[0],fitted_vals_curve1[1],c='r')
plt.plot(fitted_vals_curve2[0],fitted_vals_curve2[1],c='k')

threshold=100000
plt.axhline(y=threshold)

曲线完美贴合。


什么时候超过门槛?

要检查每个系列中 threshold 的交叉位置,您可以这样做:

for x, y in zip(fitted_vals_curve1[0], fitted_vals_curve1[1]):
    if y > threshold:
        xcross_curve1 = x
        break

for x, y in zip(fitted_vals_curve2[0], fitted_vals_curve2[1]):
    if y > threshold:
        xcross_curve2 = x
        break

xcross_curve1xcross_curve2 将保持 x 值,其中 curve1curve2 如果确实穿过 threshold =31=];如果没有,它们将是未定义的。

让我们绘制它们以检查它是否有效 (link to axhline docs):

plt.scatter(xvals, curve1)
plt.scatter(xvals, curve2)

plt.plot(fitted_vals_curve1[0],fitted_vals_curve1[1],c='r')
plt.plot(fitted_vals_curve2[0],fitted_vals_curve2[1],c='k')

plt.axhline(y=threshold)

try: plt.axvline(x=xcross_curve1)
except NameError: print('curve1 is not passing the threshold',c='b')

try: plt.axvline(x=xcross_curve2)
except NameError: print('curve2 is not passing the threshold')

不出所料,我们得到了这个图:

(和文本输出:curve2 is not passing the threshold。)

如果您想提高 xcross_curve1xcross_curve2 的准确性,您可以增加上面定义的 degreen


从勒让德到多项式

我们拟合了一条曲线,大致有以下形式:

其中 P_n 是第 nth Legendre 多项式,s(x) 是一些将 x 转换为 P_n 期望范围的函数(一些我们不知道的数学东西现在需要知道)。

我们希望拟合线的形式为:

我们将使用 legendre() of scipy.special:

from scipy.special import legendre

我们还将使用 np.pad (docs, ).

legendredict={}
for icoef, coef in enumerate(legendrefit_curve1.coef):
    legendredict[icoef]=coef*np.pad(legendre(icoef).coef,(10-icoef,0),mode='constant')

legendredict 将保持 keys010,并且 dict 中的每个值将是 float 的列表秒。 key 指的是多项式的阶数,float 的列表表示我们拟合的构成多项式中 x**n 值的系数,以倒序排列.

例如:

P_4 是:

legendredict[4] 是:

isarray([ 0.00000000e+00,  0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
        0.00000000e+00,  0.00000000e+00,  3.29634565e+05,  3.65967884e-11,
       -2.82543913e+05,  1.82983942e-11,  2.82543913e+04])

意味着在 P_n 总和 中(f(x),上面),我们有 q_4 很多 P_4,相当于 12.82543913e+04x1.82983942e-11x^2-2.82543913e+05,等等,仅来自 P_4 组件

所以如果我们想知道有多少1s, xs, x^2s等我们需要形成多项式和,我们需要添加对1s、xs、x^2s 等来自所有不同的 P_ns。这就是我们所做的:

polycoeffs = np.sum(np.stack(list(legendredict.values())),axis=0)

然后让我们形成一个多项式和:

for icoef, coef in enumerate(reversed(polycoeffs)):
    print(str(coef)+'*s(x)**'+str(icoef),end='\n +')

给出输出:

-874.1456709637822*s(x)**0
 +2893.7228005540596*s(x)**1
 +50415.38472217957*s(x)**2
 +-6979.322584205707*s(x)**3
 +-453363.49985790614*s(x)**4
 +-250464.7549807652*s(x)**5
 +1250129.5521521813*s(x)**6
 +1267709.5031024509*s(x)**7
 +-493280.0177807359*s(x)**8
 +-795684.224334346*s(x)**9
 +-134370.1696946264*s(x)**10
 +

(我们将忽略最后的+符号,格式不是这里的重点。)

我们还需要计算s(x)。如果我们在 Jupyter Notebook / Google Colab 中工作,只执行一个带有 legendrefit_curve1 returns:

的单元格

从这里我们可以清楚地看出s(x)-1.0512820512820513+0.05128205128205128x。如果我们想以更编程的方式进行:

2/(legendrefit_curve1.domain[1]-legendrefit_curve1.domain[0])0.05128205128205128 & -1-2/(legendrefit_curve1.domain[1]-legendrefit_curve1.domain[0]) 只是 -1.0512820512820513

出于某些数学原因,这是正确的,但与此无关 ()。

所以我们可以定义:

def s(input):
    a=-1-2/(legendrefit_curve1.domain[1]-legendrefit_curve1.domain[0])
    b=2/(legendrefit_curve1.domain[1]-legendrefit_curve1.domain[0])
    return a+b*input

另外,让我们根据上面得到的s(x)的多项式之和来定义:

def polyval(x):
    return -874.1456709637822*s(x)**0+2893.7228005540596*s(x)**1+50415.38472217957*s(x)**2+-6979.322584205707*s(x)**3+-453363.49985790614*s(x)**4+-250464.7549807652*s(x)**5+1250129.5521521813*s(x)**6+1267709.5031024509*s(x)**7+-493280.0177807359*s(x)**8+-795684.224334346*s(x)**9+-134370.1696946264*s(x)**10

以更程序化的方式:

def polyval(x):
    return sum([coef*s(x)**icoef for icoef, coef in enumerate(reversed(polycoeffs))])

检查我们的多项式是否确实适合:

plt.scatter(fitted_vals_curve1[0],fitted_vals_curve1[1],c='r')
plt.plot(fitted_vals_curve1[0],[polyval(val) for val in fitted_vals_curve1[0]])

确实如此:

所以让我们打印出我们的纯多项式和,其中 s(x) 被显式函数替换:

for icoef, coef in enumerate(reversed(polycoeffs)):
    print(str(coef)+'*(-1.0512820512820513+0512820512820513*x)**'+str(icoef),end='\n +')

给出输出:

-874.1456709637822*(-1.0512820512820513+0512820512820513*x)**0
 +2893.7228005540596*(-1.0512820512820513+0512820512820513*x)**1
 +50415.38472217957*(-1.0512820512820513+0512820512820513*x)**2
 +-6979.322584205707*(-1.0512820512820513+0512820512820513*x)**3
 +-453363.49985790614*(-1.0512820512820513+0512820512820513*x)**4
 +-250464.7549807652*(-1.0512820512820513+0512820512820513*x)**5
 +1250129.5521521813*(-1.0512820512820513+0512820512820513*x)**6
 +1267709.5031024509*(-1.0512820512820513+0512820512820513*x)**7
 +-493280.0177807359*(-1.0512820512820513+0512820512820513*x)**8
 +-795684.224334346*(-1.0512820512820513+0512820512820513*x)**9
 +-134370.1696946264*(-1.0512820512820513+0512820512820513*x)**10
 +

可以根据需要进行简化。 (忽略最后一个 + 符号。)

如果想要高(低)次多项式拟合,就拟合高(低)次勒让德多项式。