将曲线拟合到数据，获取分析形式，并检查曲线何时超过阈值

Question

每条曲线有 40 个点，我想平滑函数并估计曲线在 y 轴上越过阈值的时间。有没有我可以轻松应用它的拟合函数，我可以使用插值来绘制新函数，但我不知道如何请求 y = threshold 的 x 值。

不幸的是，曲线的形状并不完全相同，所以我不能使用 scipy.optimize.curve_fit。

谢谢！

更新：添加两条曲线：

曲线 1

[942.153,353.081,53.088,125.110,140.851,188.170,70.536,-122.473,-369.061,-407.945,88.734,484.334,267.762,65.831,74.010,-55.781,-260.024,-466.830,-524.511,-76.833,-36.779,-117.366,218.578,175.662,185.653,299.285,215.276,546.048,1210.132,3087.326,7052.849,13867.824,27156.939,51379.664,91908.266,148874.563,215825.031,290073.219,369567.781,437031.688]

曲线 2

[-39034.039,-34637.941,-24945.094,-16697.996,-9247.398,-2002.051,3409.047,3658.145,7542.242,11781.340,11227.688,10089.035,9155.883,8413.980,5289.578,3150.676,4590.023,6342.871,3294.719,580.567,-938.586,-3919.738,-5580.390,-3141.793,-2785.945,-2683.597,-4287.750,-4947.902,-7347.554,-8919.457,-6403.359,-6722.011,-8181.414,-6807.566,-7603.218,-6298.371,-6909.523,-5878.675,-5193.578,-7193.980]

x 值为

[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40]

Answer 1

为了拟合平滑曲线，可以拟合Legendre polynomials using numpy.polynomial.legendre.Legendre's fit方法。

# import packages we need later
import matplotlib.pyplot as plt
import numpy as np

拟合勒让德多项式

正在准备数据as numpy arrays:

curve1 = \
np.asarray([942.153,353.081,53.088,125.110,140.851,188.170,70.536,-122.473,-369.061,-407.945,88.734,484.334,267.762,65.831,74.010,-55.781,-260.024,-466.830,-524.511,-76.833,-36.779,-117.366,218.578,175.662,185.653,299.285,215.276,546.048,1210.132,3087.326,7052.849,13867.824,27156.939,51379.664,91908.266,148874.563,215825.031,290073.219,369567.781,437031.688])
curve2 = \
np.asarray([-39034.039,-34637.941,-24945.094,-16697.996,-9247.398,-2002.051,3409.047,3658.145,7542.242,11781.340,11227.688,10089.035,9155.883,8413.980,5289.578,3150.676,4590.023,6342.871,3294.719,580.567,-938.586,-3919.738,-5580.390,-3141.793,-2785.945,-2683.597,-4287.750,-4947.902,-7347.554,-8919.457,-6403.359,-6722.011,-8181.414,-6807.566,-7603.218,-6298.371,-6909.523,-5878.675,-5193.578,-7193.980])
xvals = \
np.asarray([1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40])

让我们拟合勒让德多项式，degree 是使用的最高阶多项式，the first few is here for example。

degree=10
legendrefit_curve1 = np.polynomial.legendre.Legendre.fit(xvals, curve1, deg=degree)
legendrefit_curve2 = np.polynomial.legendre.Legendre.fit(xvals, curve2, deg=degree)

使用 linspace method 在均匀分布的点处计算这些拟合曲线。 n 是我们想要的点对的数量。

n=100
fitted_vals_curve1 = legendrefit_curve1.linspace(n=n)
fitted_vals_curve2 = legendrefit_curve2.linspace(n=n)

让我们绘制结果，以及 threshold（使用 axvline）：

plt.scatter(xvals, curve1)
plt.scatter(xvals, curve2)

plt.plot(fitted_vals_curve1[0],fitted_vals_curve1[1],c='r')
plt.plot(fitted_vals_curve2[0],fitted_vals_curve2[1],c='k')

threshold=100000
plt.axhline(y=threshold)

曲线完美贴合。

什么时候超过门槛？

要检查每个系列中 threshold 的交叉位置，您可以这样做：

for x, y in zip(fitted_vals_curve1[0], fitted_vals_curve1[1]):
    if y > threshold:
        xcross_curve1 = x
        break

for x, y in zip(fitted_vals_curve2[0], fitted_vals_curve2[1]):
    if y > threshold:
        xcross_curve2 = x
        break

xcross_curve1 和 xcross_curve2 将保持 x 值，其中 curve1 和 curve2 如果确实穿过 threshold =31=];如果没有，它们将是未定义的。

让我们绘制它们以检查它是否有效 (link to axhline docs):

plt.scatter(xvals, curve1)
plt.scatter(xvals, curve2)

plt.plot(fitted_vals_curve1[0],fitted_vals_curve1[1],c='r')
plt.plot(fitted_vals_curve2[0],fitted_vals_curve2[1],c='k')

plt.axhline(y=threshold)

try: plt.axvline(x=xcross_curve1)
except NameError: print('curve1 is not passing the threshold',c='b')

try: plt.axvline(x=xcross_curve2)
except NameError: print('curve2 is not passing the threshold')

不出所料，我们得到了这个图：

（和文本输出：curve2 is not passing the threshold。）

如果您想提高 xcross_curve1 或 xcross_curve2 的准确性，您可以增加上面定义的 degree 和 n。

从勒让德到多项式

我们拟合了一条曲线，大致有以下形式：

其中 P_n 是第 nth Legendre 多项式，s(x) 是一些将 x 转换为 P_n 期望范围的函数（一些我们不知道的数学东西现在需要知道）。

我们希望拟合线的形式为：

我们将使用 legendre() of scipy.special:

from scipy.special import legendre

我们还将使用 np.pad (docs, ).

legendredict={}
for icoef, coef in enumerate(legendrefit_curve1.coef):
    legendredict[icoef]=coef*np.pad(legendre(icoef).coef,(10-icoef,0),mode='constant')

legendredict 将保持 keys 从 0 到 10，并且 dict 中的每个值将是 float 的列表秒。 key 指的是多项式的阶数，float 的列表表示我们拟合的构成多项式中 x**n 值的系数，以倒序排列.

例如：

P_4 是：

legendredict[4] 是：

isarray([ 0.00000000e+00,  0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
        0.00000000e+00,  0.00000000e+00,  3.29634565e+05,  3.65967884e-11,
       -2.82543913e+05,  1.82983942e-11,  2.82543913e+04])

意味着在 P_n 的总和中（f(x)，上面），我们有 q_4 很多 P_4，相当于 1 的 2.82543913e+04，x 的 1.82983942e-11，x^2 的 -2.82543913e+05，等等，仅来自 P_4 组件。

所以如果我们想知道有多少1s, xs, x^2s等我们需要形成多项式和，我们需要添加对1s、xs、x^2s 等来自所有不同的 P_ns。这就是我们所做的：

polycoeffs = np.sum(np.stack(list(legendredict.values())),axis=0)

然后让我们形成一个多项式和：

for icoef, coef in enumerate(reversed(polycoeffs)):
    print(str(coef)+'*s(x)**'+str(icoef),end='\n +')

给出输出：

-874.1456709637822*s(x)**0
 +2893.7228005540596*s(x)**1
 +50415.38472217957*s(x)**2
 +-6979.322584205707*s(x)**3
 +-453363.49985790614*s(x)**4
 +-250464.7549807652*s(x)**5
 +1250129.5521521813*s(x)**6
 +1267709.5031024509*s(x)**7
 +-493280.0177807359*s(x)**8
 +-795684.224334346*s(x)**9
 +-134370.1696946264*s(x)**10
 +

（我们将忽略最后的+符号，格式不是这里的重点。）

我们还需要计算s(x)。如果我们在 Jupyter Notebook / Google Colab 中工作，只执行一个带有 legendrefit_curve1 returns:

的单元格

从这里我们可以清楚地看出s(x)是-1.0512820512820513+0.05128205128205128x。如果我们想以更编程的方式进行：

2/(legendrefit_curve1.domain[1]-legendrefit_curve1.domain[0]) 是 0.05128205128205128 & -1-2/(legendrefit_curve1.domain[1]-legendrefit_curve1.domain[0]) 只是 -1.0512820512820513

出于某些数学原因，这是正确的，但与此无关 ()。

所以我们可以定义：

def s(input):
    a=-1-2/(legendrefit_curve1.domain[1]-legendrefit_curve1.domain[0])
    b=2/(legendrefit_curve1.domain[1]-legendrefit_curve1.domain[0])
    return a+b*input

另外，让我们根据上面得到的s(x)的多项式之和来定义：

def polyval(x):
    return -874.1456709637822*s(x)**0+2893.7228005540596*s(x)**1+50415.38472217957*s(x)**2+-6979.322584205707*s(x)**3+-453363.49985790614*s(x)**4+-250464.7549807652*s(x)**5+1250129.5521521813*s(x)**6+1267709.5031024509*s(x)**7+-493280.0177807359*s(x)**8+-795684.224334346*s(x)**9+-134370.1696946264*s(x)**10

以更程序化的方式：

def polyval(x):
    return sum([coef*s(x)**icoef for icoef, coef in enumerate(reversed(polycoeffs))])

检查我们的多项式是否确实适合：

plt.scatter(fitted_vals_curve1[0],fitted_vals_curve1[1],c='r')
plt.plot(fitted_vals_curve1[0],[polyval(val) for val in fitted_vals_curve1[0]])

确实如此：

所以让我们打印出我们的纯多项式和，其中 s(x) 被显式函数替换：

for icoef, coef in enumerate(reversed(polycoeffs)):
    print(str(coef)+'*(-1.0512820512820513+0512820512820513*x)**'+str(icoef),end='\n +')

给出输出：

-874.1456709637822*(-1.0512820512820513+0512820512820513*x)**0
 +2893.7228005540596*(-1.0512820512820513+0512820512820513*x)**1
 +50415.38472217957*(-1.0512820512820513+0512820512820513*x)**2
 +-6979.322584205707*(-1.0512820512820513+0512820512820513*x)**3
 +-453363.49985790614*(-1.0512820512820513+0512820512820513*x)**4
 +-250464.7549807652*(-1.0512820512820513+0512820512820513*x)**5
 +1250129.5521521813*(-1.0512820512820513+0512820512820513*x)**6
 +1267709.5031024509*(-1.0512820512820513+0512820512820513*x)**7
 +-493280.0177807359*(-1.0512820512820513+0512820512820513*x)**8
 +-795684.224334346*(-1.0512820512820513+0512820512820513*x)**9
 +-134370.1696946264*(-1.0512820512820513+0512820512820513*x)**10
 +

可以根据需要进行简化。 （忽略最后一个 + 符号。）

如果想要高（低）次多项式拟合，就拟合高（低）次勒让德多项式。

将曲线拟合到数据，获取分析形式，并检查曲线何时超过阈值

Fit curve to data, get analytical form, & check when curve crosses threshold

python

spline

curve-fitting

拟合勒让德多项式

什么时候超过门槛？

从勒让德到多项式