How to minimize chi squared for 3 linear fits

Question

import matplotlib.pyplot as plt
import numpy as np

# This is my data set
x = [15, 30, 45, 60, 75, 90, 105, 120, 135, 150, 165, 180, 195, 210, 225, 240]
y = [1, 0.9, 0.8, 0.7, 0.6, 0.55, 0.5, 0.45, 0.4, 0.35, 0.33, 0.31, 0.29, 0.27, 0.25, 0.23]

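I want to add 3 linear regressions to this data set. By plotting the data with pyplot I can see by eye where the kinks begin to form (at roughly x = 105 and x = 165), so I could create 3 linear regressions (for x from 0 to 105, 105 to 165, and 165 to 240). But how would I do this scientifically? In other words, I want to add 3 linear regressions to my data that minimize chi squared. Is there a way to accomplish this with code?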

Answer 1

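Below you will find the code and output of an automated procedure that uses scipy.stats.linregress; the explanation follows the code. The output looks like this:

The slope and intercept terms are:

Curve 1: -0.0066 * x + 1.10
Curve 2: -0.0033 * x + 0.85
Curve 3: -0.0013 * x + 0.55

The code is as follows: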

from scipy import stats
import matplotlib.pyplot as plt
import numpy as np

x = np.array([15, 30, 45, 60, 75, 90, 105, 120, 135, 150, 165, 180, 195, 210, 225, 240])
y = np.array([1, 0.9, 0.8, 0.7, 0.6, 0.55, 0.5, 0.45, 0.4, 0.35, 0.33, 0.31, 0.29, 0.27, 0.25, 0.23])

# get slope of your data
dif = np.diff(y) / np.diff(x)

# determine the change of the slope
difdif = np.diff(dif)

# define a threshold for the allowed change of the slope
threshold = 0.001

# get indices where the diff returns value larger than a threshold
indNZ = np.where(abs(difdif) > threshold)[0]

# this makes plotting easier and avoids a couple of if clauses
indNZ += 1
indNZ = np.append(indNZ, len(x))
indNZ = np.insert(indNZ, 0, 0)

# plot the data
plt.scatter(x, y)

# fit and plot a straight line for each segment between consecutive breakpoints
for start, stop in zip(indNZ[:-1], indNZ[1:]):
    slope, intercept, r_value, p_value, std_err = stats.linregress(x[start:stop], y[start:stop])
    plt.plot(x[start:stop], slope * x[start:stop] + intercept)

plt.show()

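First, the slope between neighbouring points can be computed with np.diff. Applying np.diff again to those slopes gives the points where the slope changes significantly. In the code above I use a threshold for this: if you are always dealing with perfect lines, it can be set to a very small value; if your data is noisy, you will have to tune it.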
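To make the threshold step concrete, here is a minimal sketch that wraps it in a function and runs it on the data from the question. The function name find_breakpoints is hypothetical and not part of the original answer; the logic is the same np.diff / np.where sequence used in the code above.

```python
import numpy as np

x = np.array([15, 30, 45, 60, 75, 90, 105, 120, 135, 150, 165, 180, 195, 210, 225, 240])
y = np.array([1, 0.9, 0.8, 0.7, 0.6, 0.55, 0.5, 0.45, 0.4, 0.35, 0.33, 0.31, 0.29, 0.27, 0.25, 0.23])


def find_breakpoints(x, y, threshold=0.001):
    """Return the index of the first point of each new segment (hypothetical helper)."""
    slopes = np.diff(y) / np.diff(x)   # slope of each interval between neighbouring points
    slope_change = np.diff(slopes)     # change in slope from one interval to the next
    # slope_change[i] compares intervals i and i + 1, which meet at point i + 1
    return np.where(np.abs(slope_change) > threshold)[0] + 1


print(find_breakpoints(x, y))  # [4 9]
```

Raising the threshold makes the detection less sensitive: with threshold=0.0025 only the first kink is reported for this data, because the second change in slope is about 0.002.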

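With the indices where the slope changes significantly in hand, a linear regression can be run on each segment and the results plotted accordingly.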
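If you want the fitted slopes and intercepts as numbers rather than only as plotted lines, a sketch along the following lines may help. It is an illustration, not the original code: fit_segments is a hypothetical helper, np.polyfit is used in place of scipy.stats.linregress, and the segment boundaries are hard-coded to the values the threshold step yields for this data set.

```python
import numpy as np

x = np.array([15, 30, 45, 60, 75, 90, 105, 120, 135, 150, 165, 180, 195, 210, 225, 240])
y = np.array([1, 0.9, 0.8, 0.7, 0.6, 0.55, 0.5, 0.45, 0.4, 0.35, 0.33, 0.31, 0.29, 0.27, 0.25, 0.23])

# Assumption: boundaries as produced by the threshold step for this data set
bounds = [0, 4, 9, 16]


def fit_segments(x, y, bounds):
    """Fit a straight line to each slice x[start:stop]; return a list of (slope, intercept)."""
    fits = []
    for start, stop in zip(bounds[:-1], bounds[1:]):
        slope, intercept = np.polyfit(x[start:stop], y[start:stop], 1)
        fits.append((slope, intercept))
    return fits


for i, (slope, intercept) in enumerate(fit_segments(x, y, bounds), start=1):
    print(f"Curve {i}: {slope:.4f} * x + {intercept:.2f}")
```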

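A closer look at the for loop: indNZ is

array([ 0,  4,  9, 16])

which gives the index ranges of the three lines. The blue line covers x[0] to x[3], the green line covers x[4] to x[8], and the red line covers x[9] to x[15]. Inside the for loop these ranges are selected, the linear fit is done with scipy.stats.linregress (polyfit works as well, if you prefer it), and the line is then plotted using the equation slope * x + intercept.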
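The threshold approach locates the kinks from changes in slope; it does not search directly for the split that minimizes chi squared. The following is a minimal sketch of such a search and is not part of the original answer. It assumes every y value has the same unit uncertainty, so that chi squared reduces to the sum of squared residuals, and it simply tries every way of cutting the data into three contiguous pieces. The names segment_sse and best_breakpoints and the min_points parameter are hypothetical.

```python
from itertools import combinations

import numpy as np

x = np.array([15, 30, 45, 60, 75, 90, 105, 120, 135, 150, 165, 180, 195, 210, 225, 240])
y = np.array([1, 0.9, 0.8, 0.7, 0.6, 0.55, 0.5, 0.45, 0.4, 0.35, 0.33, 0.31, 0.29, 0.27, 0.25, 0.23])


def segment_sse(xs, ys):
    """Sum of squared residuals of the least-squares line through (xs, ys)."""
    slope, intercept = np.polyfit(xs, ys, 1)
    return float(np.sum((ys - (slope * xs + intercept)) ** 2))


def best_breakpoints(x, y, n_segments=3, min_points=3):
    """Brute-force search over all contiguous splits; return (total_sse, cuts)."""
    n = len(x)
    best_total, best_cuts = np.inf, None
    for cuts in combinations(range(min_points, n - min_points + 1), n_segments - 1):
        bounds = (0, *cuts, n)
        pairs = list(zip(bounds[:-1], bounds[1:]))
        # skip splits in which any segment has too few points for a meaningful fit
        if any(stop - start < min_points for start, stop in pairs):
            continue
        total = sum(segment_sse(x[start:stop], y[start:stop]) for start, stop in pairs)
        if total < best_total:
            best_total, best_cuts = total, cuts
    return best_total, best_cuts


total, cuts = best_breakpoints(x, y)
print(total, cuts)
```

For this data set the points at the kinks lie on both neighbouring lines, so several adjacent splits give a residual that is zero up to rounding error, and the returned cuts may differ by one index from the threshold result. With real uncertainties, each squared residual would be divided by the corresponding variance before summing.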

python

numpy

curve-fitting

linear-regression

chi-squared