Linear regression with weighted points in Python

Question

ax.scatter(x = NewUFOTimes['one'], y = NewUFOTimes['three'], s = (NewUFOTimes['two']/10))

[1]: https://i.stack.imgur.com/oNb9r.png

How can I do a linear regression that gives more weight to the larger points?

Answer 1

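For this kind of work in Python, the numpy module is a must-have.

I infer that you are using pandas. If so, you already have numpy installed as a dependency, so I think we can assume the following will work: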

x = NewUFOTimes["one"].values
y = NewUFOTimes["three"].values
w = NewUFOTimes["two"].values

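This should store the underlying numpy arrays in x, y, and w. I'll also assume that ax is a matplotlib Axes object.

Next we'll use numpy's ability to fit polynomials, in this case a polynomial of degree 1 (a linear fit). Out of the box, this fit supports optional weights. You can read the full documentation here.

Here's what it looks like: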

from numpy.polynomial import Polynomial

... #  code that produces your original plot

x = NewUFOTimes["one"].values
y = NewUFOTimes["three"].values
w = NewUFOTimes["two"].values

line = Polynomial.fit(x, y, 1, w=w)

fit_x, fit_y = line.linspace()

ax.plot(fit_x, fit_y, '-', label="Weighted fit")
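If you also want the slope and intercept as numbers, here is a minimal sketch. It uses synthetic stand-in data (the arrays below are hypothetical, not the NewUFOTimes frame from the question) and relies on `convert()`, because `Polynomial.fit` stores its coefficients in a scaled domain rather than in the original x units.

```python
import numpy as np
from numpy.polynomial import Polynomial

# Synthetic stand-in data (hypothetical; replace with your own x, y, w arrays)
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=x.size)
w = rng.uniform(0.5, 2.0, size=x.size)

line = Polynomial.fit(x, y, 1, w=w)

# Polynomial.fit works in a scaled domain internally; convert() maps the
# coefficients back to the original x units, lowest degree first.
intercept, slope = line.convert().coef
print(f"slope={slope:.3f}, intercept={intercept:.3f}")
```

The fitted object itself is callable, so `line(3.0)` evaluates the fit at x = 3.0 without any conversion.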

Now, I don't know where your weights come from, so I don't know whether they follow the recommendation in the numpy documentation:

Ideally the weights are chosen so that the errors of the products w[i]*y[i] all have the same variance.

Perhaps you are using inverse-variance weights. The documentation states:

When using inverse-variance weighting, use w[i] = 1/sigma(y[i]).

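It's important to think carefully about how you introduce weights into a fit. Linear regression usually assumes homoscedasticity (i.e. 'all weights equal'), so you should make sure the weights you introduce are well-motivated and well-executed.

As an aside, the 'old' way of doing this is to use numpy.polyfit. However, that function's docstring now includes the note: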
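One detail worth checking when choosing weights: numpy applies w[i] to the unsquared residual, so the fit minimises the sum of (w[i] * residual[i])**2. The sketch below assumes, purely for illustration, that each point carries a hypothetical `counts` value saying how many observations it stands for. In that case passing the square root of the counts gives the same line as repeating each point that many times in an unweighted fit.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(1)
x = np.linspace(0.0, 10.0, 30)
y = 0.5 * x + 3.0 + rng.normal(scale=1.0, size=x.size)
counts = rng.integers(1, 20, size=x.size)  # hypothetical importance per point

# numpy multiplies each residual by w[i] before squaring, so passing
# sqrt(counts) minimises sum(counts * residual**2).
fit = Polynomial.fit(x, y, 1, w=np.sqrt(counts)).convert()

# Cross-check: repeating each point counts[i] times and fitting without
# weights minimises the same objective.
x_rep = np.repeat(x, counts)
y_rep = np.repeat(y, counts)
fit_rep = Polynomial.fit(x_rep, y_rep, 1).convert()

print(fit.coef, fit_rep.coef)
```

Whether counts, inverse standard deviations, or something else is the right choice depends on what the weights mean in your data.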

The Polynomial.fit <numpy.polynomial.polynomial.Polynomial.fit> class method is recommended for new code as it is more stable numerically. See the documentation of the method for more information.

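It's worth noting that there is also a trade-off. If getting the covariance matrix associated with the fit matters to you (the cov=True option of polyfit), you may still want to use the old method. If it doesn't, the approach above is probably the best. I should add: the use of weightings in the fit likely prevents the covariance matrix from being useful anyway!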
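For completeness, a minimal sketch of the older route with the covariance matrix. The data and the per-point uncertainties `sigma` are synthetic assumptions made up for this example. Note that numpy.polyfit returns coefficients highest power first, the opposite order to Polynomial.

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(0.0, 10.0, 40)
sigma = rng.uniform(0.2, 1.0, size=x.size)  # assumed per-point uncertainty
y = 1.5 * x - 2.0 + rng.normal(scale=sigma)

# polyfit returns coefficients highest power first: [slope, intercept]
coeffs, cov = np.polyfit(x, y, 1, w=1.0 / sigma, cov=True)
slope, intercept = coeffs

# Square roots of the diagonal give one-sigma estimates for each coefficient
slope_err, intercept_err = np.sqrt(np.diag(cov))
print(f"slope={slope:.3f} +/- {slope_err:.3f}")
print(f"intercept={intercept:.3f} +/- {intercept_err:.3f}")
```

By default the covariance is rescaled by chi2/dof, which treats the weights as meaningful only relative to each other. If sigma is a trustworthy absolute uncertainty, cov="unscaled" skips that rescaling.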


python

scatter-plot