Weighted regression sklearn

Question

I want to weight my training data by recency, so that newer points count for more.

Take a simple example:

import matplotlib.pyplot as plt
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

X = np.array([1,2,3,4,5,6,7,8,9,10]).reshape(-1,1)
Y = np.array([0.25, 0.5, 0.75, 1, 1.5, 2, 3, 4, 6, 10]).reshape(-1,1)

poly_reg = PolynomialFeatures(degree=2)
X_poly = poly_reg.fit_transform(X)
pol_reg = LinearRegression()
pol_reg.fit(X_poly, Y)

plt.scatter(X, Y, color='red')
plt.plot(X, pol_reg.predict(poly_reg.transform(X)), color='blue')

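Now suppose the X values are time-based and the Y values are snapshots from a sensor, so we are modelling some behaviour over time. I believe the most recent data points are the most important, because they are the newest and the most indicative of future behaviour. I want to adjust my model so that the most recent data points carry the highest weight.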
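A minimal sketch of one way to turn "recent points matter more" into numbers: exponential decay by age, where a point `half_life` time steps older than the newest one gets half its weight. The helper `recency_weights` and the half-life of 3 are illustrative assumptions, not part of sklearn; linear or step weights would be used the same way.

```python
import numpy as np

def recency_weights(t, half_life):
    """Hypothetical helper: exponential-decay weights by age.

    The newest sample gets weight 1; a sample `half_life` time units
    older gets 0.5, one twice as old gets 0.25, and so on.
    """
    t = np.asarray(t, dtype=float)
    age = t.max() - t                  # 0 for the newest sample
    return 0.5 ** (age / half_life)

X = np.arange(1, 11)                   # the time stamps from the question
w = recency_weights(X, half_life=3.0)  # assumed half-life of 3 steps
print(np.round(w, 3))
```

For least-squares fitting only the ratios between weights matter: multiplying every weight by the same constant leaves the fitted coefficients unchanged.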

There is a question about doing this in R: https://stats.stackexchange.com/questions/196653/assigning-more-weight-to-more-recent-observations-in-regression

Does the sklearn package (or any other Python package) have this functionality?

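The weighted model would produce a similar curve, but it would fit the newer points better. If I use the model to forecast, an unweighted model is always too conservative, because it is less sensitive to the latest data.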

Besides this approach, I also use curve_fit to fit a power or exponential function:

import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

def func(x, a, b):
    return a*(x**b)

# NumPy arrays, so that func(X, *popt) can evaluate X**b element-wise
X = np.array([1,2,3,4,5,6,7,8,9,10])
Y = np.array([0.25, 0.5, 0.75, 1, 1.5, 2, 3, 4, 6, 10])

popt, pcov = curve_fit(func, X, Y, bounds=([-np.inf,1], [np.inf, np.inf]))
plt.plot(X, func(X, *popt), color = 'green')

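If a solution using func and curve_fit is possible, I am open to that too, or to any other approach. The only caveat is that my real-world data do not always suggest a monotonically increasing function, but my ideal solution is one.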
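`curve_fit` has no `sample_weight` argument, but its `sigma` argument (the per-point uncertainty of `Y`) can express weights: with a 1-D `sigma` it minimizes `sum(((Y - f(X)) / sigma) ** 2)`, so `sigma = 1 / sqrt(w)` turns the fit into weighted least squares. A minimal sketch, assuming exponential recency weights with an arbitrary half-life of 3 steps; the lower bound `a >= 0` is an added assumption aimed at the monotonic-increase requirement.

```python
import numpy as np
from scipy.optimize import curve_fit

def func(x, a, b):
    return a * x**b

X = np.arange(1, 11, dtype=float)
Y = np.array([0.25, 0.5, 0.75, 1, 1.5, 2, 3, 4, 6, 10])

# Assumed recency weights: exponential decay with a half-life of 3 steps.
w = 0.5 ** ((X.max() - X) / 3.0)

# curve_fit minimises sum(((Y - func(X, *p)) / sigma) ** 2),
# so sigma = 1 / sqrt(w) makes it a weighted least-squares fit.
sigma = 1.0 / np.sqrt(w)

# a >= 0 and b >= 1 keep a * x**b non-decreasing for x > 0.
bounds = ([0, 1], [np.inf, np.inf])
popt_w, _ = curve_fit(func, X, Y, p0=[1.0, 1.5], sigma=sigma, bounds=bounds)
popt_u, _ = curve_fit(func, X, Y, p0=[1.0, 1.5], bounds=bounds)

print("weighted a, b:  ", popt_w)
print("unweighted a, b:", popt_u)
```

With `a >= 0` and `b >= 1`, the fitted curve cannot decrease for `x > 0` whatever the data look like; the weights only change which points it follows most closely.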

Answer 1

I looked at the API documentation for sklearn's LinearRegression, and the class has a fit() method with the signature fit(self, X, y[, sample_weight]). So, as far as I can tell, you can pass it a vector of per-sample weights.
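A minimal sketch of that suggestion applied to the question's polynomial model: the weights go into `fit(..., sample_weight=w)` as a 1-D array with one entry per row. The exponential half-life weighting is an assumption for illustration, and `X_new` holds two hypothetical future time steps.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

X = np.arange(1, 11).reshape(-1, 1)
Y = np.array([0.25, 0.5, 0.75, 1, 1.5, 2, 3, 4, 6, 10])

# Assumed weighting scheme: exponential decay with a 3-step half-life.
# sample_weight must be 1-D with one weight per sample.
w = 0.5 ** ((X.max() - X.ravel()) / 3.0)

poly = PolynomialFeatures(degree=2)
X_poly = poly.fit_transform(X)

weighted = LinearRegression().fit(X_poly, Y, sample_weight=w)
unweighted = LinearRegression().fit(X_poly, Y)

# Hypothetical future time steps; reuse the fitted feature expansion.
X_new = np.array([[11], [12]])
print("weighted:  ", weighted.predict(poly.transform(X_new)))
print("unweighted:", unweighted.predict(poly.transform(X_new)))
```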

Answer 2

Implemented from scratch:

import matplotlib.pyplot as plt
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

#%matplotlib inline

X = np.array([1,2,3,4,5,6,7,8,9,10]).reshape(-1,1)
# Exponential recency weights, normalized so that w.sum() == 1
w = np.exp(X) / np.exp(X).sum()

Y = np.array([0.25, 0.5, 0.75, 1, 1.5, 2, 3, 4, 6, 10]).reshape(-1,1)

poly_reg = PolynomialFeatures(degree=2)
#Vandermonde Matrix
X_poly = poly_reg.fit_transform(X)

#Solve Weighted Normal Equation
A = np.linalg.inv(X_poly.T @ (w*X_poly))
beta = (A @ X_poly.T) @ (w*Y)

# Define polynomial (NumPy could be used to optimize this)
def polynomial(x, coeff):
    y = 0
    for p, c in enumerate(coeff):
        y += c * x**p
    return y

plt.scatter(X, Y, color='red')
plt.plot(X, polynomial(X, beta), color='blue')

#Source https://en.wikipedia.org/wiki/Weighted_least_squares#Introduction
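The two lines under `#Solve Weighted Normal Equation` implement the closed-form weighted least-squares estimate from the cited Wikipedia page. With the weights collected in $W = \operatorname{diag}(w_1, \dots, w_n)$:

$$
\hat{\beta} = \arg\min_{\beta} \sum_{i=1}^{n} w_i \left(y_i - x_i^{\top}\beta\right)^2 = \left(X^{\top} W X\right)^{-1} X^{\top} W y
$$

Here $X$ is `X_poly`, whose rows are $x_i^{\top} = (1, x_i, x_i^2)$. Because `w` is a column vector, `w*X_poly` scales row $i$ by $w_i$ through broadcasting, which equals $WX$ without building the $n \times n$ diagonal matrix; likewise `w*Y` equals $Wy$.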

Note that this gives the same result as Teo's answer, and his answer is shorter.
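A minimal sketch checking that claim: it reuses this answer's `np.exp` weights and compares the closed-form coefficients with `LinearRegression` fitted via `sample_weight`. Since `X_poly` already contains a constant column, `fit_intercept=False` is assumed so that both solve exactly the same problem; the normal equation is solved with `np.linalg.solve` instead of an explicit inverse.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

X = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10]).reshape(-1, 1)
Y = np.array([0.25, 0.5, 0.75, 1, 1.5, 2, 3, 4, 6, 10]).reshape(-1, 1)
w = np.exp(X) / np.exp(X).sum()        # same weights as above, shape (10, 1)

X_poly = PolynomialFeatures(degree=2).fit_transform(X)

# From scratch: solve (X^T W X) beta = X^T W y.
beta = np.linalg.solve(X_poly.T @ (w * X_poly), X_poly.T @ (w * Y))

# sklearn: same weights as a 1-D sample_weight. X_poly's bias column
# already plays the role of the intercept, so fit_intercept is off.
reg = LinearRegression(fit_intercept=False)
reg.fit(X_poly, Y, sample_weight=w.ravel())

print("from scratch:", beta.ravel())
print("sklearn:     ", reg.coef_.ravel())
```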
