How to fit a polynomial curve to data using scikit-learn?

Question

Background

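Using scikit-learn with Python, I'm trying to fit a quadratic polynomial curve to a set of data, so that the model has the form y = a2x^2 + a1x + a0, with the an coefficients provided by the model.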
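For context (a standard formulation, not part of the original question): although the curve is quadratic in x, the model is linear in the unknown coefficients a0, a1, a2. Fitting it to n points (x_i, y_i) is therefore an ordinary linear least-squares problem over the features 1, x and x^2:

```latex
\hat{\mathbf{a}} = \arg\min_{\mathbf{a}} \lVert X\mathbf{a} - \mathbf{y} \rVert_2^2,
\qquad
X = \begin{pmatrix}
1 & x_1 & x_1^2 \\
\vdots & \vdots & \vdots \\
1 & x_n & x_n^2
\end{pmatrix},
\quad
\mathbf{a} = \begin{pmatrix} a_0 \\ a_1 \\ a_2 \end{pmatrix},
\quad
\mathbf{y} = \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}
```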

Problem

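I can't figure out how to use the package to fit a polynomial curve, and there seem to be surprisingly few clear references on how to do it (I've been looking for a while). I've seen this question on doing something similar with NumPy, and also this question which does a more complicated fit than I require.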

What a good solution would look like

Hopefully, a good solution would go something like this (sample adapted from the linear fitting code I'm using):

# Desired API only: this is what I would like to write, not working scikit-learn code
x = my_x_data.reshape(len(profile), 1)
y = my_y_data.reshape(len(profile), 1)
regression = linear_model.LinearRegression(degree=2) # or PolynomialRegression(degree=2) or QuadraticRegression()
regression.fit(x, y)

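I imagine scikit-learn would have a facility like this, since it's pretty common (for example, in R the formula for the fit can be supplied in code, and the two should be fairly interchangeable for that kind of use case).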

Question:

What is a good way to do this, or where can I find information about how to do it properly?

Answer 1

Possible duplicate: https://stats.stackexchange.com/questions/58739/polynomial-regression-using-scikit-learn.

Is there a particular reason it has to be done with scikit-learn? The operation you want can be performed very easily with numpy:

import numpy as np

z = np.poly1d(np.polyfit(x, y, 2))

After which z(x) returns the value of the fit at x.
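A minimal sketch of how the numpy result maps onto the question's a2, a1, a0, assuming noise-free hypothetical data generated from y = 3x^2 - 2x + 1. np.polyfit returns the coefficients highest power first, and np.poly1d wraps them in a callable polynomial:

```python
import numpy as np

# Hypothetical sample data from y = 3x^2 - 2x + 1 (no noise),
# so the recovered coefficients are easy to check
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 3 * x**2 - 2 * x + 1

# np.polyfit orders coefficients highest power first: [a2, a1, a0]
coeffs = np.polyfit(x, y, 2)
a2, a1, a0 = coeffs

# np.poly1d turns the coefficient array into a callable polynomial
z = np.poly1d(coeffs)

print(a2, a1, a0)  # should be close to 3, -2, 1
print(z(5.0))      # should be close to 3*25 - 2*5 + 1 = 66
```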

A scikit-learn solution would almost certainly be just a wrapper around the same code.
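Both approaches come down to ordinary least squares, so one practical check of the claim above is to fit the same data both ways and compare the coefficients. This sketch uses hypothetical synthetic data; it does not show how scikit-learn is implemented internally, only that the fitted coefficients should agree up to floating-point error:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical noisy quadratic data: y = 0.5x^2 - x + 2 + noise
rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 50)
y = 0.5 * x**2 - x + 2 + rng.normal(scale=0.3, size=x.shape)

# NumPy: coefficients ordered [a2, a1, a0]
np_coeffs = np.polyfit(x, y, 2)

# scikit-learn: expand x into the columns [x, x^2], then fit a linear model.
# include_bias=False because LinearRegression fits its own intercept.
features = PolynomialFeatures(degree=2, include_bias=False).fit_transform(x.reshape(-1, 1))
model = LinearRegression().fit(features, y)

# Reorder to [a2, a1, a0] to match np.polyfit
sk_coeffs = np.array([model.coef_[1], model.coef_[0], model.intercept_])

print(np_coeffs)
print(sk_coeffs)
```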

Answer 2

I believe Salvador Dali's answer here will answer your question. In scikit-learn, it will suffice to construct the polynomial features from your data, and then run linear regression on that expanded dataset. If you're interested in reading some documentation about it, you can find more information here. For convenience, I'll post the sample code that Salvador Dali provided:

from sklearn.preprocessing import PolynomialFeatures
from sklearn import linear_model

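X = [[0.44, 0.68], [0.99, 0.23]]
vector = [109.85, 155.72]
predict = [[0.49, 0.18]]  # 2-D: one sample with two features

# Expand each sample into all polynomial terms up to degree 2
poly = PolynomialFeatures(degree=2)
X_ = poly.fit_transform(X)
predict_ = poly.transform(predict)  # reuse the transformer fitted on X

# Ordinary linear regression on the expanded features
clf = linear_model.LinearRegression()
clf.fit(X_, vector)
print(clf.predict(predict_))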
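The same two steps can be bundled into a single estimator with scikit-learn's make_pipeline, which comes close to the one-object API the question asks for. A minimal sketch, assuming hypothetical 1-D data generated from y = 2x^2 + 3x - 1; the coefficients a2, a1, a0 are read back from the fitted LinearRegression step:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical 1-D data; scikit-learn expects X with shape (n_samples, n_features)
x = np.arange(10, dtype=float)
y = 2 * x**2 + 3 * x - 1
X = x.reshape(-1, 1)

# One estimator object: polynomial expansion [x, x^2] followed by linear regression
regression = make_pipeline(PolynomialFeatures(degree=2, include_bias=False), LinearRegression())
regression.fit(X, y)

# make_pipeline names each step after its lowercased class name
linear = regression.named_steps["linearregression"]
a1, a2 = linear.coef_  # columns are ordered [x, x^2]
a0 = linear.intercept_
print(a2, a1, a0)

# predict applies the same feature expansion before the linear model
pred = regression.predict(np.array([[10.0]]))
print(pred)
```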

Answer 3

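AGML's answer can be wrapped in a scikit-learn-compatible class like this:

import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin

class PolyEstimator(RegressorMixin, BaseEstimator):
    def __init__(self, degree=2):
        self.degree = degree

    def fit(self, x, y):
        # np.polyfit expects 1-D x, so flatten an (n_samples, 1) array
        self.z_ = np.poly1d(np.polyfit(np.ravel(x), y, self.degree))
        return self  # scikit-learn convention: fit returns the estimator

    def predict(self, x):
        return self.z_(np.ravel(x))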
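A practical benefit of scikit-learn compatibility is that the degree becomes a hyperparameter the standard model-selection tools can tune. The sketch below swaps in a Pipeline of PolynomialFeatures and LinearRegression rather than the PolyEstimator class above, and uses GridSearchCV to choose the degree by 5-fold cross-validated R^2. The data and the candidate degrees are hypothetical choices for illustration:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical noisy quadratic data
rng = np.random.default_rng(42)
x = rng.uniform(-3, 3, size=60)
y = 1.5 * x**2 - 0.5 * x + 1 + rng.normal(scale=0.2, size=x.shape)
X = x.reshape(-1, 1)

pipeline = make_pipeline(PolynomialFeatures(include_bias=False), LinearRegression())

# Pipeline step parameters are addressed as <step name>__<parameter>
search = GridSearchCV(pipeline, {"polynomialfeatures__degree": [1, 2, 3, 4]}, cv=5)
search.fit(X, y)

print(search.best_params_)
print(search.best_score_)
```

A straight line cannot capture the curvature here, so degree 1 should lose; among the higher degrees, the winner depends on the noise draw.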
