How to obtain only one parameter set for a polynomial fit when several data sets are taken into account?

Question

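This is a follow-up to a previous question. I have several data sets of sample points that share the same x-coordinates, and I would like to do a polynomial fit that takes all of these sample points into account. That means I want to end up with a single set of parameters that best describes the data.

I figured out how to pass several data sets (only 2 in my example below) to the fitting function; however, I then get one parameter set per data set.

How can I get just one parameter set that best describes all of my data sets?

Here is my code and the output I get: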

import numpy as np
import matplotlib.pyplot as plt

from sklearn.linear_model import Ridge
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline


x = np.array([0., 4., 9., 12., 16., 20., 24., 27.])
y = np.array([[2.9, 4.3, 66.7, 91.4, 109.2, 114.8, 135.5, 134.2],
              [0.9, 17.3, 69.7, 81.4, 119.2, 124.8, 155.5, 144.2]])
y = y.T
# plt.plot(x,y[:, 0], 'ro', x,y[:,1],'bo')
# plt.show()

x_plot = np.linspace(0, max(x), 100)
X = x[:, np.newaxis]
X_plot = x_plot[:, np.newaxis]

plt.scatter(x, y[:, 0], label="training points 1", c='r')
plt.scatter(x, y[:, 1], label="training points 2", c='b')

for degree in np.arange(4, 5, 1):
    model = make_pipeline(PolynomialFeatures(degree), Ridge(alpha=3, fit_intercept=False))
    model.fit(X, y)
    y_plot = model.predict(X_plot)
    plt.plot(x_plot, y_plot, label="degree %d" % degree)

plt.legend(loc='lower left')

plt.show()

ridge = model.named_steps['ridge']
print(ridge.coef_)

As you can see, I get one curve per data set:

and two parameter sets:

[[ -4.09943033e-01  -1.86960613e+00   1.73923722e+00  -1.01704665e-01
    1.73567123e-03]
 [  4.19862603e-01   2.18343362e+00   8.37222298e-01  -4.18711046e-02
    5.69089912e-04]]

PS: If the tool I am using is not the most suitable one, I would also be happy to get suggestions on what I should use instead.

Answer 1

You need to combine your data into a single data set. For example:

x_all = np.ravel(x + np.zeros_like(y))
y_all = np.ravel(y)
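
To make the stacking step concrete, here is a minimal sketch of what those two lines produce, assuming `y` has shape `(2, 8)` (one row per data set, that is, without the `y = y.T` from the question). The name `x_tiled` is only introduced here for illustration; `np.tile` is shown as an equivalent and arguably more explicit way to repeat `x`.

```python
import numpy as np

x = np.array([0., 4., 9., 12., 16., 20., 24., 27.])
y = np.array([[2.9, 4.3, 66.7, 91.4, 109.2, 114.8, 135.5, 134.2],
              [0.9, 17.3, 69.7, 81.4, 119.2, 124.8, 155.5, 144.2]])

# Broadcasting x (shape (8,)) against zeros of shape (2, 8) repeats x
# once per data set; ravel then flattens everything row by row.
x_all = np.ravel(x + np.zeros_like(y))
y_all = np.ravel(y)

# Illustrative alternative (not from the answer): repeat x explicitly.
x_tiled = np.tile(x, y.shape[0])

print(x_all.shape, y_all.shape)        # (16,) (16,)
print(np.array_equal(x_all, x_tiled))  # True
```

Each entry of `x_all` is paired with the matching entry of `y_all`, so the fit sees 16 independent sample points instead of two separate targets.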

Here is a complete example:

import numpy as np
import matplotlib.pyplot as plt

from sklearn.linear_model import Ridge
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline

x = np.array([0., 4., 9., 12., 16., 20., 24., 27.])
y = np.array([[2.9, 4.3, 66.7, 91.4, 109.2, 114.8, 135.5, 134.2],
              [0.9, 17.3, 69.7, 81.4, 119.2, 124.8, 155.5, 144.2]])

x_all = np.ravel(x + np.zeros_like(y))
y_all = np.ravel(y)

plt.scatter(x, y[0], label="training points 1", c='r')
plt.scatter(x, y[1], label="training points 2", c='b')

x_plot = np.linspace(0, max(x), 100)

for degree in np.arange(4, 5, 1):
    model = make_pipeline(PolynomialFeatures(degree), Ridge(alpha=3, fit_intercept=False))
    model.fit(x_all[:, None], y_all)
    y_plot = model.predict(x_plot[:, None])
    plt.plot(x_plot, y_plot, label="degree %d" % degree)

    ridge = model.named_steps['ridge']
    print(degree, ridge.coef_)

plt.legend(loc='best')
plt.show()

The output is

4 [  1.72754641e-03   1.36364501e-01   1.29300064e+00  -7.20932655e-02 1.15823050e-03]
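
Regarding the PS about other tools: as a rough sketch, the same stacking idea also works with plain NumPy, assuming an ordinary (unregularized) least-squares fit is acceptable. This swaps `Ridge` for `np.polyfit`, so the coefficients will not match the output above, and `np.polyfit` returns them with the highest power first. The variable names below are chosen for illustration. The sketch also checks a property of unregularized least squares: because every data set shares the same x-coordinates, fitting the stacked data should give the same curve as fitting the point-wise mean of the data sets.

```python
import numpy as np

x = np.array([0., 4., 9., 12., 16., 20., 24., 27.])
y = np.array([[2.9, 4.3, 66.7, 91.4, 109.2, 114.8, 135.5, 134.2],
              [0.9, 17.3, 69.7, 81.4, 119.2, 124.8, 155.5, 144.2]])
degree = 4

# Stack all data sets into one long list of (x, y) sample points.
x_all = np.tile(x, y.shape[0])
y_all = y.ravel()

# One least-squares polynomial through all points (no Ridge penalty).
coef_stacked = np.polyfit(x_all, y_all, degree)

# For comparison: fit the mean of the data sets at each x.
coef_mean = np.polyfit(x, y.mean(axis=0), degree)

# Compare the two fitted curves at the sample positions.
pred_stacked = np.polyval(coef_stacked, x)
pred_mean = np.polyval(coef_mean, x)
print(coef_stacked)
print(np.allclose(pred_stacked, pred_mean))
```

With a Ridge penalty the equivalence to the mean fit no longer holds exactly for the same `alpha`, since stacking doubles the weight of the data term relative to the penalty.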


python

regression

curve-fitting

scikit-learn