How to obtain only one parameter set for a polynomial fit when several data sets are taken into account?
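This is a follow-up to an earlier question. I have several data sets of sample points that share the same x-coordinates, and I now want to do a polynomial fit that takes all of these sample points into account. That is, I want to end up with a single set of parameters that best describes the data.
I figured out how to pass several data sets (only 2 in my example below) to the fit function, but I then get one parameter set per data set.
How can I get just one set of parameters that best describes all of my data sets?
Here is my code and the output I get: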
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import Ridge
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline
x = np.array([0., 4., 9., 12., 16., 20., 24., 27.])
y = np.array([[2.9, 4.3, 66.7, 91.4, 109.2, 114.8, 135.5, 134.2],
[0.9, 17.3, 69.7, 81.4, 119.2, 124.8, 155.5, 144.2]])
y = y.T
# plt.plot(x,y[:, 0], 'ro', x,y[:,1],'bo')
# plt.show()
x_plot = np.linspace(0, max(x), 100)
X = x[:, np.newaxis]
X_plot = x_plot[:, np.newaxis]
plt.scatter(x, y[:, 0], label="training points 1", c='r')
plt.scatter(x, y[:, 1], label="training points 2", c='b')
for degree in np.arange(4, 5, 1):
    model = make_pipeline(PolynomialFeatures(degree), Ridge(alpha=3, fit_intercept=False))
    model.fit(X, y)
    y_plot = model.predict(X_plot)
    plt.plot(x_plot, y_plot, label="degree %d" % degree)
plt.legend(loc='lower left')
plt.show()
ridge = model.named_steps['ridge']
print(ridge.coef_)
As you can see, I get one curve per data set:
and two parameter sets:
[[ -4.09943033e-01 -1.86960613e+00 1.73923722e+00 -1.01704665e-01
1.73567123e-03]
[ 4.19862603e-01 2.18343362e+00 8.37222298e-01 -4.18711046e-02
5.69089912e-04]]
PS: If the tool I am using is not the best fit for this, I would also be happy to get suggestions on what I should use instead.
You need to combine the data into a single data set. For example:
x_all = np.ravel(x + np.zeros_like(y))
y_all = np.ravel(y)
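Why this pairs the points correctly: `x` has shape (8,) and `y` has shape (2, 8), so adding `np.zeros_like(y)` broadcasts `x` into one copy per data set, and `np.ravel` flattens both arrays in the same row-major order. Below is a small check using the arrays from the question. It assumes `y` is still laid out as (data sets, points), i.e. before the question's `y = y.T`; with the transposed array the broadcast would fail. The comparison with `np.tile` is only there as an illustration of an equivalent way to build `x_all`.

```python
import numpy as np

x = np.array([0., 4., 9., 12., 16., 20., 24., 27.])
y = np.array([[2.9, 4.3, 66.7, 91.4, 109.2, 114.8, 135.5, 134.2],
              [0.9, 17.3, 69.7, 81.4, 119.2, 124.8, 155.5, 144.2]])

# Broadcast x against y so that every y value gets its own x value,
# then flatten both arrays in the same (row-major) order.
x_all = np.ravel(x + np.zeros_like(y))
y_all = np.ravel(y)

print(x_all.shape, y_all.shape)

# The second data set starts at index len(x); its first point is (x[0], y[1, 0]).
print(x_all[len(x)], y_all[len(x)])

# Repeating x once per data set builds the same array.
same_as_tile = np.array_equal(x_all, np.tile(x, y.shape[0]))
print(same_as_tile)
```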
Here is a complete example:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import Ridge
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline
x = np.array([0., 4., 9., 12., 16., 20., 24., 27.])
y = np.array([[2.9, 4.3, 66.7, 91.4, 109.2, 114.8, 135.5, 134.2],
[0.9, 17.3, 69.7, 81.4, 119.2, 124.8, 155.5, 144.2]])
x_all = np.ravel(x + np.zeros_like(y))
y_all = np.ravel(y)
plt.scatter(x, y[0], label="training points 1", c='r')
plt.scatter(x, y[1], label="training points 2", c='b')
x_plot = np.linspace(0, max(x), 100)
for degree in np.arange(4, 5, 1):
    model = make_pipeline(PolynomialFeatures(degree), Ridge(alpha=3, fit_intercept=False))
    model.fit(x_all[:, None], y_all)
    y_plot = model.predict(x_plot[:, None])
    plt.plot(x_plot, y_plot, label="degree %d" % degree)
    ridge = model.named_steps['ridge']
    print(degree, ridge.coef_)
plt.legend(loc='best')
The output is:
4 [ 1.72754641e-03 1.36364501e-01 1.29300064e+00 -7.20932655e-02 1.15823050e-03]
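Regarding the PS about other tools: if the ridge penalty is not actually needed, the same pooling idea also works with plain NumPy. The sketch below is an alternative, not what the code above does. It uses `np.polyfit`, which performs ordinary (unregularized) least squares, so its coefficients will not match the `Ridge(alpha=3)` output, and it returns them highest power first, the reverse of the `PolynomialFeatures` ordering. It also illustrates that, because the data sets share the same x values, an unregularized pooled fit gives the same polynomial as fitting the pointwise mean of the data sets.

```python
import numpy as np

x = np.array([0., 4., 9., 12., 16., 20., 24., 27.])
y = np.array([[2.9, 4.3, 66.7, 91.4, 109.2, 114.8, 135.5, 134.2],
              [0.9, 17.3, 69.7, 81.4, 119.2, 124.8, 155.5, 144.2]])

# Pool all data sets into one list of (x, y) points.
x_all = np.tile(x, y.shape[0])
y_all = y.ravel()

degree = 4

# One coefficient vector for all points (highest power first).
coef_pooled = np.polyfit(x_all, y_all, degree)

# With shared x values, least squares on the pooled points is equivalent
# to least squares on the mean of the data sets.
coef_mean = np.polyfit(x, y.mean(axis=0), degree)

print(coef_pooled)
print(np.allclose(coef_pooled, coef_mean))

# Evaluate the single fitted polynomial on a grid, e.g. for plotting.
x_plot = np.linspace(0, x.max(), 100)
y_plot = np.polyval(coef_pooled, x_plot)
```

The equivalence with the mean fit holds only for the unregularized case and only when every data set is sampled at the same x values; with differing x values, pooling the points as above is still valid but the mean shortcut is not.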