Scatter plot points do not fit with polynomial-degree linear regression
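I am using sklearn's LinearRegression and PolynomialFeatures to fit a dataset. The code is shown below. I plot the points with a scatter plot, but the predicted values do not seem to match the data. I am not sure what I am missing. I tried changing the degree from 1 to 20, but it had no effect.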
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
DEGREE = 5
X = np.array([276237,276617, 276997, 277377, 277757, 278137, 278517, 278897, 279277, 279657]).reshape(-1, 1)
y = np.array([6, 8, 2, 4, 0, 1, 7, 0, 1, 4])
poly_feat = PolynomialFeatures(degree=DEGREE)
X_poly = poly_feat.fit_transform(X)
lm = LinearRegression(fit_intercept = False)
lm.fit(X_poly, y)
fig=plt.figure()
ax=fig.add_axes([0,0,1,1])
ax.scatter(X, lm.predict(X_poly), color='r')
ax.set_xlabel('Total Amount')
ax.set_ylabel('Days to mine')
ax.plot(X,y)
plt.show()
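My guess is that you do not have enough data. You are fitting a degree-5 polynomial (6 coefficients) to only 10 data points, so the model cannot be trained well. I made up some data and found that your code works fine: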
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
DEGREE = 5
# Original data from the question, commented out
#X = np.array([276237,276617, 276997, 277377, 277757, 278137, 278517, 278897, 279277, 279657]).reshape(-1, 1)
#y = np.array([6, 8, 2, 4, 0, 1, 7, 0, 1, 4])
# New synthetic data: a noisy degree-5 polynomial sampled at 50 points
n = 50
X = np.linspace(-5, 5, n)
y = X**5 - 3 * X**4 + 2 * X**3 + 4 * X**2 - X + 6 + 200*np.random.randn(n)
X = X.reshape(-1, 1)
# Everything below remains unchanged
poly_feat = PolynomialFeatures(degree=DEGREE)
X_poly = poly_feat.fit_transform(X)
lm = LinearRegression(fit_intercept = False)
lm.fit(X_poly, y)
fig=plt.figure()
ax=fig.add_axes([0,0,1,1])
ax.scatter(X, lm.predict(X_poly), color='r')
ax.set_xlabel('Total Amount')
ax.set_ylabel('Days to mine')
ax.plot(X,y)
plt.show()
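One more thing that may be worth checking, offered as an assumption rather than something established above: the original X values are around 2.8e5, so their fifth powers are around 1e27, and the polynomial feature matrix can become badly conditioned. That could also explain why changing the degree appeared to have no effect. A minimal sketch that standardizes X before expanding it, using sklearn's StandardScaler and make_pipeline (neither appears in the original code) and leaving out the plotting:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

DEGREE = 5

# Data from the question
X = np.array([276237, 276617, 276997, 277377, 277757,
              278137, 278517, 278897, 279277, 279657], dtype=float).reshape(-1, 1)
y = np.array([6, 8, 2, 4, 0, 1, 7, 0, 1, 4], dtype=float)

# Standardize X first so its powers stay in a small numeric range,
# then expand to polynomial features and fit ordinary least squares.
model = make_pipeline(
    StandardScaler(),
    PolynomialFeatures(degree=DEGREE),
    LinearRegression(fit_intercept=False),
)
model.fit(X, y)
y_pred = model.predict(X)

# Compare the condition numbers of the two design matrices
# (a larger value means a less stable least-squares problem).
cond_raw = np.linalg.cond(PolynomialFeatures(degree=DEGREE).fit_transform(X))
cond_scaled = np.linalg.cond(
    PolynomialFeatures(degree=DEGREE).fit_transform(StandardScaler().fit_transform(X))
)
print(f"condition number, raw X:    {cond_raw:.3e}")
print(f"condition number, scaled X: {cond_scaled:.3e}")
print(f"R^2 on the training points: {model.score(X, y):.3f}")
```

To plot the result, something like ax.scatter(X, y) for the data and ax.plot(X, y_pred, color='r') for the fit would show the curve against the points. With only 10 noisy-looking values, the fit may still be poor even after scaling, which would be consistent with the point about too little data.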