Numpy based gradient descent not fully converging
I believe I have implemented GD correctly (based in part on Aurélien Géron's book), but it does not return the same result as sklearn's linear regression. Here is the full notebook:
https://colab.research.google.com/drive/17lvCb_F_vMskT1PxbrKCSR57B5lMWT7A?usp=sharing
I'm not doing anything fancy. Here is the code that loads the training data:
import numpy as np
import pandas as pd
import sklearn.datasets
#load data
data_arr = sklearn.datasets.load_diabetes(as_frame=True).data.values
X_raw = data_arr[:,1:]
y_raw = data_arr[:, 1:2]
#add bias
X = np.hstack((np.ones(y_raw.shape),X_raw))
y = y_raw
#do gradient descent
learning_rate = 0.001
iterations = 1_000_000
observations = X.shape[0]
features = X.shape[1]
w = np.ones((features,1))
for i in range(iterations):
    w -= (learning_rate) * (2/observations) * X.T.dot(X.dot(w) - y)
Here are the resulting weights:
array([[ 2.72774600e-17],
       [ 1.01847403e+00],
       [ 3.87858604e-02],
       [ 3.06547577e-04],
       [-3.67525543e-01],
       [ 9.09006216e-02],
       [ 4.21512716e-01],
       [ 4.25673672e-01],
       [ 4.77147289e-02],
       [-8.14471370e-03]])
And the MSE: 5.24937033143115e-05
Here is what sklearn gives me:
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
%time reg = LinearRegression().fit(X, y)
reg.coef_
sklearn weights:
array([[ 0.00000000e+00,  1.00000000e+00, -9.99200722e-16,
        -1.69309011e-15, -1.11022302e-16,  1.38777878e-15,
        -3.88578059e-16,  6.80011603e-16, -8.32667268e-17,
        -5.55111512e-16]])
sklearn MSE: 1.697650600978984e-32
I have tried increasing/decreasing the number of epochs and the size of the learning rate. Scikit-learn returns its result in milliseconds, while my GD implementation can run for minutes and still gets nowhere near sklearn's result.
Am I doing something wrong?
(The notebook contains a cleaner version of this code.)
There is a small mistake in your code: the first column of X_raw is identical to y_raw, i.e. the target is being used as a feature. This is corrected in the code below.
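To see the slicing issue in isolation, here is a tiny stand-in array (hypothetical data, not the diabetes set) showing that data_arr[:, 1:2] is the same column as the first column of data_arr[:, 1:]:

```python
import numpy as np

# toy stand-in for data_arr: 4 rows, 3 columns
data_arr = np.arange(12.0).reshape(4, 3)

X_raw = data_arr[:, 1:]    # columns 1 and 2
y_raw = data_arr[:, 1:2]   # column 1 again -- identical to X_raw[:, :1]

# the "target" is literally the first feature column
print(np.array_equal(y_raw, X_raw[:, :1]))  # True
```

With the diabetes data, the correct target slice is data_arr[:, :1] (column 0), as used in the corrected code below.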
Another issue is that if you include a column of ones in the feature matrix X, then when fitting the linear regression with sklearn you should make sure to set fit_intercept=False, otherwise you effectively have two intercept terms in the model.
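To illustrate the intercept point, a small sketch on synthetic data (hypothetical, not the diabetes set): with the default fit_intercept=True, sklearn centers the data, so a manually added column of ones gets a coefficient of zero and the bias lands in intercept_; with fit_intercept=False, the bias shows up in coef_[0]:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X_feat = rng.normal(size=(100, 2))
y = 3.0 + X_feat @ np.array([1.5, -2.0]) + rng.normal(scale=0.1, size=100)

# feature matrix with a manually added bias column of ones
X = np.hstack((np.ones((100, 1)), X_feat))

with_int = LinearRegression().fit(X, y)                      # default: fit_intercept=True
without = LinearRegression(fit_intercept=False).fit(X, y)

print(with_int.coef_[0], with_int.intercept_)  # ones column gets ~0; bias in intercept_
print(without.coef_[0], without.intercept_)    # bias in coef_[0]; intercept_ is 0.0
```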
It is also unclear why you divide by the number of observations in the gradient update, as this significantly reduces the effective learning rate.
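Dividing the update by observations is equivalent to shrinking the learning rate by a factor of n, which is why so many more iterations are needed. A sketch on synthetic data (hypothetical, not the diabetes set) comparing the two step sizes over the same number of iterations:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 200, 3
X = np.hstack((np.ones((n, 1)), rng.normal(size=(n, p - 1))))
w_true = np.array([[1.0], [2.0], [-3.0]])
y = X @ w_true  # noise-free, so the optimum is exactly w_true

def gd(step, iters):
    w = np.ones((p, 1))
    for _ in range(iters):
        w -= step * 2 * X.T @ (X @ w - y)
    return w

lr = 0.001
w_full = gd(lr, 500)        # update: lr * 2 * gradient term
w_scaled = gd(lr / n, 500)  # update: lr * (2/n) * gradient term, as in the question

print(np.abs(w_full - w_true).max())    # essentially 0: converged
print(np.abs(w_scaled - w_true).max())  # still far from the optimum
```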
import numpy as np
import pandas as pd
import sklearn.datasets
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
# load data
data_arr = sklearn.datasets.load_diabetes(as_frame=True).data.values
# extract features and target
X_raw = data_arr[:, 1:]
y_raw = data_arr[:, :1]
# add bias
X = np.hstack((np.ones(y_raw.shape), X_raw))
y = y_raw
# do gradient descent
learning_rate = 0.001
iterations = 1000000
observations = X.shape[0]
features = X.shape[1]
w = np.ones((features, 1))
for i in range(iterations):
    w -= 2 * learning_rate * X.T.dot(X.dot(w) - y)
# exclude the intercept as X already contains a column of ones
reg = LinearRegression(fit_intercept=False).fit(X, y)
# compare the estimated coefficients
res = pd.DataFrame({
    'manual': [format(x, '.6f') for x in w.flatten()],
    'sklearn': [format(x, '.6f') for x in reg.coef_.flatten()]
})
res
#      manual   sklearn
# 0 -0.000000 -0.000000
# 1  0.101424  0.101424
# 2 -0.006468 -0.006468
# 3  0.208211  0.208211
# 4 -0.128653 -0.128653
# 5  0.236556  0.236556
# 6  0.132544  0.132544
# 7 -0.039359 -0.039359
# 8  0.177129  0.177129
# 9  0.145396  0.145396
# compare the RMSE
print(format(mean_squared_error(y, X.dot(w), squared=False), '.6f'))
# 0.043111
print(format(mean_squared_error(y, reg.predict(X), squared=False), '.6f'))
# 0.043111
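As a final sanity check, gradient descent should land on the closed-form least-squares solution. A sketch on synthetic data (hypothetical, not the diabetes set) comparing the GD result with np.linalg.lstsq:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100
X = np.hstack((np.ones((n, 1)), rng.normal(size=(n, 3))))
w_true = np.array([[0.5], [1.0], [-2.0], [0.25]])
y = X @ w_true + rng.normal(scale=0.01, size=(n, 1))

# closed-form least-squares solution that GD should converge to
w_ls, *_ = np.linalg.lstsq(X, y, rcond=None)

# gradient descent with the same update as the corrected loop
w = np.ones((4, 1))
for _ in range(20_000):
    w -= 0.001 * 2 * X.T @ (X @ w - y)

print(np.abs(w - w_ls).max())  # ~0: GD matches the least-squares solution
```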