Difference in Theta values between gradient descent and linear model in R
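I am using the Boston dataset as my input, and I am trying to build a model that predicts MEDV (median value of owner-occupied homes in $1000s) from RM (average number of rooms per dwelling).
I adapted the following code from the Digitheads blog and, as you can see, I did not change much.
My code is as follows: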
library(MASS)  # the Boston data set ships with MASS, not datasets
data("Boston")
x <- Boston$rm
y <- Boston$medv
# fit a linear model
res <- lm( y ~ x )
print(res)
Call:
lm(formula = y ~ x)

Coefficients:
(Intercept)            x
    -34.671        9.102
# plot the data and the model
plot(x,y, col=rgb(0.2,0.4,0.6,0.4), main='Linear regression')
abline(res, col='blue')
# squared error cost function
cost <- function(X, y, theta) {
  sum( (X %*% theta - y)^2 ) / (2*length(y))
}
# learning rate and iteration limit
alpha <- 0.01
num_iters <- 1000
# keep history
cost_history <- double(num_iters)
theta_history <- vector("list", num_iters)  # preallocate a list with num_iters slots
# initialize coefficients
theta <- matrix(c(0,0), nrow=2)
# add a column of 1's for the intercept coefficient
X <- cbind(1, matrix(x))
# gradient descent
for (i in 1:num_iters) {
  error <- (X %*% theta - y)
  delta <- t(X) %*% error / length(y)
  theta <- theta - alpha * delta
  cost_history[i] <- cost(X, y, theta)
  theta_history[[i]] <- theta
}
print(theta)
          [,1]
[1,] -3.431269
[2,]  4.191125
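According to the Digitheads blog, his theta values from lm (the linear model) match his gradient descent values, whereas mine do not. Shouldn't these numbers match?
As you can see from the plot of the various theta values, my final y-intercept does not agree with the print(theta) value a few lines above.
Can anyone suggest where I am going wrong?
Gradient descent takes a while to converge. Increasing the number of iterations will let the model converge to the OLS values. For example: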
# learning rate and iteration limit
alpha <- 0.01
num_iters <- 100000 # Here I increase the number of iterations in your code to 100k.
# The gd algorithm now takes a minute or so to run on my admittedly
# middle-of-the-line laptop.
# keep history
cost_history <- double(num_iters)
theta_history <- vector("list", num_iters)  # preallocate a list with num_iters slots
# initialize coefficients
theta <- matrix(c(0,0), nrow=2)
# add a column of 1's for the intercept coefficient
X <- cbind(1, matrix(x))
# gradient descent (now takes a little longer!)
for (i in 1:num_iters) {
  error <- (X %*% theta - y)
  delta <- (t(X) %*% error) / length(y)
  theta <- theta - alpha * delta
  cost_history[i] <- cost(X, y, theta)
  theta_history[[i]] <- theta
}
print(theta)
           [,1]
[1,] -34.670410
[2,]   9.102076
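The slow convergence seems to come from the scale of the predictor rather than from the algorithm itself. The rm values all sit around 6, so the column of ones and x are nearly collinear, and the cost surface becomes a long, narrow valley that gradient descent crosses slowly. As a minimal sketch (not part of the original code), standardizing x first should let the same loop reach the lm() coefficients within the original budget of 1000 iterations. The names x_mean, x_sd, X_s, theta_s, alpha_s and num_iters_s are illustrative, and the learning rate of 0.1 is an assumed value that works for a standardized predictor.

```r
library(MASS)  # Boston ships with MASS (assumption: MASS is installed)

x <- Boston$rm
y <- Boston$medv

# Standardize the predictor to mean 0 and standard deviation 1
x_mean <- mean(x)
x_sd <- sd(x)
X_s <- cbind(1, (x - x_mean) / x_sd)

alpha_s <- 0.1       # illustrative learning rate for the standardized problem
num_iters_s <- 1000  # same iteration budget as in the question
theta_s <- matrix(c(0, 0), nrow = 2)

# Same gradient descent update as before, applied to the standardized design matrix
for (i in seq_len(num_iters_s)) {
  error <- X_s %*% theta_s - y
  delta <- t(X_s) %*% error / length(y)
  theta_s <- theta_s - alpha_s * delta
}

# Map the coefficients back to the original scale of x
slope <- theta_s[2] / x_sd
intercept <- theta_s[1] - slope * x_mean
theta_gd <- c(intercept, slope)

print(theta_gd)
print(coef(lm(y ~ x)))
```

The two printed vectors should agree closely. The back-transformation is needed because the loop fits y against (x - x_mean) / x_sd, not against x itself.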