lm 替代品的估计不正确
Incorrect estimates of lm alternatives
Dirk Eddelbuettel 提供了使用 lm
命令估计线性回归的替代方法。参见:http://dirk.eddelbuettel.com/blog/2011/07/05/
但是,他提到:
"Strictly-speaking, it is the only one we can compare to lm.fit()
which also uses a pivoting scheme. In the case of a degenerated model
matrix, all the other methods, including the four fastest approaches,
are susceptible to producing incorrect estimates."
有人可以通过提供一个例子来说明这一点当估计对 lm 正确但对替代方法不正确时?
安装 RcppArmadillo 或 RcppEigen 并查看 help(fastLm)
:
## case where fastLm breaks down
dd <- data.frame(f1 = gl(4, 6, labels = LETTERS[1:4]),
f2 = gl(3, 2, labels = letters[1:3]))[-(7:8), ]
xtabs(~ f2 + f1, dd) # one missing cell
mm <- model.matrix(~ f1 * f2, dd)
kappa(mm) # large, indicating rank deficiency
set.seed(1)
dd$y <- mm %*% seq_len(ncol(mm)) + rnorm(nrow(mm), sd = 0.1)
summary(lm(y ~ f1 * f2, dd)) # detects rank deficiency
summary(fastLm(y ~ f1 * f2, dd)) # some huge coefficients
这个例子要归功于 Doug Bates。
Dirk Eddelbuettel 提供了使用 lm
命令估计线性回归的替代方法。参见:http://dirk.eddelbuettel.com/blog/2011/07/05/
但是,他提到:
"Strictly-speaking, it is the only one we can compare to lm.fit() which also uses a pivoting scheme. In the case of a degenerated model matrix, all the other methods, including the four fastest approaches, are susceptible to producing incorrect estimates."
有人可以通过提供一个例子来说明这一点当估计对 lm 正确但对替代方法不正确时?
安装 RcppArmadillo 或 RcppEigen 并查看 help(fastLm)
:
## case where fastLm breaks down
dd <- data.frame(f1 = gl(4, 6, labels = LETTERS[1:4]),
f2 = gl(3, 2, labels = letters[1:3]))[-(7:8), ]
xtabs(~ f2 + f1, dd) # one missing cell
mm <- model.matrix(~ f1 * f2, dd)
kappa(mm) # large, indicating rank deficiency
set.seed(1)
dd$y <- mm %*% seq_len(ncol(mm)) + rnorm(nrow(mm), sd = 0.1)
summary(lm(y ~ f1 * f2, dd)) # detects rank deficiency
summary(fastLm(y ~ f1 * f2, dd)) # some huge coefficients
这个例子要归功于 Doug Bates。