lm 替代品的估计不正确

Incorrect estimates of lm alternatives

Dirk Eddelbuettel 提供了使用 lm 命令估计线性回归的替代方法。参见:http://dirk.eddelbuettel.com/blog/2011/07/05/

但是,他提到:

"Strictly-speaking, it is the only one we can compare to lm.fit() which also uses a pivoting scheme. In the case of a degenerated model matrix, all the other methods, including the four fastest approaches, are susceptible to producing incorrect estimates."

有人可以通过提供一个例子来说明这一点当估计对 lm 正确但对替代方法不正确时?

安装 RcppArmadillo 或 RcppEigen 并查看 help(fastLm):

   ## case where fastLm breaks down
   dd <- data.frame(f1 = gl(4, 6, labels = LETTERS[1:4]),
                    f2 = gl(3, 2, labels = letters[1:3]))[-(7:8), ]
   xtabs(~ f2 + f1, dd)     # one missing cell
   mm <- model.matrix(~ f1 * f2, dd)
   kappa(mm)                # large, indicating rank deficiency
   set.seed(1)
   dd$y <- mm %*% seq_len(ncol(mm)) + rnorm(nrow(mm), sd = 0.1)
   summary(lm(y ~ f1 * f2, dd))     # detects rank deficiency
   summary(fastLm(y ~ f1 * f2, dd)) # some huge coefficients

这个例子要归功于 Doug Bates。