Which linear model summary row corresponds to which term in the formula?

The summary of a linear model uses certain strings to label the coefficients in its output. For example:

summary(lm(
 target ~ some.bool + some.factor + some.factor*some.value +
          some.factor:some.other,
 # synthetic data: 100 rows of noise, only needed to produce the row names
 data.frame(target=rnorm(100), some.bool=sample(c(TRUE, FALSE), 100, TRUE),
  some.factor=sample(c('Y', 'N', 'M'), 100, TRUE), some.value=rnorm(100),
  some.other=rnorm(100))))

The rows of the resulting coefficient table are named: some.boolTRUE, some.factorN, some.factorY, some.value, some.factorN:some.value, some.factorY:some.value, some.factorM:some.other, some.factorN:some.other, some.factorY:some.other.

How can I find out programmatically which rows of the table correspond to which terms of the input formula? I would like a mapping such as:

`some.boolTRUE`           → some.bool
`some.factorN`            → some.factor, some.factor*some.value
`some.factorY`            → some.factor, some.factor*some.value
`some.value`              → some.factor*some.value
`some.factorN:some.value` → some.factor*some.value
`some.factorN:some.other` → some.factor:some.other

My goal is to prepare a specific presentation of the results in which the linear regression output is grouped by input term.

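So, I noticed that the code that generates these names sits deep inside model.matrix, which calls an external C function. I can use a hack like the following to recover the names built from a term (term is an expression/symbol object taken from the formula itself):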

names.for.term <- function(term, data, order.as.in=term) {
  # construct a simple formula that has only the requested term
  f <- formula(substitute(~ x, list(x=term)))

  # make a terms object for manipulation
  term.terms <- terms(f, data=data)

  # what order do we want to consider variables in?
  requested.order <- na.omit(match(
    row.names(attr(terms(order.as.in), 'factors')),
    row.names(attr(term.terms, 'factors'))))

  # force the order of variables (setting row.names is enough;
  # values in this array are not important for the process of building
  # strings if you have only a single summand. if not, good luck)
  row.names(attr(term.terms, 'factors')) <-
    row.names(attr(term.terms, 'factors'))[requested.order]

  # we need model frame object to have columns in the same order as
  # rows above; types of variables (e.g. factors) are inferred from here
  m <- model.frame(f, data)[requested.order]

  # call deep into C code
  dimnames(.External2(stats:::C_modelmatrix, term.terms, m))[[2]][-1]
}

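Ugly, but it works. Because the strings depend on the order in which this function call encounters the variables in the terms, you will probably want to pass the full formula as order.as.in. The only thing left is to invert the mapping, which is trivial at this point.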
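For comparison, here is a minimal sketch of a different route that avoids the internal C call: the matrix returned by model.matrix carries an "assign" attribute giving, for each column, the index of the term it came from (0 for the intercept), and attr(terms(fit), "term.labels") gives the term labels in the same order. The names d, fit, term.of.coef and coefs.by.term are made up for this illustration. Note that this maps to expanded terms (some.factor:some.value), not to the summands as typed (some.factor*some.value), so it may not be exactly the grouping asked for above.

```r
set.seed(1)  # only so the illustration is reproducible

# same kind of synthetic data as in the question
d <- data.frame(
  target      = rnorm(100),
  some.bool   = sample(c(TRUE, FALSE), 100, TRUE),
  some.factor = sample(c("Y", "N", "M"), 100, TRUE),
  some.value  = rnorm(100),
  some.other  = rnorm(100))

fit <- lm(target ~ some.bool + some.factor + some.factor*some.value +
            some.factor:some.other, d)

mm          <- model.matrix(fit)
coef.names  <- colnames(mm)                    # same strings as the summary rows
term.labels <- attr(terms(fit), "term.labels") # expanded terms of the formula
assign      <- attr(mm, "assign")              # term index per column, 0 = intercept

# coefficient name -> term label
term.of.coef <- setNames(c("(Intercept)", term.labels)[assign + 1], coef.names)

# inverted mapping: term label -> coefficient names, in formula order
coefs.by.term <- split(coef.names,
                       factor(term.of.coef, levels = unique(term.of.coef)))
```

The columns of the model matrix include any aliased coefficients (those reported as NA by coef), which summary drops from its table, so it is probably safest to match on names against rownames(coef(summary(fit))) before grouping.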