Error when building regression model using lm (Error in `contrasts<-`(`*tmp*`... contrasts can be applied only to factors with 2 or more levels)

I get this error depending on which variables I include and the order in which I specify them in the formula:

Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) : 
  contrasts can be applied only to factors with 2 or more levels

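I did some research on this, and it looks like the error is caused by the variable in question not being a factor variable. In this case, though, the variable (is_women_owned) is a factor with 2 levels ("Yes", "No").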

> levels(customer_accounts$is_women_owned)
[1] "No"  "Yes"

No error:

f1 <- lm(combined_sales ~ is_women_owned, data=customer_accounts)

No error:

f2 <- lm(combined_sales ~ total_assets + market_value + total_empl + empl_growth + sic + city + revenue_growth + revenue + net_income + income_growth, data=customer_accounts)

Running the regression with the formula above plus the factor variable "is_women_owned":

f3 <- lm(combined_sales ~ total_assets + market_value + total_empl + empl_growth + sic + city + revenue_growth + revenue + net_income + income_growth + is_women_owned, data=customer_accounts)

Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) : 
  contrasts can be applied only to factors with 2 or more levels

As you would expect, I get the same error when applying stepwise linear regression.

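This seems like a bug. Rather than erroring out like this, it should give us a model in which "is_women_owned" perhaps provides no additional explanatory value because it is highly correlated with the other variables.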

I verified that this variable has no missing data either:

> which(is.na(customer_accounts$is_women_owned))
integer(0)

Also, both values are present in the factor variable:

> customer_accounts$is_women_owned[1:20]
 [1] No  No  No  No  No  No  No  No  No  No  No  No  No  No  Yes No 
[17] No  No  No  No 
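Levels: No Yes

# stringsAsFactors = TRUE is required in R >= 4.0.0 for f to be a factor, as in the str() output below
twofac = data.frame("y" = c(1,2,3,4,5,1), "x" = c(2,56,3,5,2,1), "f" = c("apple","apple","apple","apple","apple","banana"), stringsAsFactors = TRUE)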
onefac = twofac[1:5,]

lm(y~x+f,data=twofac)
lm(y~x+f,data=onefac)

> str(onefac)
'data.frame':   5 obs. of  3 variables:
 $ y: num  1 2 3 4 5
 $ x: num  2 56 3 5 2
 $ f: Factor w/ 2 levels "apple","banana": 1 1 1 1 1
> str(twofac)
'data.frame':   6 obs. of  3 variables:
 $ y: num  1 2 3 4 5 1
 $ x: num  2 56 3 5 2 1
 $ f: Factor w/ 2 levels "apple","banana": 1 1 1 1 1 2
> lm(y~x+f,data=twofac)

Call:
lm(formula = y ~ x + f, data = twofac)

Coefficients:
(Intercept)            x      fbanana  
    3.30783     -0.02263     -2.28519  

> lm(y~x+f,data=onefac)
Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) : 
  contrasts can be applied only to factors with 2 or more levels

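If you run the above, you will notice that twofac, a model with a 2-level factor in which both levels are present, runs without a problem. onefac, a model with the same 2-level factor but with only one level present in the data, gives the same error you are getting.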
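
Note that levels() on onefac still lists both levels, which is why checking levels() alone can be misleading. A minimal sketch of how the declared levels can be compared with the levels actually present, reusing the example data (stringsAsFactors = TRUE is assumed here so that f is a factor in R >= 4.0.0):

```r
# Same example data as above; stringsAsFactors = TRUE makes f a factor
# regardless of the R version's default.
twofac <- data.frame(
  y = c(1, 2, 3, 4, 5, 1),
  x = c(2, 56, 3, 5, 2, 1),
  f = c("apple", "apple", "apple", "apple", "apple", "banana"),
  stringsAsFactors = TRUE
)
onefac <- twofac[1:5, ]

# Declared levels: subsetting rows does not remove unused levels.
declared <- nlevels(onefac$f)

# Counts per declared level: "banana" is declared but has no rows.
counts <- table(onefac$f)

# Levels that actually occur in the data after dropping unused ones.
present <- nlevels(droplevels(onefac$f))

print(declared)
print(counts)
print(present)
```

If the number of levels present is lower than the number declared, the factor has unused levels, and a factor with only one level present triggers the contrasts error.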

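If your factor has only one level present in the data being fitted, then regressing on that factor provides no additional information, because it is constant across all observations.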
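
One plausible explanation for the case in the question, not confirmed in the thread, is that lm() drops every row with an NA in any model variable before fitting, so a factor that has two levels in the full data can be left with one level once the other predictors are added. The sketch below uses made-up data (the data frame d and its values are hypothetical) to show how the rows that lm() would actually use can be inspected with model.frame():

```r
# Hypothetical data: f has two levels overall, but the only "banana" row
# has a missing x, so that row is dropped when x is in the model.
d <- data.frame(
  y = c(1, 2, 3, 4, 5, 1),
  x = c(2, 56, 3, 5, 2, NA),
  f = factor(c("apple", "apple", "apple", "apple", "apple", "banana"))
)

# Build the model frame with NA rows omitted, which mirrors lm()'s
# factory-default na.action (na.omit).
mf <- model.frame(y ~ x + f, data = d, na.action = na.omit)

# For each factor column, count the levels still present after NA removal.
is_fac <- vapply(mf, is.factor, logical(1))
levels_present <- vapply(
  mf[is_fac],
  function(col) nlevels(droplevels(col)),
  integer(1)
)
print(levels_present)
```

Any factor reported with fewer than 2 levels here would cause the contrasts error for that formula, even though levels() and is.na() on the original column look fine.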