Standardized coefs in regression with a categorical predictor: there's something wrong

Question

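As I understand it, standardized coefficients can be used as indices of effect size (with the possibility of applying rules of thumb such as Cohen's, 1988). I also understand that standardized coefficients are expressed in terms of standard deviations, which makes them relatively close to Cohen's d.

I also understand that one way of obtaining standardized coefficients is to standardize the data beforehand. Another is to use the std.coef function from the MuMIn package.

The two methods are equivalent when the predictor is continuous (linear):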

library(tidyverse)
library(MuMIn) # for std.coef()


df <- iris %>% 
  select(Sepal.Length, Sepal.Width) %>% 
  scale() %>% 
  as.data.frame() %>% 
  mutate(Species = iris$Species)


fit <- lm(Sepal.Length ~ Sepal.Width, data=df)
round(coef(fit), 2)
round(MuMIn::std.coef(fit, partial.sd = TRUE), 2)

In both cases, the coefficient is -0.12. I interpret it as follows: for each increase of 1 standard deviation in Sepal.Width, Sepal.Length decreases by 0.12 of its SD.
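As a quick sanity check of this reading, here is a minimal base R sketch (the variable names are illustrative, not from the original code). It assumes a single continuous predictor, in which case the standardized slope should coincide with the raw slope rescaled by sd(x) / sd(y), and also with the Pearson correlation.

```r
# Base R only; iris ships with R.
x <- iris$Sepal.Width
y <- iris$Sepal.Length

# Slope on the raw scale, then rescaled by sd(x) / sd(y).
raw_slope <- unname(coef(lm(y ~ x))["x"])
std_slope <- raw_slope * sd(x) / sd(y)

# Slope after z-scoring both variables beforehand.
zx <- as.numeric(scale(x))
zy <- as.numeric(scale(y))
scaled_slope <- unname(coef(lm(zy ~ zx))["zx"])

# All three values should agree with the -0.12 reported above.
round(c(rescaled = std_slope,
        prescaled = scaled_slope,
        pearson_r = cor(x, y)), 2)
```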

However, the two methods give different results with a categorical predictor:

fit <- lm(Sepal.Length ~ Species, data=df)
round(coef(fit), 2)
round(MuMIn::std.coef(fit, partial.sd = TRUE), 2)

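Compared to setosa (the intercept), the effect of versicolor is 1.12 with pre-scaled data and 0.46 with std.coef.

Which one should I trust in order to say "the difference between versicolor and setosa is ... of Sepal.Length's SD"? Thanks a lot.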

Answer 1

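You are not standardizing the implicit (dummy) variables associated with Species, so those coefficients are not standardized.

You can standardize them like this: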

dummies <- scale(contrasts(df$Species)[df$Species,])
fit <- lm(Sepal.Length ~ dummies, data = df)
round(coef(fit), 2)
#      (Intercept) dummiesversicolor  dummiesvirginica 
#             0.00              0.53              0.90

This agrees with the result of MuMIn::std.coef if you set the partial.sd argument to FALSE.
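To see how the 1.12 from the question and the 0.53 above relate, here is a minimal base R sketch (the helper names is_versicolor, z_versicolor and so on are illustrative, not from the original code). It assumes the default treatment contrasts with setosa as the reference level. With 0/1 dummies and a scaled response, the coefficient should equal the difference in group means divided by sd(Sepal.Length); scaling the dummy as well should multiply that coefficient by the dummy's own SD. The partial.sd = TRUE value is not reproduced here.

```r
# Base R only; iris ships with R.
y  <- iris$Sepal.Length
zy <- as.numeric(scale(y))

# Explicit 0/1 dummies, with setosa as the reference level.
is_versicolor <- as.numeric(iris$Species == "versicolor")
is_virginica  <- as.numeric(iris$Species == "virginica")

# (a) Only the response is scaled: the coefficient is the mean difference
#     between versicolor and setosa, in units of sd(Sepal.Length).
fit_a  <- lm(zy ~ is_versicolor + is_virginica)
coef_a <- unname(coef(fit_a)["is_versicolor"])
mean_diff_sd <- (mean(y[iris$Species == "versicolor"]) -
                 mean(y[iris$Species == "setosa"])) / sd(y)

# (b) Dummies are scaled too: the coefficient is now per 1 SD of the dummy.
z_versicolor <- as.numeric(scale(is_versicolor))
z_virginica  <- as.numeric(scale(is_virginica))
fit_b  <- lm(zy ~ z_versicolor + z_virginica)
coef_b <- unname(coef(fit_b)["z_versicolor"])

round(c(mean_diff_sd = mean_diff_sd,
        unscaled_dummy = coef_a,
        scaled_dummy = coef_b,
        unscaled_times_sd = coef_a * sd(is_versicolor)), 2)
```

Under these assumptions, the unscaled-dummy coefficient is the one that reads directly as "the difference between versicolor and setosa in SDs of Sepal.Length", while the scaled-dummy coefficient describes a change of one SD in the dummy itself.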
