A linear model matrix where each level of a categorical variable is contrasted with the mean
I have xy data, where y is a continuous response and x is a categorical variable:
set.seed(1)
df <- data.frame(y = rnorm(27), group = c(rep("A", 9), rep("B", 9), rep("C", 9)), stringsAsFactors = FALSE)
I want to fit the linear model y ~ group, in which each level of df$group is contrasted with the mean.
I thought that using deviation coding would do this:
lm(y ~ group,contrasts = "contr.sum",data=df)
But it skips the contrast of group A with the mean:
> summary(lm(y ~ group,contrasts = "contr.sum",data=df))

Call:
lm(formula = y ~ group, data = df, contrasts = "contr.sum")

Residuals:
    Min      1Q  Median      3Q     Max
-1.6445 -0.6946 -0.1304  0.6593  1.9165

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  -0.2651     0.3457  -0.767    0.451
groupB        0.2057     0.4888   0.421    0.678
groupC        0.3985     0.4888   0.815    0.423

Residual standard error: 1.037 on 24 degrees of freedom
Multiple R-squared: 0.02695, Adjusted R-squared: -0.05414
F-statistic: 0.3324 on 2 and 24 DF, p-value: 0.7205
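For reference, here is a minimal sketch of what deviation (sum-to-zero) coding looks like in base R. The object names `cs`, `d`, and `X` are illustrative and not part of the original post. The `groupB` and `groupC` labels in the output above are the names treatment coding produces, which suggests that the bare string passed to `contrasts` may not have been applied. As far as I can tell, `lm` and `model.matrix` expect a named list here, and recent R versions warn that a non-list argument is ignored.

```r
# Deviation (sum-to-zero) coding for a 3-level factor: k - 1 = 2 columns,
# and the last level is coded -1 in every column.
cs <- contr.sum(3)
print(cs)

set.seed(1)
d <- data.frame(y = rnorm(27),
                group = factor(rep(c("A", "B", "C"), each = 9)))

# Passing the coding as a named list applies it to `group`.
X <- model.matrix(y ~ group, data = d,
                  contrasts.arg = list(group = "contr.sum"))

# Columns are labelled by index (group1, group2), not by level name.
print(colnames(X))
print(unique(X))
```

With k levels there are only k - 1 contrast columns, so one level never gets its own coefficient. Under `contr.sum` that is the last level (C here), not the first.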
Is there any function that builds a model matrix so that each level of df$group is contrasted with the mean in the summary?
All I can think of is manually adding a "mean" level to df$group and setting it as the baseline with dummy coding:
library(magrittr)  # provides %>% (dplyr re-exports it as well)
df <- df %>% rbind(data.frame(y = mean(df$y), group = "mean"))
df$group <- factor(df$group, levels = c("mean","A","B","C"))
summary(lm(y ~ group,contrasts = "contr.treatment",data=df))
Call:
lm(formula = y ~ group, data = df, contrasts = "contr.treatment")

Residuals:
     Min       1Q   Median       3Q      Max
-2.30003 -0.34864  0.07575  0.56896  1.42645

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.14832    0.95210   0.156    0.878
groupA       0.03250    1.00360   0.032    0.974
groupB      -0.06300    1.00360  -0.063    0.950
groupC       0.03049    1.00360   0.030    0.976

Residual standard error: 0.9521 on 24 degrees of freedom
Multiple R-squared: 0.002457, Adjusted R-squared: -0.1222
F-statistic: 0.01971 on 3 and 24 DF, p-value: 0.9961
Similarly, suppose I have data with two categorical variables:
set.seed(1)
df <- data.frame(y = rnorm(18),
                 group = c(rep("A",9),rep("B",9)),
                 class = as.character(rep(c(rep(1,3),rep(2,3),rep(3,3)),2)))
I would like to estimate the interaction at every level (i.e., class1:groupB, class2:groupB, and class3:groupB) for:
lm(y ~ class*group,contrasts = c("contr.sum","contr.treatment"),data=df)
How can I get this?
Use + 0 in the lm formula to omit the intercept; you should then get the contrast coding you expect:
summary(lm(y ~ 0 + group, contrasts = "contr.sum", data=df))
Result:
Call:
lm(formula = y ~ 0 + group, data = df, contrasts = "contr.sum")

Residuals:
    Min      1Q  Median      3Q     Max
-2.3000 -0.3627  0.1487  0.5804  1.4264

Coefficients:
       Estimate Std. Error t value Pr(>|t|)
groupA  0.18082    0.31737   0.570    0.574
groupB  0.08533    0.31737   0.269    0.790
groupC  0.17882    0.31737   0.563    0.578

Residual standard error: 0.9521 on 24 degrees of freedom
Multiple R-squared: 0.02891, Adjusted R-squared: -0.09248
F-statistic: 0.2381 on 3 and 24 DF, p-value: 0.8689
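To make it easier to see what these coefficients represent, here is a small sketch on data simulated the same way as in the question. The names `d`, `fit0`, `fit_sum`, and `dev_all` are illustrative only. It assumes a balanced design, so the mean of the group means equals the overall mean of y, and it passes the coding as a named list, which is an assumption about how `contrasts` is meant to be supplied.

```r
set.seed(1)
d <- data.frame(y = rnorm(27),
                group = factor(rep(c("A", "B", "C"), each = 9)))

# Without an intercept, each coefficient is that group's mean of y.
fit0 <- lm(y ~ 0 + group, data = d)
group_means <- tapply(d$y, d$group, mean)
print(cbind(coef = coef(fit0), mean = group_means))

# With an intercept and sum-to-zero coding (named list), the intercept is the
# mean of the group means and the other coefficients are deviations from it.
fit_sum <- lm(y ~ group, data = d, contrasts = list(group = "contr.sum"))
dev <- coef(fit_sum)[-1]        # deviations for A and B
dev_all <- c(dev, -sum(dev))    # deviation for C is minus the sum of the others
names(dev_all) <- levels(d$group)
print(dev_all)
```

In the no-intercept fit, the t tests in `summary()` compare each group mean with zero. If the comparison of interest is each group against the overall mean, the deviations in `dev_all` are the quantities to look at.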
If you want to do this for the interaction, here is one way:
lm(y ~ 0 + class:group,
contrasts = c("contr.sum","contr.treatment"),
data=df)
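As a rough check of what the interaction-only formula estimates, the sketch below refits it on data simulated as in the question and compares the coefficients with the raw cell means. The names `d`, `fit_cells`, `cell_means`, and `diff_B_vs_A` are illustrative, and the `contrasts` argument is left out here on the assumption that it does not affect this cell-means parameterization.

```r
set.seed(1)
d <- data.frame(y = rnorm(18),
                group = factor(rep(c("A", "B"), each = 9)),
                class = factor(rep(rep(1:3, each = 3), times = 2)))

# One coefficient per class-by-group cell; no intercept and no main effects.
fit_cells <- lm(y ~ 0 + class:group, data = d)
cell_means <- tapply(d$y, list(d$class, d$group), mean)
print(coef(fit_cells))
print(cell_means)

# Difference between groups within each class, built from the cell coefficients.
b <- coef(fit_cells)
diff_B_vs_A <- b[paste0("class", 1:3, ":groupB")] -
  b[paste0("class", 1:3, ":groupA")]
print(diff_B_vs_A)
```

Each coefficient here is a cell mean, so the per-class group effects the question asks about (class1:groupB and so on) can be read off as differences between the B and A cells within a class.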