Create dummy variables to do two-way ANOVA
d = data.frame(
Temperature = c(rep("Cool", 6), rep("Warm", 6)),
Bact = c(rep("Bact 1", 2), rep("Bact 2", 2), rep("Bact 3", 2), rep("Bact 1", 2), rep("Bact 2", 2), rep("Bact 3", 2)),
Time = c(15.23,14.32,14.77,15.12,14.05,15.48,14.13,16.13,16.44,14.82,17.96,16.65)
)
I created a small data frame myself for a two-way ANOVA. I want to fit the two-way ANOVA model with
summary(aov(Time~Bact*Temperature, data=d))
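Time is the dependent variable, and Bact and Temperature are the two categorical independent variables.
I want to learn, and show, that an ANOVA can also be carried out as a linear regression model rather than through the analysis-of-variance route. I want to convert my data to dummy variables and run a linear regression on them, and I hope to recover the same result. The dummy variables should also include the interaction between Bact and Temperature.
The problem is that I don't know how to convert my data frame into dummy variables so that it can be used in the lm() function.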
lm() will create the dummy variables for you. There is no need to create them yourself:
m <- lm(Time ~ Bact*Temperature, data = d)
anova(m)
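To check that the two routes really agree on this data, here is a minimal sketch. The object names fit_aov, fit_lm, tab_aov and tab_lm are mine, not from the question, and the data frame is rebuilt so the snippet runs on its own.

```r
# Same data as in the question
d <- data.frame(
  Temperature = c(rep("Cool", 6), rep("Warm", 6)),
  Bact = rep(rep(c("Bact 1", "Bact 2", "Bact 3"), each = 2), times = 2),
  Time = c(15.23, 14.32, 14.77, 15.12, 14.05, 15.48,
           14.13, 16.13, 16.44, 14.82, 17.96, 16.65)
)

fit_aov <- aov(Time ~ Bact * Temperature, data = d)  # the ANOVA route
fit_lm  <- lm(Time ~ Bact * Temperature, data = d)   # the regression route

tab_aov <- summary(fit_aov)[[1]]  # ANOVA table produced by aov()
tab_lm  <- anova(fit_lm)          # ANOVA table produced from the lm() fit

# Compare the two tables column by column (the row labels are padded
# differently, so compare the numbers rather than the row names)
all.equal(tab_aov[["Sum Sq"]],  tab_lm[["Sum Sq"]])
all.equal(tab_aov[["F value"]], tab_lm[["F value"]])
all.equal(tab_aov[["Pr(>F)"]],  tab_lm[["Pr(>F)"]])
```

Each comparison should come back TRUE, since both calls fit the same linear model and only present it differently.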
Edit
If you want to see what lm() does behind the scenes, you can look at the design matrix with model.matrix(m).
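As a rough illustration of what that design matrix contains for this data, the sketch below pulls it out and refits the model directly on its columns. X is a name I introduce here, and lm.fit() is the lower-level fitting routine from base R's stats package; I am assuming the default treatment contrasts.

```r
# Same data as in the question
d <- data.frame(
  Temperature = c(rep("Cool", 6), rep("Warm", 6)),
  Bact = rep(rep(c("Bact 1", "Bact 2", "Bact 3"), each = 2), times = 2),
  Time = c(15.23, 14.32, 14.77, 15.12, 14.05, 15.48,
           14.13, 16.13, 16.44, 14.82, 17.96, 16.65)
)

m <- lm(Time ~ Bact * Temperature, data = d)
X <- model.matrix(m)  # the 0/1 dummy columns that lm() built internally

# One intercept column, one dummy per non-reference level, and one
# product column per interaction; names paste variable and level,
# e.g. "TemperatureWarm"
colnames(X)
dim(X)

# Regressing Time on these columns by hand reproduces the coefficients of m
refit <- lm.fit(X, d$Time)
all.equal(coef(m), refit$coefficients)
```

The reference levels ("Bact 1" and "Cool") get no column of their own; they are absorbed into the intercept.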
I am like you in this respect. I want to control everything, so whenever I have time I build the dummies myself:
d = data.frame(
Temperature = c(rep("Cool", 6), rep("Warm", 6)),
Bact = c(rep("Bact 1", 2), rep("Bact 2", 2), rep("Bact 3", 2), rep("Bact 1", 2), rep("Bact 2", 2), rep("Bact 3", 2)),
Time = c(15.23,14.32,14.77,15.12,14.05,15.48,14.13,16.13,16.44,14.82,17.96,16.65)
)
That is:
> d
   Temperature   Bact  Time
1         Cool Bact 1 15.23
2         Cool Bact 1 14.32
3         Cool Bact 2 14.77
4         Cool Bact 2 15.12
5         Cool Bact 3 14.05
6         Cool Bact 3 15.48
7         Warm Bact 1 14.13
8         Warm Bact 1 16.13
9         Warm Bact 2 16.44
10        Warm Bact 2 14.82
11        Warm Bact 3 17.96
12        Warm Bact 3 16.65
So you only need to dummify the factors (Temperature, Bact), and the following procedure works:
# keep only the categorical columns to dummify; since R 4.0 data.frame() leaves
# strings as character, so accept character columns as well as factors
xfactors <- Filter(function(x) is.factor(x) || is.character(x), d)
b <- NULL
for (i in seq_along(xfactors)) {                    # one pass per categorical column
  a <- data.frame(model.matrix(~ xfactors[[i]]))    # make dummies here
  b <- if (is.null(b)) a[-1] else cbind(b, a[-1])   # drop the intercept and combine dummies
}
# the reference dummy gets excluded automatically by model.matrix
colnames(b) <- c("warm", "bact2", "bact3")          # you will probably want shorter names
> b
   warm bact2 bact3
1     0     0     0
2     0     0     0
3     0     1     0
4     0     1     0
5     0     0     1
6     0     0     1
7     1     0     0
8     1     0     0
9     1     1     0
10    1     1     0
11    1     0     1
12    1     0     1
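For comparison, the same three columns can probably be obtained in a single call by handing model.matrix() a formula containing both variables. This is only a sketch of an alternative, not the procedure used in this answer; b2 is a name I made up, and the renaming assumes the default level order (Cool before Warm, Bact 1 first).

```r
# Same data as in the question
d <- data.frame(
  Temperature = c(rep("Cool", 6), rep("Warm", 6)),
  Bact = rep(rep(c("Bact 1", "Bact 2", "Bact 3"), each = 2), times = 2),
  Time = c(15.23, 14.32, 14.77, 15.12, 14.05, 15.48,
           14.13, 16.13, 16.44, 14.82, 17.96, 16.65)
)

# Main-effect dummies for both variables at once; [, -1] drops the intercept
b2 <- as.data.frame(model.matrix(~ Temperature + Bact, data = d)[, -1])
names(b2) <- c("warm", "bact2", "bact3")  # same short names as above
b2
```

The loop version has the advantage that you can see each variable being expanded separately, which is the point if you want to control every step.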
And then run the model:
new_data <- cbind(b, Time = d$Time)  # add the response (Time) to the dummies
mymod <- lm(Time ~ warm*bact2 + warm*bact3, data = new_data)  # lm with the Temperature x Bact interactions
# do not interact bact2 with bact3: both are dummies of the same variable (Bact)
Output:
> summary(mymod)

Call:
lm(formula = Time ~ warm * bact2 + warm * bact3, data = new_data)

Residuals:
   Min     1Q Median     3Q    Max
 -1.00  -0.67   0.00   0.67   1.00

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  14.7750     0.6873  21.498 6.61e-07 ***
warm          0.3550     0.9719   0.365    0.727
bact2         0.1700     0.9719   0.175    0.867
bact3        -0.0100     0.9719  -0.010    0.992
warm:bact2    0.3300     1.3745   0.240    0.818
warm:bact3    2.1850     1.3745   1.590    0.163
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.9719 on 6 degrees of freedom
Multiple R-squared:  0.6264, Adjusted R-squared:  0.3151
F-statistic: 2.012 on 5 and 6 DF,  p-value: 0.2097
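The summary above reports one t test per dummy, whereas the ANOVA table reports one F test per term. One way to get from the dummy regression back to the ANOVA table is to compare nested models. The sketch below does this for the interaction term; the names full, additive and cmp are mine, and the dummies are rebuilt with simple comparisons so the snippet runs on its own.

```r
# Same data as in the question
d <- data.frame(
  Temperature = c(rep("Cool", 6), rep("Warm", 6)),
  Bact = rep(rep(c("Bact 1", "Bact 2", "Bact 3"), each = 2), times = 2),
  Time = c(15.23, 14.32, 14.77, 15.12, 14.05, 15.48,
           14.13, 16.13, 16.44, 14.82, 17.96, 16.65)
)

# Hand-made 0/1 dummies, equivalent to the columns of b above
new_data <- data.frame(
  warm  = as.numeric(d$Temperature == "Warm"),
  bact2 = as.numeric(d$Bact == "Bact 2"),
  bact3 = as.numeric(d$Bact == "Bact 3"),
  Time  = d$Time
)

full     <- lm(Time ~ warm * bact2 + warm * bact3, data = new_data)  # with interaction
additive <- lm(Time ~ warm + bact2 + bact3, data = new_data)         # main effects only
m        <- lm(Time ~ Bact * Temperature, data = d)                  # factor version

# The dummy regression and the factor model are the same fit
all.equal(unname(fitted(full)), unname(fitted(m)))
all.equal(deviance(full), deviance(m))

# F test for dropping both interaction dummies together; this matches the
# Bact:Temperature row of the ANOVA table
cmp <- anova(additive, full)
all.equal(cmp$F[2], anova(m)["Bact:Temperature", "F value"])
```

The interaction has 2 degrees of freedom because it is carried by two dummy products (warm:bact2 and warm:bact3), which is why a joint F test is needed rather than the individual t tests.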