Iterating one-way ANOVA through loop throws error in R
I'm trying to loop over a large data frame (5413 columns) and run an ANOVA on each column, but I get an error when I try to do so.
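I'd like to write the ANOVA p-values to a new row in the data frame, under the matching column headers. With my current knowledge, though, I'm writing the p-value output to files that I can parse in bash.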
Below is an example of the data layout:
data()
Name, Group, aaaA, aaaE, bbbR, cccD
Apple, Fruit, 1.23, 0.45, 0.3, 1.1
Banana, Fruit, 0.54, 0.12, 2.0, 1.32
Carrot, Vegetable, 0.01, 0.05, 0.45, 0.9
Pear, Fruit, 0.1, 0.2, 0.1, 0.3
Fox, Animal, 1.0, 0.9, 1.2, 0.8
Dog, Animal, 1.2, 1.1, 0.8, 0.7
Here is the output of dput:
structure(list(Name = structure(c(1L, 2L, 3L, 6L, 5L, 4L), .Label = c("Apple",
"Banana", "Carrot", "Dog", "Fox", "Pear"), class = "factor"),
Group = structure(c(2L, 2L, 3L, 2L, 1L, 1L), .Label = c(" Animal",
" Fruit", " Vegetable"), class = "factor"), aaaA = c(1.23,
0.54, 0.01, 0.1, 1, 1.2), aaaE = c(0.45, 0.12, 0.05, 0.2,
0.9, 1.1), bbbR = c(0.3, 2, 0.45, 0.1, 1.2, 0.8), cccD = c(1.1,
1.32, 0.9, 0.3, 0.8, 0.7)), class = "data.frame", row.names = c(NA,
-6L))
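The leading spaces in the dput labels above (" Animal", " Fruit", " Vegetable") suggest the file was read with read.csv's default settings, which keep the space after each comma in text fields. The grouping, and therefore the ANOVA, is the same either way, but clean labels are easier to work with. A minimal sketch, assuming the file really is comma-plus-space separated as in the layout shown; the inline string `txt` is a hypothetical stand-in for the real file path:

```r
# Sample layout inlined as text; with a real file, pass its path instead of text =
txt <- "Name, Group, aaaA, aaaE, bbbR, cccD
Apple, Fruit, 1.23, 0.45, 0.3, 1.1
Banana, Fruit, 0.54, 0.12, 2.0, 1.32
Carrot, Vegetable, 0.01, 0.05, 0.45, 0.9
Pear, Fruit, 0.1, 0.2, 0.1, 0.3
Fox, Animal, 1.0, 0.9, 1.2, 0.8
Dog, Animal, 1.2, 1.1, 0.8, 0.7"

# strip.white = TRUE drops the space after each comma, so the Group levels
# become "Animal", "Fruit", "Vegetable" instead of " Animal", " Fruit", ...
data <- read.csv(text = txt, strip.white = TRUE, stringsAsFactors = TRUE)
levels(data$Group)
```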
To get a successful output for a single column, I ran:
summary(aov(aaaA ~ Group, data=data))[[1]][["Pr(>F)"]]
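The `Pr(>F)` column of that summary has one entry per row of the ANOVA table: the Group p-value, followed by `NA` for the Residuals row. A minimal sketch that keeps only the Group p-value; the `data` construction is a hand-written stand-in for the dput output (same values, Group labels without the leading spaces), and `p_both`/`p_group` are illustrative names:

```r
# Sample data from the question
data <- data.frame(
  Name  = factor(c("Apple", "Banana", "Carrot", "Pear", "Fox", "Dog")),
  Group = factor(c("Fruit", "Fruit", "Vegetable", "Fruit", "Animal", "Animal")),
  aaaA  = c(1.23, 0.54, 0.01, 0.1, 1.0, 1.2),
  aaaE  = c(0.45, 0.12, 0.05, 0.2, 0.9, 1.1),
  bbbR  = c(0.3, 2.0, 0.45, 0.1, 1.2, 0.8),
  cccD  = c(1.1, 1.32, 0.9, 0.3, 0.8, 0.7)
)

# Whole Pr(>F) column: Group p-value first, then NA for Residuals
p_both <- summary(aov(aaaA ~ Group, data = data))[[1]][["Pr(>F)"]]

# Keep only the Group p-value
p_group <- p_both[1]
p_group
```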
I then tried to implement this in a loop:
for(i in names(data[3:6])){
out <- summary(aov(i ~ Group, data=data))[[1]][["Pr(>F)"]]
write.csv(out, i)}
which returns the error:
Error in model.frame.default(formula = i ~ Group, data = test, drop.unused.levels = TRUE) :
variable lengths differ (found for 'Group')
Can anyone help fix the error, or suggest another way to run a per-column ANOVA?
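The error comes from the formula `i ~ Group`: `i` is not a column of `data`, so R finds the loop variable itself, a single string such as "aaaA", and compares a length-1 response with the six-row Group. Building the formula from the string fixes this. A minimal sketch using `stats::reformulate()`, collecting the p-values into one named vector and then into a one-row data frame headed by the column names; the sample `data` is rebuilt inline, and `pvals`, `p_row` and the output file name are hypothetical:

```r
# Sample data from the question
data <- data.frame(
  Name  = factor(c("Apple", "Banana", "Carrot", "Pear", "Fox", "Dog")),
  Group = factor(c("Fruit", "Fruit", "Vegetable", "Fruit", "Animal", "Animal")),
  aaaA  = c(1.23, 0.54, 0.01, 0.1, 1.0, 1.2),
  aaaE  = c(0.45, 0.12, 0.05, 0.2, 0.9, 1.1),
  bbbR  = c(0.3, 2.0, 0.45, 0.1, 1.2, 0.8),
  cccD  = c(1.1, 1.32, 0.9, 0.3, 0.8, 0.7)
)

pvals <- numeric(0)  # named vector: one Group p-value per tested column
for (i in names(data)[3:6]) {
  # i holds a string such as "aaaA"; reformulate() turns it into the
  # formula aaaA ~ Group, so aov() looks the column up inside `data`
  f <- reformulate("Group", response = i)
  pvals[i] <- summary(aov(f, data = data))[[1]][["Pr(>F)"]][1]
}

# One-row data frame with the tested column names as headers
p_row <- data.frame(t(pvals), check.names = FALSE)
rownames(p_row) <- "p.value"

# One CSV for all columns instead of one file per column (hypothetical path)
out_file <- file.path(tempdir(), "anova_pvalues.csv")
write.csv(p_row, out_file)
```

For the full data set, `names(data)[-(1:2)]` would select every column after Name and Group, assuming those two always come first. Writing a single file avoids creating 5413 separate files.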
We can do the following and then extract the p-values:
# df is the question's data frame
to_use <- setdiff(names(df), "aaaA")
lapply(to_use, function(x) summary(do.call(aov, list(as.formula(paste("aaaA", "~", x)),
                                                     data = df))))
This gives you:
[[1]]
Df Sum Sq Mean Sq
Name 5 1.48 0.296
[[2]]
Df Sum Sq Mean Sq F value Pr(>F)
Group 2 0.8113 0.4057 1.819 0.304
Residuals 3 0.6689 0.2230
[[3]]
Df Sum Sq Mean Sq F value Pr(>F)
aaaE 1 0.9286 0.9286 6.733 0.0604 .
Residuals 4 0.5516 0.1379
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
[[4]]
Df Sum Sq Mean Sq F value Pr(>F)
bbbR 1 0.043 0.0430 0.12 0.747
Residuals 4 1.437 0.3593
[[5]]
Df Sum Sq Mean Sq F value Pr(>F)
cccD 1 0.1129 0.1129 0.33 0.596
Residuals 4 1.3673 0.3418
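The answer stops at the printed summaries. A minimal sketch of the "then extract the p-values" step, assuming `df` holds the question's data (rebuilt inline here): it re-runs the answer's lapply() and takes the first Pr(>F) entry from each table. Note that this code models aaaA against each of the other columns, not each column against Group. The Name model has as many levels as rows, so its table has no Pr(>F) column (see the first summary) and the sketch records NA for it; `fits` and `pvals` are illustrative names.

```r
# Sample data from the question, under the answer's name df
df <- data.frame(
  Name  = factor(c("Apple", "Banana", "Carrot", "Pear", "Fox", "Dog")),
  Group = factor(c("Fruit", "Fruit", "Vegetable", "Fruit", "Animal", "Animal")),
  aaaA  = c(1.23, 0.54, 0.01, 0.1, 1.0, 1.2),
  aaaE  = c(0.45, 0.12, 0.05, 0.2, 0.9, 1.1),
  bbbR  = c(0.3, 2.0, 0.45, 0.1, 1.2, 0.8),
  cccD  = c(1.1, 1.32, 0.9, 0.3, 0.8, 0.7)
)

# The answer's code: aaaA modelled on each other column in turn
to_use <- setdiff(names(df), "aaaA")
fits <- lapply(to_use, function(x)
  summary(do.call(aov, list(as.formula(paste("aaaA", "~", x)), data = df))))

# First Pr(>F) entry of each table; NA when the table has no Pr(>F) column
pvals <- vapply(fits, function(s) {
  p <- s[[1]][["Pr(>F)"]]
  if (is.null(p)) NA_real_ else p[1]
}, numeric(1))
names(pvals) <- to_use
pvals
```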