Using for loops in R for variable names to Create Boxplots

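I want to create boxplots to compare five continuous measurements, named tics1, tics2, tics3, tics4 and tics5, between two groups. I can do this easily with the following code: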

boxplot(tics1 ~ group, data=tics, col=c("hotpink", "cyan2"))
boxplot(tics2 ~ group, data=tics, col=c("hotpink", "cyan2"))
boxplot(tics3 ~ group, data=tics, col=c("hotpink", "cyan2"))
boxplot(tics4 ~ group, data=tics, col=c("hotpink", "cyan2"))
boxplot(tics5 ~ group, data=tics, col=c("hotpink", "cyan2"))

However, I am trying to use a for loop to avoid the repetition. When I try this, I get an error.

for (i in 1:5) {
  var <- paste0("tics", i)
  boxplot(var ~ group, data=tics, col=c("hotpink", "cyan2"))
}

Error in stats::model.frame.default(formula = var ~ group, data = tics) : variable lengths differ (found for 'group')

  1. Is there a way to fix my for loop code?
  2. Is there a way to show all 5 comparisons on one boxplot?

If you want to plot them all together, you can use facet_wrap from ggplot2. You first need to pivot the data to long format, and then you can plot it.

library(tidyverse)

tics %>% 
  pivot_longer(-group) %>% 
  ggplot(aes(x = factor(group), y = value, fill = factor(group))) +
  geom_boxplot()+
  facet_wrap(~name)

Output (plot not shown)
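
For question 2, if you would rather stay in base R, something along these lines might work. This is only a sketch: `toy` is a made-up stand-in for `tics` with two groups and two measurements, and `long` and `vars` are names I picked for illustration. `stack()` does the reshaping to long format, and a formula with two grouping variables puts every group/variable combination on one set of axes.

```r
# Hypothetical toy data standing in for `tics` (two groups, two measurements)
set.seed(1)
toy <- data.frame(
  group = rep(c("A", "B"), each = 5),
  tics1 = rnorm(10),
  tics2 = rnorm(10)
)

vars <- c("tics1", "tics2")

# stack() puts the measurement columns on top of each other:
# `values` holds the numbers, `ind` holds the source column name
long <- data.frame(
  group = rep(toy$group, times = length(vars)),
  stack(toy[vars])
)

# One box per group within each variable; the two colours recycle by group
boxplot(values ~ group + ind, data = long,
        col = c("hotpink", "cyan2"), las = 2, xlab = "")
```

With the real data you would swap `toy` for `tics` and list all five column names in `vars`.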

Alternatively, with a for loop, you can do this:

for (i in 1:5) {
  var <- paste0("tics", i)
  boxplot(tics[[var]] ~ group, data = tics, col=c("hotpink", "cyan2"))
}
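
The original loop fails because `var` is a length-1 character string, not a column, so the formula `var ~ group` compares one value against the whole `group` column (hence "variable lengths differ"). Another way around this is to build the formula from the string with `reformulate()`. The sketch below uses a made-up data frame `toy` in place of `tics`; a side benefit is that the axis label shows the actual column name rather than `tics[[var]]`.

```r
# Hypothetical toy data standing in for `tics`
set.seed(1)
toy <- data.frame(
  group = rep(c("A", "B"), each = 5),
  tics1 = rnorm(10),
  tics2 = rnorm(10)
)

for (var in c("tics1", "tics2")) {
  # reformulate() turns the strings into the formula `tics1 ~ group`, etc.
  f <- reformulate("group", response = var)
  boxplot(f, data = toy, col = c("hotpink", "cyan2"), main = var)
}
```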

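If you go this route, sapply is a more compact way to write it (it is not meaningfully faster than the for loop); here I also add the variable name as the title of each plot.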

sapply(1:5, \(x) {var <- paste0("tics", x); boxplot(tics[[var]] ~ tics$group, main = var)})
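
Two small refinements you might consider, sketched on a made-up data frame `toy` rather than the real `tics`: wrapping the call in `invisible(lapply(...))` keeps the boxplot statistics from being printed to the console, and `par(mfrow = ...)` places the plots side by side in one figure, which is another take on question 2.

```r
# Hypothetical toy data standing in for `tics`
set.seed(1)
toy <- data.frame(
  group = rep(c("A", "B"), each = 5),
  tics1 = rnorm(10),
  tics2 = rnorm(10)
)

vars <- c("tics1", "tics2")

# Split the device into one row with one panel per variable
op <- par(mfrow = c(1, length(vars)))

# invisible() suppresses the list of boxplot statistics that would be printed
invisible(lapply(vars, function(v) {
  boxplot(toy[[v]] ~ toy$group, main = v, xlab = "group", ylab = v,
          col = c("hotpink", "cyan2"))
}))

# Restore the previous layout
par(op)
```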

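You can also loop by column index, assuming group is the first column and the rest of the data frame contains only the tics columns. (In the sample data below, group is actually the fifth column, so reorder the columns first or select them by name.)

for (i in 2:ncol(tics)) {
  # tics[, i] is the i-th column; group is looked up in tics
  boxplot(tics[, i] ~ group, data = tics)
}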
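
If you cannot rely on the column order, selecting the measurement columns by name is a little safer. A minimal sketch, again on a made-up `toy` data frame where `group` is deliberately not the first column:

```r
# Hypothetical toy data; `group` sits between the measurement columns
set.seed(1)
toy <- data.frame(
  tics1 = rnorm(10),
  group = rep(c("A", "B"), each = 5),
  tics2 = rnorm(10)
)

# Every column except `group`, wherever it happens to be
vars <- setdiff(names(toy), "group")

for (v in vars) {
  boxplot(toy[[v]] ~ toy$group, main = v, xlab = "group", ylab = v)
}
```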

Data

tics <- structure(list(tics1 = c(0.0476190476190476, 0.0952380952380952, 
0.142857142857143, 0.19047619047619, 0.238095238095238, 0.285714285714286, 
0.333333333333333, 0.380952380952381, 0.428571428571429, 0.476190476190476, 
0.523809523809524, 0.571428571428571, 0.619047619047619, 0.666666666666667, 
0.714285714285714), tics2 = c(-0.692143884081275, 0.644709708117294, 
-1.57303517336961, 1.20119221027555, 0.609239967840388, -0.311524439591859, 
0.618602249192469, 0.731306188818431, 1.01016469827886, 1.28385223013644, 
-0.00178540309357942, 2.041746200149, -1.01431257489833, -1.61190976820524, 
1.63099766889229), tics3 = c(0.0520219824975517, 0.729165269851886, 
1.28805775316925, -1.09043323687797, 0.486936194669402, 0.800131923610429, 
1.22229153795252, 0.217159233531646, -0.163640790378808, 1.55459728200125, 
0.860175585334737, -1.73107965801683, -0.744770481693222, -2.59518985923938, 
0.246772490830949), tics4 = c(1.27763384585271, 0.939207828308425, 
5.76608257808322, 0.416700865464712, 3.55156271227215, 0.463652374864707, 
1.42103094782663, 0.724411125077308, 2.03621888478233, 0.760893978643801, 
0.75365623199256, 2.31626695810966, 0.0881069629466973, 1.16878624674157, 
2.27680839967629), group = c("A", "A", "A", "A", "A", "B", "B", 
"B", "B", "B", "C", "C", "C", "C", "C"), tics5 = c(0.0520219824975517, 
0.729165269851886, 1.28805775316925, -1.09043323687797, 0.486936194669402, 
0.800131923610429, 1.22229153795252, 0.217159233531646, -0.163640790378808, 
1.55459728200125, 0.860175585334737, -1.73107965801683, -0.744770481693222, 
-2.59518985923938, 0.246772490830949)), row.names = c(NA, -15L
), class = "data.frame")