R - ggplot boxplot with standard deviation values printed in the plot?
I've tried to make this question as clear and complete as I can; constructive criticism is welcome:
I have a tibble called my_tibble that looks like this:
# A tibble: 36 x 5
# Groups: fruit [4]
fruit length weight length_sd weight_sd
<fct> <dbl> <dbl> <dbl> <dbl>
1 Apple 0.531 0.0730 0.211 0.0292
2 Apple 0.489 0.0461 0.211 0.0292
3 Apple 0.503 0.0796 0.211 0.0292
4 Apple 0.560 0.0733 0.211 0.0292
5 Apple 0.533 0.0883 0.211 0.0292
6 Apple 0.612 0.127 0.211 0.0292
7 Apple 0.784 0.0671 0.211 0.0292
8 Apple 0.363 0.0623 0.211 0.0292
9 Apple 1.000 0.0291 0.211 0.0292
10 Apple 0.956 0.0284 0.211 0.0292
# ... with 26 more rows
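The length_sd and weight_sd variables are the standard deviations of length and weight (yes, I know these numbers are meaningless) for each of the four fruits in the grouping factor variable fruit, i.e. Apple, Banana, Orange and Strawberry.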
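For reference, per-group columns like length_sd and weight_sd can be computed from the data instead of being typed in. The following is a minimal base-R sketch. It assumes they are ordinary sample standard deviations (what sd() returns, with an n - 1 denominator) of each measurement within each fruit. The helper add_group_sd() and the toy data are hypothetical.

```r
# Hypothetical helper: add a "<col>_sd" column holding the standard deviation
# of <col> within each level of `group`, repeated on every row of that group
# (the same shape as length_sd and weight_sd in my_tibble).
add_group_sd <- function(df, group, cols) {
  for (col in cols) {
    # ave() applies FUN separately within each group and returns a vector
    # aligned with the original rows
    df[[paste0(col, "_sd")]] <- ave(df[[col]], df[[group]], FUN = sd)
  }
  df
}

# Toy data for illustration only (not the values from the question)
toy <- data.frame(
  fruit  = factor(c("Apple", "Apple", "Apple", "Banana", "Banana")),
  length = c(1, 2, 3, 10, 20),
  weight = c(1, 1, 1, 2, 4)
)
toy_sd <- add_group_sd(toy, "fruit", c("length", "weight"))
toy_sd$length_sd
# [1] 1.000000 1.000000 1.000000 7.071068 7.071068
```

With dplyr, the equivalent is group_by(fruit) followed by mutate(length_sd = sd(length), weight_sd = sd(weight)).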
I want to make boxplots of their length and weight, so I first gather()ed the data:
my_tibble_gathered <- my_tibble %>%
  ungroup() %>%
  gather("length", "weight", key = "measurement", value = "value")
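gather() still works, but in tidyr 1.0.0 and later it is superseded by pivot_longer(). Here is a sketch of the same reshaping, assuming tidyr >= 1.0.0. The wrapper name gather_measurements() is hypothetical. It yields the same measurement/value columns. Rows come out interleaved (length, weight, length, ...) instead of stacked, which should make no difference to the plot.

```r
library(dplyr)
library(tidyr)

# Same reshaping as the gather() call: the length and weight columns become
# key/value pairs in "measurement"/"value"; fruit, length_sd and weight_sd
# are carried along unchanged.
gather_measurements <- function(df) {
  df %>%
    ungroup() %>%
    pivot_longer(c(length, weight), names_to = "measurement", values_to = "value")
}
```

Usage would be my_tibble_gathered <- gather_measurements(my_tibble).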
Then I ran ggplot2 with facet_grid() to make the boxplots:
ggplot(data = my_tibble_gathered) +
  geom_boxplot(mapping = aes(x = fruit, y = value)) +
  facet_grid(~measurement)
This gives me one boxplot per fruit, with one facet each for length and weight.
So far so good.
However, I haven't used the standard deviation data yet. What I'd like is:
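- the standard deviation values (of length or weight, depending on which facet they are in) printed inside the main plot area for each fruit,
- preferably without touching the boxplots themselves, and
- with a specified number of decimal places (e.g. 3) in a given font and font size.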
Ideally I would also like to use the standard deviation symbol (sigma) in the label (so maybe via expression()?).
So, for example, on top of the Apple length boxplot there would be text reading "[sigma symbol] = 0.211", and likewise for the other fruits.
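One way to build such a label programmatically is as a plotmath string that ggplot2's geom_text() can render with parse = TRUE. In plotmath, sigma is drawn as the Greek letter and == is drawn as a single "=". Below is a base-R sketch; the helper name sigma_label() is hypothetical. The number is formatted with formatC() and wrapped in quotes, because a bare number inside a parsed expression loses its trailing zeros (0.070 would be shown as 0.07).

```r
# Hypothetical helper: build a plotmath label such as sigma == '0.211'.
# formatC(format = "f") keeps a fixed number of decimals (0.0695 -> "0.070");
# the quotes make plotmath print the number exactly as formatted.
sigma_label <- function(x, digits = 3) {
  paste0("sigma == '", formatC(x, format = "f", digits = digits), "'")
}

sigma_label(c(0.21053610140121, 0.0695477116271205))
# [1] "sigma == '0.211'" "sigma == '0.070'"
```

These strings can be used as the label aesthetic of geom_text(..., parse = TRUE). The font and font size requirement maps to geom_text()'s family and size arguments.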
How can I do this programmatically, taking the values from my_tibble, so that I don't have to copy/paste the numbers by hand via annotate()?
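If the precomputed _sd columns are not strictly needed, ggplot2 can also compute the SD of each box itself. This works through stat_summary() with a fun.data function that returns both the text position and the label. This is a sketch, and it assumes the sample SD from sd() is what should be shown. That matches length_sd/weight_sd only if those columns were computed the same way. The function names sd_text() and plot_sd_on_the_fly() and the 0.05 vertical offset are hypothetical choices.

```r
library(ggplot2)

# Hypothetical summary function for stat_summary(): given the values of one
# box, return where to put the text (slightly above the box's largest value)
# and a plotmath label with the sample SD to a fixed number of decimals.
sd_text <- function(y, digits = 3) {
  data.frame(
    y = max(y) + 0.05,
    label = paste0("sigma == '", formatC(sd(y), format = "f", digits = digits), "'"),
    stringsAsFactors = FALSE
  )
}

# Hypothetical plotting helper; `long` needs fruit, measurement and value
# columns, like my_tibble_gathered.
plot_sd_on_the_fly <- function(long, size = 3) {
  ggplot(long, aes(fruit, value)) +
    geom_boxplot() +
    stat_summary(fun.data = sd_text, geom = "text", parse = TRUE, size = size) +
    facet_grid(~measurement)
}
```

Usage would be plot_sd_on_the_fly(my_tibble_gathered).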
Many thanks.
Here is the dput() of my_tibble:
my_tibble <- structure(list(fruit = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 4L, 4L, 4L, 4L, 4L, 4L), .Label = c("Apple",
"Banana", "Orange", "Strawberry"), class = "factor"), length = c(0.530543135476024,
0.488977737310336, 0.503193533328075, 0.560337485188931, 0.533439933009971,
0.611517111445543, 0.784118643975375, 0.362563771715571, 0.999994359802019,
0.956308812233702, 0.332481969543643, 0.562729609348448, 0.635908731579197,
0.565161511593215, 0.526448727581439, 0.429069715902935, 0.460919459557728,
0.444385050459595, 0.503366669668819, 0.618141816193079, 0.516525710744663,
0.481938965057342, 0.505085048888451, 0.457048653556098, 0.536921608675353,
0.511397571854412, 0.442487815464855, 0.50103115023886, 0.305442471161553,
0.424241364519466, 2.45596087585689e-09, 0.122698840602406, 0.131431902209926,
0.205210819820745, 0.154445620769804, 0.161286627937974), weight = c(0.0729778030869548,
0.0460942475327506, 0.0796304213241703, 0.0732813711244074, 0.0882995825748408,
0.127183436952234, 0.0670534170610057, 0.0622813564507915, 0.0290840877242033,
0.0283807418126428, 0.107361724942771, 0.119133737366527, 0.185844270761176,
0.108155205104857, 0.189750275168087, 0.0845939609954818, 0.146490609941214,
0.14150784543994, 0.122840037806175, 0.143552891056291, 0.16798564927051,
0.241024152676673, 0.237508762873311, 0.20455939607561, 0.316350856257808,
0.30730862083812, 0.184386251393058, 0.181923008217247, 0.332024894278287,
0.194530111145869, 0.0166977795512452, 0.0569762924658561, 0.0739793228272142,
0.0433330479654348, 0.099781312832018, 0.0396375225550451), length_sd = c(0.21053610140121,
0.21053610140121, 0.21053610140121, 0.21053610140121, 0.21053610140121,
0.21053610140121, 0.21053610140121, 0.21053610140121, 0.21053610140121,
0.21053610140121, 0.0933430177635132, 0.0933430177635132, 0.0933430177635132,
0.0933430177635132, 0.0933430177635132, 0.0933430177635132, 0.0933430177635132,
0.0933430177635132, 0.0933430177635132, 0.0933430177635132, 0.067296241260161,
0.067296241260161, 0.067296241260161, 0.067296241260161, 0.067296241260161,
0.067296241260161, 0.067296241260161, 0.067296241260161, 0.067296241260161,
0.067296241260161, 0.0695477116271205, 0.0695477116271205, 0.0695477116271205,
0.0695477116271205, 0.0695477116271205, 0.0695477116271205),
weight_sd = c(0.0292441784658992, 0.0292441784658992, 0.0292441784658992,
0.0292441784658992, 0.0292441784658992, 0.0292441784658992,
0.0292441784658992, 0.0292441784658992, 0.0292441784658992,
0.0292441784658992, 0.033755823218546, 0.033755823218546,
0.033755823218546, 0.033755823218546, 0.033755823218546,
0.033755823218546, 0.033755823218546, 0.033755823218546,
0.033755823218546, 0.033755823218546, 0.0611975080850528,
0.0611975080850528, 0.0611975080850528, 0.0611975080850528,
0.0611975080850528, 0.0611975080850528, 0.0611975080850528,
0.0611975080850528, 0.0611975080850528, 0.0611975080850528,
0.0290125579882519, 0.0290125579882519, 0.0290125579882519,
0.0290125579882519, 0.0290125579882519, 0.0290125579882519
)), class = c("grouped_df", "tbl_df", "tbl", "data.frame"
), row.names = c(NA, -36L), vars = "fruit", labels = structure(list(
fruit = structure(1:4, .Label = c("Apple", "Banana", "Orange",
"Strawberry"), class = "factor")), class = "data.frame", row.names = c(NA,
-4L), vars = "fruit", drop = TRUE), indices = list(0:9, 20:29,
10:19, 30:35), drop = TRUE, group_sizes = c(10L, 10L, 10L,
6L), biggest_group_size = 10L)
You could try this somewhat hacky approach:
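my_tibble %>%
  # transform from wide to long similar as you did already
  gather(k, v, -fruit, -ends_with("sd")) %>%
  # add corresponding sd values
  mutate(label = ifelse(k == "length", length_sd, weight_sd)) %>%
  # prepare the label as a plotmath expression; formatC() keeps 3 decimals
  # (round() would turn 0.0695 into 0.07) and the quotes stop parse() from
  # dropping trailing zeros
  mutate(label = paste0("sigma == '", formatC(label, format = "f", digits = 3), "'")) %>%
  # add factor for alpha by adding the second group
  # (dplyr >= 1.0.0 spells this argument .add)
  group_by(k, add = TRUE) %>%
  # 1 for the first row of each fruit/k group, 0 for the rest
  mutate(Alpha = c(1, rep(0, n() - 1))) %>%
  ggplot(aes(fruit, v)) +
  geom_boxplot() +
  geom_text(aes(y = max(v) + 0.1,
                label = label,
                alpha = factor(Alpha)),
            size = 3,
            show.legend = FALSE,
            parse = TRUE) +
  facet_grid(~k) +
  scale_alpha_manual(values = c(0, 1))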
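You have to map the sd values onto the data so that each row gets the sd matching its fruit and k (measurement), as in the label column. Because geom_text() draws one label per row, identical labels would otherwise be stacked on top of each other. To avoid this overplotting, a binary factor (1 for the first row of each fruit/k group, 0 for the rest) is mapped to alpha so that only one copy is visible.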
my_tibble %>%
  gather(k, v, -fruit, -ends_with("sd")) %>%
  mutate(label = ifelse(k == "length", length_sd, weight_sd)) %>%
  group_by(k, add = TRUE) %>%
  mutate(Alpha = c(1, rep(0, n() - 1))) %>%
  head(3)
# A tibble: 3 x 7
# Groups: fruit, k [1]
fruit length_sd weight_sd k v label Alpha
<fct> <dbl> <dbl> <chr> <dbl> <dbl> <dbl>
1 Apple 0.211 0.0292 length 0.531 0.211 1
2 Apple 0.211 0.0292 length 0.489 0.211 0
3 Apple 0.211 0.0292 length 0.503 0.211 0
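A possibly less hacky variant of the same idea is to build a small label table with exactly one row per fruit and measurement. That table is passed to geom_text() through its own data argument, so no alpha trick is needed. This is a sketch under the assumption that length_sd and weight_sd are constant within each fruit, as in the dput() above. It uses pivot_longer() (tidyr >= 1.0.0) instead of gather(). The function name plot_sd_boxplot() is hypothetical.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Hypothetical helper: boxplots of length and weight per fruit, with exactly
# one "sigma = ..." label per fruit and facet, taken from the _sd columns.
plot_sd_boxplot <- function(df, digits = 3, size = 3) {
  df <- ungroup(df)

  # Long data for the boxplots: one row per observation and measurement
  long <- df %>%
    pivot_longer(c(length, weight), names_to = "measurement", values_to = "value")

  # Label table: the sd columns are assumed constant within each fruit, so
  # distinct() keeps one row per fruit; pivoting then gives one row per
  # fruit x measurement, i.e. one label per box
  labels <- df %>%
    distinct(fruit, length_sd, weight_sd) %>%
    pivot_longer(c(length_sd, weight_sd), names_to = "measurement", values_to = "sd_value") %>%
    mutate(
      measurement = sub("_sd$", "", measurement),  # "length_sd" -> "length"
      label = paste0("sigma == '", formatC(sd_value, format = "f", digits = digits), "'"),
      y = max(long$value) + 0.1                    # same height as in the answer
    )

  ggplot(long, aes(fruit, value)) +
    geom_boxplot() +
    geom_text(data = labels, aes(y = y, label = label), parse = TRUE, size = size) +
    facet_grid(~measurement)
}
```

Usage would be plot_sd_boxplot(my_tibble). The font can be changed by adding a family argument to the geom_text() call.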