使用 TukeyHSD 的输出自动将重要字母添加到 ggplot barplot
Automatically adding letters of significance to a ggplot barplot using output from TukeyHSD
使用此数据...
hogs.sample<-structure(list(Zone = c("B", "B", "B", "B", "B", "B", "B", "B",
"B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "D",
"D", "D", "D", "D", "D", "D", "D", "D", "D", "D", "D", "D", "D",
"D", "D", "D", "D", "D", "D"), Levelname = c("Medium", "High",
"Low", "Med.High", "Med.High", "Med.High", "Med.High", "Med.High",
"Med.High", "Medium", "Med.High", "Medium", "Med.High", "High",
"Medium", "High", "Low", "Med.High", "Low", "High", "Medium",
"Medium", "Med.High", "Low", "Low", "Med.High", "Low", "Low",
"High", "High", "Med.High", "High", "Med.High", "Med.High", "Medium",
"High", "Low", "Low", "Med.High", "Low"), hogs.fit = c(-0.122,
-0.871, -0.279, -0.446, 0.413, 0.011, 0.157, 0.131, 0.367, -0.23,
0.007, 0.05, 0.04, -0.184, -0.265, -1.071, -0.223, 0.255, -0.635,
-1.103, 0.008, -0.04, 0.831, 0.042, -0.005, -0.022, 0.692, 0.402,
0.615, 0.785, 0.758, 0.738, 0.512, 0.222, -0.424, 0.556, -0.128,
-0.495, 0.591, 0.923)), row.names = c(NA, -40L), groups = structure(list(
Zone = c("B", "D"), .rows = structure(list(1:20, 21:40), ptype = integer(0), class = c("vctrs_list_of",
"vctrs_vctr", "list"))), row.names = c(NA, -2L), class = c("tbl_df",
"tbl", "data.frame"), .drop = TRUE), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"))
我正在尝试将基于 Tukeys HSD 的重要字母添加到下面的图中...
library(agricolae)
library(tidyverse)
hogs.plot <- ggplot(hogs.sample, aes(x = Zone, y = exp(hogs.fit),
fill = factor(Levelname, levels = c("High", "Med.High", "Medium", "Low")))) +
stat_summary(fun = mean, geom = "bar", position = position_dodge(0.9), color = "black") +
stat_summary(fun.data = mean_se, geom = "errorbar", position = position_dodge(0.9), width = 0.2) +
labs(x = "", y = "CPUE (+/-1SE)", legend = NULL) +
scale_y_continuous(expand = c(0,0), labels = scales::number_format(accuracy = 0.1)) +
scale_fill_manual(values = c("midnightblue", "dodgerblue4", "steelblue3", 'lightskyblue')) +
scale_x_discrete(breaks=c("B", "D"), labels=c("Econfina", "Steinhatchee"))+
scale_color_hue(l=40, c = 100)+
# coord_cartesian(ylim = c(0, 3.5)) +
labs(title = "Hogs", x = "", legend = NULL) +
theme(panel.border = element_blank(), panel.grid.major = element_blank(), panel.background = element_blank(),
panel.grid.minor = element_blank(), axis.line = element_line(),
axis.text.x = element_text(), axis.title.x = element_text(vjust = 0),
axis.title.y = element_text(size = 8))+
theme(legend.title = element_blank(),
legend.position = "none")
hogs.plot
我理想的输出应该是这样的...
我不确定这些字母在我的示例图中是否 100% 准确,但它们表示哪些组彼此之间存在显着差异。区域是独立的,所以我不想在两个区域之间进行任何比较,所以我 运行 使用以下代码将它们分开。
hogs.aov.b <- aov(hogs.fit ~Levelname, data = filter(hogs.sample, Zone == "B"))
hogs.aov.summary.b <- summary(hogs.aov.b)
hogs.tukey.b <- TukeyHSD(hogs.aov.b)
hogs.tukey.b
hogs.aov.d <- aov(hogs.fit ~ Levelname, data = filter(hogs.sample, Zone == "D"))
hogs.aov.summary.d <- summary(hogs.aov.d)
hogs.tukey.d <- TukeyHSD(hogs.aov.d)
hogs.tukey.d
我试过这条路线,但除了猪以外,我还有很多物种可以应用它。
Show statistically significant difference in a graph
我可以一次获取一个区域的字母,但我不确定如何将两个区域都添加到绘图中。他们也不正常。我从网页上修改了这段代码,但这些字母并没有很好地放置在栏的顶部。
library(agricolae)
library(tidyverse)
# get highest point overall
abs_max <- max(bass.dat.d$bass.fit)
# get the highest point for each class
maxs <- bass.dat.d %>%
group_by(Levelname) %>%
# I like to add a little bit to each value so it rests above
# the highest point. Using a percentage of the highest point
# overall makes this code a bit more general
summarise(bass.fit=max(mean(exp(bass.fit))))
# get Tukey HSD results
Tukey_test <- aov(bass.fit ~ Levelname, data=bass.dat.d) %>%
HSD.test("Levelname", group=TRUE) %>%
.$groups %>%
as_tibble(rownames="Levelname") %>%
rename("Letters_Tukey"="groups") %>%
select(-bass.fit) %>%
# and join it to the max values we calculated -- these are
# your new y-coordinates
left_join(maxs, by="Levelname")
也有很多这样的例子https://www.staringatr.com/3-the-grammar-of-graphics/bar-plots/3_tukeys/,但他们都只是手动添加文本。如果有代码可以获取 Tukey 输出并自动将重要性字母添加到图中,那就太好了。
谢谢
我不明白你的 data/analysis(例如,为什么你在 hogs.fit 上使用 exp()
,字母应该是什么?)所以我不确定是否这是正确的,但没有其他人回答,所以这是我最好的猜测:
正确示例:
## Source: Rosane Rech
## https://statdoe.com/cld-customisation/#adding-the-letters-indicating-significant-differences
## https://www.youtube.com/watch?v=Uyof3S1gx3M
library(tidyverse)
library(ggthemes)
library(multcompView)
# analysis of variance
anova <- aov(weight ~ feed, data = chickwts)
# Tukey's test
tukey <- TukeyHSD(anova)
# compact letter display
cld <- multcompLetters4(anova, tukey)
# table with factors and 3rd quantile
dt <- group_by(chickwts, feed) %>%
summarise(w=mean(weight), sd = sd(weight)) %>%
arrange(desc(w))
# extracting the compact letter display and adding to the Tk table
cld <- as.data.frame.list(cld$feed)
dt$cld <- cld$Letters
print(dt)
#> # A tibble: 6 × 4
#> feed w sd cld
#> <fct> <dbl> <dbl> <chr>
#> 1 sunflower 329. 48.8 a
#> 2 casein 324. 64.4 a
#> 3 meatmeal 277. 64.9 ab
#> 4 soybean 246. 54.1 b
#> 5 linseed 219. 52.2 bc
#> 6 horsebean 160. 38.6 c
ggplot(dt, aes(feed, w)) +
geom_bar(stat = "identity", aes(fill = w), show.legend = FALSE) +
geom_errorbar(aes(ymin = w-sd, ymax=w+sd), width = 0.2) +
labs(x = "Feed Type", y = "Average Weight Gain (g)") +
geom_text(aes(label = cld, y = w + sd), vjust = -0.5) +
ylim(0,410) +
theme_few()
我的 'best guess' 和你的数据:
hogs.sample <- structure(list(Zone = c("B", "B", "B", "B", "B", "B", "B", "B",
"B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "D",
"D", "D", "D", "D", "D", "D", "D", "D", "D", "D", "D", "D", "D",
"D", "D", "D", "D", "D", "D"), Levelname = c("Medium", "High",
"Low", "Med.High", "Med.High", "Med.High", "Med.High", "Med.High",
"Med.High", "Medium", "Med.High", "Medium", "Med.High", "High",
"Medium", "High", "Low", "Med.High", "Low", "High", "Medium",
"Medium", "Med.High", "Low", "Low", "Med.High", "Low", "Low",
"High", "High", "Med.High", "High", "Med.High", "Med.High", "Medium",
"High", "Low", "Low", "Med.High", "Low"), hogs.fit = c(-0.122,
-0.871, -0.279, -0.446, 0.413, 0.011, 0.157, 0.131, 0.367, -0.23,
0.007, 0.05, 0.04, -0.184, -0.265, -1.071, -0.223, 0.255, -0.635,
-1.103, 0.008, -0.04, 0.831, 0.042, -0.005, -0.022, 0.692, 0.402,
0.615, 0.785, 0.758, 0.738, 0.512, 0.222, -0.424, 0.556, -0.128,
-0.495, 0.591, 0.923)), row.names = c(NA, -40L), groups = structure(list(
Zone = c("B", "D"), .rows = structure(list(1:20, 21:40), ptype = integer(0), class = c("vctrs_list_of",
"vctrs_vctr", "list"))), row.names = c(NA, -2L), class = c("tbl_df",
"tbl", "data.frame"), .drop = TRUE), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"))
# anova
anova <- aov(hogs.fit ~ Levelname * Zone, data = hogs.sample)
# Tukey's test
tukey <- TukeyHSD(anova)
# compact letter display
cld <- multcompLetters4(anova, tukey)
# table with factors and 3rd quantile
dt <- hogs.sample %>%
group_by(Zone, Levelname) %>%
summarise(w=mean(exp(hogs.fit)), sd = sd(exp(hogs.fit)) / sqrt(n())) %>%
arrange(desc(w)) %>%
ungroup() %>%
mutate(Levelname = factor(Levelname,
levels = c("High",
"Med.High",
"Medium",
"Low"),
ordered = TRUE))
# extracting the compact letter display and adding to the Tk table
cld2 <- data.frame(letters = cld$`Levelname:Zone`$Letters)
dt$cld <- cld2$letters
print(dt)
#> # A tibble: 8 × 5
#> Zone Levelname w sd cld
#> <chr> <ord> <dbl> <dbl> <chr>
#> 1 D High 1.97 0.104 a
#> 2 D Med.High 1.69 0.206 ab
#> 3 D Low 1.36 0.258 abc
#> 4 B Med.High 1.14 0.0872 abc
#> 5 B Medium 0.875 0.0641 bcd
#> 6 D Medium 0.874 0.111 bcd
#> 7 B Low 0.696 0.0837 cd
#> 8 B High 0.481 0.118 d
ggplot(dt, aes(x = Levelname, y = w)) +
geom_bar(stat = "identity", aes(fill = Levelname), show.legend = FALSE) +
geom_errorbar(aes(ymin = w - sd, ymax = w + sd), width = 0.2) +
labs(x = "Levelname", y = "Average hogs.fit") +
geom_text(aes(label = cld, y = w + sd), vjust = -0.5) +
facet_wrap(~Zone)
由 reprex package (v2.0.1)
于 2021-10-01 创建
我想我可以扩展 jared_mamrot 的答案!
首先,您会发现一个准备好成为 copy-pasted 的 reprex。下面,我对此有一些补充意见。
代表
hogs.sample <- data.frame(
stringsAsFactors = FALSE,
Zone = c("B","B","B","B","B","B",
"B","B","B","B","B","B","B","B","B","B","B","B",
"B","B","D","D","D","D","D","D","D","D","D",
"D","D","D","D","D","D","D","D","D","D","D"),
Levelname = c("Medium","High","Low",
"Med.High","Med.High","Med.High","Med.High","Med.High",
"Med.High","Medium","Med.High","Medium","Med.High",
"High","Medium","High","Low","Med.High","Low","High",
"Medium","Medium","Med.High","Low","Low","Med.High",
"Low","Low","High","High","Med.High","High",
"Med.High","Med.High","Medium","High","Low","Low",
"Med.High","Low"),
hogs.fit = c(-0.122,-0.871,-0.279,-0.446,
0.413,0.011,0.157,0.131,0.367,-0.23,0.007,0.05,
0.04,-0.184,-0.265,-1.071,-0.223,0.255,-0.635,
-1.103,0.008,-0.04,0.831,0.042,-0.005,-0.022,0.692,
0.402,0.615,0.785,0.758,0.738,0.512,0.222,-0.424,
0.556,-0.128,-0.495,0.591,0.923)
)
library(tidyverse)
# {emmeans}, {multcomp} & {multcompView} ----------------------------------
library(emmeans)
library(multcomp)
library(multcompView)
# set up model
model <- lm(exp(hogs.fit) ~ Levelname * Zone, data = hogs.sample)
# get (adjusted) means
model_means <- emmeans(object = model,
specs = ~ Levelname | Zone)
# add letters to each mean
model_means_cld <- cld(object = model_means,
adjust = "Tukey",
Letters = letters,
alpha = 0.05)
# show output
model_means_cld
#> Zone = B:
#> Levelname emmean SE df lower.CL upper.CL .group
#> High 0.481 0.199 32 -0.0445 1.01 a
#> Low 0.696 0.230 32 0.0884 1.30 ab
#> Medium 0.875 0.199 32 0.3488 1.40 ab
#> Med.High 1.139 0.133 32 0.7889 1.49 b
#>
#> Zone = D:
#> Levelname emmean SE df lower.CL upper.CL .group
#> Medium 0.874 0.230 32 0.2673 1.48 a
#> Low 1.362 0.151 32 0.9650 1.76 ab
#> Med.High 1.688 0.163 32 1.2592 2.12 b
#> High 1.969 0.199 32 1.4436 2.50 b
#>
#> Results are given on the exp (not the response) scale.
#> Confidence level used: 0.95
#> Conf-level adjustment: sidak method for 4 estimates
#> P value adjustment: tukey method for comparing a family of 4 estimates
# format output for ggplot
model_means_cld <- model_means_cld %>%
as.data.frame() %>%
mutate(Zone = case_when(
Zone == "B" ~ "Econfina",
Zone == "D" ~ "Steinhatchee"
))
# get ggplot
ggplot(data = model_means_cld,
aes(x = Levelname, y = emmean, fill = Levelname)) +
facet_grid(cols = vars(Zone)) +
geom_bar(stat = "identity", color = "black", show.legend = FALSE) +
geom_errorbar(aes(ymin = emmean - SE, ymax = emmean + SE), width = 0.2) +
scale_y_continuous(
name = "CPUE (adj. mean ± 1 std. error)",
expand = expansion(mult = c(0, 0.1)),
labels = scales::number_format(accuracy = 0.1)
) +
xlab(NULL) +
labs(title = "Hogs",
caption = "Separaetly per Zone, means followed by a common letter are not significantly different according to the Tukey-test") +
geom_text(aes(label = str_trim(.group), y = emmean + SE), vjust = -0.5) +
scale_fill_manual(values = c("midnightblue", "dodgerblue4", "steelblue3", 'lightskyblue')) +
theme_classic() +
theme(
panel.border = element_blank(),
panel.grid.major = element_blank(),
panel.background = element_blank(),
panel.grid.minor = element_blank(),
axis.line = element_line(),
axis.text.x = element_text(),
axis.title.x = element_text(vjust = 0),
axis.title.y = element_text(size = 8)
) +
theme(legend.title = element_blank(),
legend.position = "none")
由 reprex package (v2.0.1)
于 2021-10-18 创建
评论
- This chapter 关于使用紧凑型字母显示您可能会感兴趣。请注意,它特别说明了为什么我将标题放在 ggplot 下方。
- 此外,我相信 jared_mamrot 与您的要求相比改变了一件重要的事情。您可以选择将所有 8 种方法相互比较,或者将所有 4 个级别名称方法分别相互比较每个区域。从你展示的情节来看,你选择了第二个选项,我通过
emmeans()
中的 specs = ~ Levelname | Zone
复制了它。您可以选择选项 1 并通过将其更改为 specs = ~ Levelname * Zone
找到相同的字母 jared_mamrot。两个选项都有效,但结果不同,你必须选择你想要的。
- 最后请注意,如果您使用函数(如
emmeans()
)来计算和比较这些平均值,则无需单独计算每个因子水平(组合)的算术平均值。此外,在更复杂的情况下,例如缺少数据,你不应该显示那些简单的算术平均值和它们的标准误差,而是直接去估计边际平均值 a.k.a。最小二乘法表示 a.k.a。调整后的意思是 a.k.a。基于模型的手段。然而,在简单的情况下,它们与简单的算术平均值相同。
使用此数据...
hogs.sample<-structure(list(Zone = c("B", "B", "B", "B", "B", "B", "B", "B",
"B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "D",
"D", "D", "D", "D", "D", "D", "D", "D", "D", "D", "D", "D", "D",
"D", "D", "D", "D", "D", "D"), Levelname = c("Medium", "High",
"Low", "Med.High", "Med.High", "Med.High", "Med.High", "Med.High",
"Med.High", "Medium", "Med.High", "Medium", "Med.High", "High",
"Medium", "High", "Low", "Med.High", "Low", "High", "Medium",
"Medium", "Med.High", "Low", "Low", "Med.High", "Low", "Low",
"High", "High", "Med.High", "High", "Med.High", "Med.High", "Medium",
"High", "Low", "Low", "Med.High", "Low"), hogs.fit = c(-0.122,
-0.871, -0.279, -0.446, 0.413, 0.011, 0.157, 0.131, 0.367, -0.23,
0.007, 0.05, 0.04, -0.184, -0.265, -1.071, -0.223, 0.255, -0.635,
-1.103, 0.008, -0.04, 0.831, 0.042, -0.005, -0.022, 0.692, 0.402,
0.615, 0.785, 0.758, 0.738, 0.512, 0.222, -0.424, 0.556, -0.128,
-0.495, 0.591, 0.923)), row.names = c(NA, -40L), groups = structure(list(
Zone = c("B", "D"), .rows = structure(list(1:20, 21:40), ptype = integer(0), class = c("vctrs_list_of",
"vctrs_vctr", "list"))), row.names = c(NA, -2L), class = c("tbl_df",
"tbl", "data.frame"), .drop = TRUE), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"))
我正在尝试将基于 Tukeys HSD 的重要字母添加到下面的图中...
library(agricolae)
library(tidyverse)
hogs.plot <- ggplot(hogs.sample, aes(x = Zone, y = exp(hogs.fit),
fill = factor(Levelname, levels = c("High", "Med.High", "Medium", "Low")))) +
stat_summary(fun = mean, geom = "bar", position = position_dodge(0.9), color = "black") +
stat_summary(fun.data = mean_se, geom = "errorbar", position = position_dodge(0.9), width = 0.2) +
labs(x = "", y = "CPUE (+/-1SE)", legend = NULL) +
scale_y_continuous(expand = c(0,0), labels = scales::number_format(accuracy = 0.1)) +
scale_fill_manual(values = c("midnightblue", "dodgerblue4", "steelblue3", 'lightskyblue')) +
scale_x_discrete(breaks=c("B", "D"), labels=c("Econfina", "Steinhatchee"))+
scale_color_hue(l=40, c = 100)+
# coord_cartesian(ylim = c(0, 3.5)) +
labs(title = "Hogs", x = "", legend = NULL) +
theme(panel.border = element_blank(), panel.grid.major = element_blank(), panel.background = element_blank(),
panel.grid.minor = element_blank(), axis.line = element_line(),
axis.text.x = element_text(), axis.title.x = element_text(vjust = 0),
axis.title.y = element_text(size = 8))+
theme(legend.title = element_blank(),
legend.position = "none")
hogs.plot
我理想的输出应该是这样的...
我不确定这些字母在我的示例图中是否 100% 准确,但它们表示哪些组彼此之间存在显着差异。区域是独立的,所以我不想在两个区域之间进行任何比较,所以我 运行 使用以下代码将它们分开。
hogs.aov.b <- aov(hogs.fit ~Levelname, data = filter(hogs.sample, Zone == "B"))
hogs.aov.summary.b <- summary(hogs.aov.b)
hogs.tukey.b <- TukeyHSD(hogs.aov.b)
hogs.tukey.b
hogs.aov.d <- aov(hogs.fit ~ Levelname, data = filter(hogs.sample, Zone == "D"))
hogs.aov.summary.d <- summary(hogs.aov.d)
hogs.tukey.d <- TukeyHSD(hogs.aov.d)
hogs.tukey.d
我试过这条路线,但除了猪以外,我还有很多物种可以应用它。 Show statistically significant difference in a graph
我可以一次获取一个区域的字母,但我不确定如何将两个区域都添加到绘图中。他们也不正常。我从网页上修改了这段代码,但这些字母并没有很好地放置在栏的顶部。
library(agricolae)
library(tidyverse)
# get highest point overall
abs_max <- max(bass.dat.d$bass.fit)
# get the highest point for each class
maxs <- bass.dat.d %>%
group_by(Levelname) %>%
# I like to add a little bit to each value so it rests above
# the highest point. Using a percentage of the highest point
# overall makes this code a bit more general
summarise(bass.fit=max(mean(exp(bass.fit))))
# get Tukey HSD results
Tukey_test <- aov(bass.fit ~ Levelname, data=bass.dat.d) %>%
HSD.test("Levelname", group=TRUE) %>%
.$groups %>%
as_tibble(rownames="Levelname") %>%
rename("Letters_Tukey"="groups") %>%
select(-bass.fit) %>%
# and join it to the max values we calculated -- these are
# your new y-coordinates
left_join(maxs, by="Levelname")
也有很多这样的例子https://www.staringatr.com/3-the-grammar-of-graphics/bar-plots/3_tukeys/,但他们都只是手动添加文本。如果有代码可以获取 Tukey 输出并自动将重要性字母添加到图中,那就太好了。
谢谢
我不明白你的 data/analysis(例如,为什么你在 hogs.fit 上使用 exp()
,字母应该是什么?)所以我不确定是否这是正确的,但没有其他人回答,所以这是我最好的猜测:
正确示例:
## Source: Rosane Rech
## https://statdoe.com/cld-customisation/#adding-the-letters-indicating-significant-differences
## https://www.youtube.com/watch?v=Uyof3S1gx3M
library(tidyverse)
library(ggthemes)
library(multcompView)
# analysis of variance
anova <- aov(weight ~ feed, data = chickwts)
# Tukey's test
tukey <- TukeyHSD(anova)
# compact letter display
cld <- multcompLetters4(anova, tukey)
# table with factors and 3rd quantile
dt <- group_by(chickwts, feed) %>%
summarise(w=mean(weight), sd = sd(weight)) %>%
arrange(desc(w))
# extracting the compact letter display and adding to the Tk table
cld <- as.data.frame.list(cld$feed)
dt$cld <- cld$Letters
print(dt)
#> # A tibble: 6 × 4
#> feed w sd cld
#> <fct> <dbl> <dbl> <chr>
#> 1 sunflower 329. 48.8 a
#> 2 casein 324. 64.4 a
#> 3 meatmeal 277. 64.9 ab
#> 4 soybean 246. 54.1 b
#> 5 linseed 219. 52.2 bc
#> 6 horsebean 160. 38.6 c
ggplot(dt, aes(feed, w)) +
geom_bar(stat = "identity", aes(fill = w), show.legend = FALSE) +
geom_errorbar(aes(ymin = w-sd, ymax=w+sd), width = 0.2) +
labs(x = "Feed Type", y = "Average Weight Gain (g)") +
geom_text(aes(label = cld, y = w + sd), vjust = -0.5) +
ylim(0,410) +
theme_few()
我的 'best guess' 和你的数据:
hogs.sample <- structure(list(Zone = c("B", "B", "B", "B", "B", "B", "B", "B",
"B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "B", "D",
"D", "D", "D", "D", "D", "D", "D", "D", "D", "D", "D", "D", "D",
"D", "D", "D", "D", "D", "D"), Levelname = c("Medium", "High",
"Low", "Med.High", "Med.High", "Med.High", "Med.High", "Med.High",
"Med.High", "Medium", "Med.High", "Medium", "Med.High", "High",
"Medium", "High", "Low", "Med.High", "Low", "High", "Medium",
"Medium", "Med.High", "Low", "Low", "Med.High", "Low", "Low",
"High", "High", "Med.High", "High", "Med.High", "Med.High", "Medium",
"High", "Low", "Low", "Med.High", "Low"), hogs.fit = c(-0.122,
-0.871, -0.279, -0.446, 0.413, 0.011, 0.157, 0.131, 0.367, -0.23,
0.007, 0.05, 0.04, -0.184, -0.265, -1.071, -0.223, 0.255, -0.635,
-1.103, 0.008, -0.04, 0.831, 0.042, -0.005, -0.022, 0.692, 0.402,
0.615, 0.785, 0.758, 0.738, 0.512, 0.222, -0.424, 0.556, -0.128,
-0.495, 0.591, 0.923)), row.names = c(NA, -40L), groups = structure(list(
Zone = c("B", "D"), .rows = structure(list(1:20, 21:40), ptype = integer(0), class = c("vctrs_list_of",
"vctrs_vctr", "list"))), row.names = c(NA, -2L), class = c("tbl_df",
"tbl", "data.frame"), .drop = TRUE), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"))
# anova
anova <- aov(hogs.fit ~ Levelname * Zone, data = hogs.sample)
# Tukey's test
tukey <- TukeyHSD(anova)
# compact letter display
cld <- multcompLetters4(anova, tukey)
# table with factors and 3rd quantile
dt <- hogs.sample %>%
group_by(Zone, Levelname) %>%
summarise(w=mean(exp(hogs.fit)), sd = sd(exp(hogs.fit)) / sqrt(n())) %>%
arrange(desc(w)) %>%
ungroup() %>%
mutate(Levelname = factor(Levelname,
levels = c("High",
"Med.High",
"Medium",
"Low"),
ordered = TRUE))
# extracting the compact letter display and adding to the Tk table
cld2 <- data.frame(letters = cld$`Levelname:Zone`$Letters)
dt$cld <- cld2$letters
print(dt)
#> # A tibble: 8 × 5
#> Zone Levelname w sd cld
#> <chr> <ord> <dbl> <dbl> <chr>
#> 1 D High 1.97 0.104 a
#> 2 D Med.High 1.69 0.206 ab
#> 3 D Low 1.36 0.258 abc
#> 4 B Med.High 1.14 0.0872 abc
#> 5 B Medium 0.875 0.0641 bcd
#> 6 D Medium 0.874 0.111 bcd
#> 7 B Low 0.696 0.0837 cd
#> 8 B High 0.481 0.118 d
ggplot(dt, aes(x = Levelname, y = w)) +
geom_bar(stat = "identity", aes(fill = Levelname), show.legend = FALSE) +
geom_errorbar(aes(ymin = w - sd, ymax = w + sd), width = 0.2) +
labs(x = "Levelname", y = "Average hogs.fit") +
geom_text(aes(label = cld, y = w + sd), vjust = -0.5) +
facet_wrap(~Zone)
由 reprex package (v2.0.1)
于 2021-10-01 创建我想我可以扩展 jared_mamrot 的答案!
首先,您会发现一个准备好成为 copy-pasted 的 reprex。下面,我对此有一些补充意见。
代表
hogs.sample <- data.frame(
stringsAsFactors = FALSE,
Zone = c("B","B","B","B","B","B",
"B","B","B","B","B","B","B","B","B","B","B","B",
"B","B","D","D","D","D","D","D","D","D","D",
"D","D","D","D","D","D","D","D","D","D","D"),
Levelname = c("Medium","High","Low",
"Med.High","Med.High","Med.High","Med.High","Med.High",
"Med.High","Medium","Med.High","Medium","Med.High",
"High","Medium","High","Low","Med.High","Low","High",
"Medium","Medium","Med.High","Low","Low","Med.High",
"Low","Low","High","High","Med.High","High",
"Med.High","Med.High","Medium","High","Low","Low",
"Med.High","Low"),
hogs.fit = c(-0.122,-0.871,-0.279,-0.446,
0.413,0.011,0.157,0.131,0.367,-0.23,0.007,0.05,
0.04,-0.184,-0.265,-1.071,-0.223,0.255,-0.635,
-1.103,0.008,-0.04,0.831,0.042,-0.005,-0.022,0.692,
0.402,0.615,0.785,0.758,0.738,0.512,0.222,-0.424,
0.556,-0.128,-0.495,0.591,0.923)
)
library(tidyverse)
# {emmeans}, {multcomp} & {multcompView} ----------------------------------
library(emmeans)
library(multcomp)
library(multcompView)
# set up model
model <- lm(exp(hogs.fit) ~ Levelname * Zone, data = hogs.sample)
# get (adjusted) means
model_means <- emmeans(object = model,
specs = ~ Levelname | Zone)
# add letters to each mean
model_means_cld <- cld(object = model_means,
adjust = "Tukey",
Letters = letters,
alpha = 0.05)
# show output
model_means_cld
#> Zone = B:
#> Levelname emmean SE df lower.CL upper.CL .group
#> High 0.481 0.199 32 -0.0445 1.01 a
#> Low 0.696 0.230 32 0.0884 1.30 ab
#> Medium 0.875 0.199 32 0.3488 1.40 ab
#> Med.High 1.139 0.133 32 0.7889 1.49 b
#>
#> Zone = D:
#> Levelname emmean SE df lower.CL upper.CL .group
#> Medium 0.874 0.230 32 0.2673 1.48 a
#> Low 1.362 0.151 32 0.9650 1.76 ab
#> Med.High 1.688 0.163 32 1.2592 2.12 b
#> High 1.969 0.199 32 1.4436 2.50 b
#>
#> Results are given on the exp (not the response) scale.
#> Confidence level used: 0.95
#> Conf-level adjustment: sidak method for 4 estimates
#> P value adjustment: tukey method for comparing a family of 4 estimates
# format output for ggplot
model_means_cld <- model_means_cld %>%
as.data.frame() %>%
mutate(Zone = case_when(
Zone == "B" ~ "Econfina",
Zone == "D" ~ "Steinhatchee"
))
# get ggplot
ggplot(data = model_means_cld,
aes(x = Levelname, y = emmean, fill = Levelname)) +
facet_grid(cols = vars(Zone)) +
geom_bar(stat = "identity", color = "black", show.legend = FALSE) +
geom_errorbar(aes(ymin = emmean - SE, ymax = emmean + SE), width = 0.2) +
scale_y_continuous(
name = "CPUE (adj. mean ± 1 std. error)",
expand = expansion(mult = c(0, 0.1)),
labels = scales::number_format(accuracy = 0.1)
) +
xlab(NULL) +
labs(title = "Hogs",
caption = "Separaetly per Zone, means followed by a common letter are not significantly different according to the Tukey-test") +
geom_text(aes(label = str_trim(.group), y = emmean + SE), vjust = -0.5) +
scale_fill_manual(values = c("midnightblue", "dodgerblue4", "steelblue3", 'lightskyblue')) +
theme_classic() +
theme(
panel.border = element_blank(),
panel.grid.major = element_blank(),
panel.background = element_blank(),
panel.grid.minor = element_blank(),
axis.line = element_line(),
axis.text.x = element_text(),
axis.title.x = element_text(vjust = 0),
axis.title.y = element_text(size = 8)
) +
theme(legend.title = element_blank(),
legend.position = "none")
由 reprex package (v2.0.1)
于 2021-10-18 创建评论
- This chapter 关于使用紧凑型字母显示您可能会感兴趣。请注意,它特别说明了为什么我将标题放在 ggplot 下方。
- 此外,我相信 jared_mamrot 与您的要求相比改变了一件重要的事情。您可以选择将所有 8 种方法相互比较,或者将所有 4 个级别名称方法分别相互比较每个区域。从你展示的情节来看,你选择了第二个选项,我通过
emmeans()
中的specs = ~ Levelname | Zone
复制了它。您可以选择选项 1 并通过将其更改为specs = ~ Levelname * Zone
找到相同的字母 jared_mamrot。两个选项都有效,但结果不同,你必须选择你想要的。 - 最后请注意,如果您使用函数(如
emmeans()
)来计算和比较这些平均值,则无需单独计算每个因子水平(组合)的算术平均值。此外,在更复杂的情况下,例如缺少数据,你不应该显示那些简单的算术平均值和它们的标准误差,而是直接去估计边际平均值 a.k.a。最小二乘法表示 a.k.a。调整后的意思是 a.k.a。基于模型的手段。然而,在简单的情况下,它们与简单的算术平均值相同。