Comparing mean values by group between several variables

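I am trying to reproduce a graph from Stata in R. I have several variables and want to show their mean values in each of two treatment groups. The Stata graph looks like this: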

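This coefficient plot is not really a coefficient plot; it is a plot of the mean of each individual variable for each treatment. The df basically looks like this: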

workable data

It is hard to answer your question without reproducible data.


However, this might get you what you want:

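library(dplyr)
library(ggplot2)  # provides the mpg data set and ggplot()

# Mean city mileage per manufacturer and transmission type
mpg %>% 
  select(manufacturer, cty, trans) %>% 
  group_by(manufacturer, trans) %>% 
  summarize(cty_mean = mean(cty), .groups = "drop") %>% 
  ggplot(aes(x = cty_mean, y = reorder(manufacturer, cty_mean), color = trans)) +
  geom_point()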

If you also want to include coefficients or standard errors, you can do so by including a function in summarize().
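As a minimal sketch of that idea, the summary below computes a standard error next to each mean and draws it as a range around the point. The names cty_se, lower, upper and mpg_summary are illustrative choices, not part of the original answer, and the qnorm(0.95) multiplier assumes an approximate 90% normal interval. Groups with a single observation have an undefined standard deviation, so they are dropped here.

```r
library(dplyr)
library(ggplot2)

z <- qnorm(0.95)  # assumed multiplier for an approximate 90% interval

mpg_summary <- mpg %>%
  group_by(manufacturer, trans) %>%
  summarize(
    cty_mean = mean(cty),
    cty_se   = sd(cty) / sqrt(n()),  # standard error of the mean
    .groups  = "drop"
  ) %>%
  filter(!is.na(cty_se)) %>%         # sd() returns NA for one-row groups
  mutate(lower = cty_mean - z * cty_se,
         upper = cty_mean + z * cty_se)

# Points with interval bars, dodged by transmission and flipped to horizontal
ggplot(mpg_summary,
       aes(x = reorder(manufacturer, cty_mean), y = cty_mean, colour = trans)) +
  geom_pointrange(aes(ymin = lower, ymax = upper),
                  position = position_dodge(width = 0.5)) +
  coord_flip()
```

Any other summary function (for example a bootstrap interval) could be placed inside summarize() in the same way.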

I found that geom_pointrange() might be what you are looking for:

library("ggplot2")
set.seed(111018)
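interval1 <- -qnorm((1-0.9)/2)  # z multiplier for a 90% interval (about 1.645)

# Simulated means and standard errors for treatment "a"
means_treatment_1 <- rnorm(2)
se_treatment_1 <- abs(rnorm(2))  # standard errors cannot be negative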

df_treatment_1 <- data.frame("Mean" = means_treatment_1,
                         "lower" = means_treatment_1 - se_treatment_1*interval1,
                         "upper" = means_treatment_1 + se_treatment_1*interval1,
                         "Variable" = c("medicare_spending_dummy", 
                                        "job_training_dummy"),
                         "Treatment" = "a")


# Simulated means and standard errors for treatment "b"
means_treatment_2 <- rnorm(2)
se_treatment_2 <- abs(rnorm(2))  # standard errors cannot be negative

df_treatment_2 <- data.frame("Mean" = means_treatment_2,
                         "lower" = means_treatment_2 - se_treatment_2*interval1,
                         "upper" = means_treatment_2 + se_treatment_2*interval1,
                         "Variable" = c("medicare_spending_dummy", 
                                        "job_training_dummy"),
                         "Treatment" = "b")



df_tot<-rbind(df_treatment_1, df_treatment_2)



# Plot


ggplot(df_tot, aes(colour = Treatment)) +
  geom_hline(yintercept = 0, colour = gray(1/2), lty = 2) +
  geom_pointrange(aes(x = Variable, y = Mean, ymin = lower, ymax = upper),
                  lwd = 1, position = position_dodge(width = 1/2)) +
  coord_flip() +
  theme_bw()
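The example above uses simulated means. With real data, the same table could probably be built from the raw observations by reshaping to long format first. The sketch below assumes a hypothetical data frame raw with one row per respondent, a Treatment column, and one 0/1 column per variable; the data are invented for illustration, and tidyr is an extra dependency not used in the original answer.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

set.seed(1)
# Hypothetical raw data, invented for illustration only
raw <- data.frame(
  Treatment               = rep(c("a", "b"), each = 50),
  medicare_spending_dummy = rbinom(100, 1, 0.4),
  job_training_dummy      = rbinom(100, 1, 0.6)
)

z <- qnorm(0.95)  # multiplier for an approximate 90% interval

df_means <- raw %>%
  # One row per respondent and variable
  pivot_longer(-Treatment, names_to = "Variable", values_to = "value") %>%
  group_by(Treatment, Variable) %>%
  summarize(Mean = mean(value),
            se   = sd(value) / sqrt(n()),  # standard error of the mean
            .groups = "drop") %>%
  mutate(lower = Mean - z * se,
         upper = Mean + z * se)

ggplot(df_means, aes(x = Variable, y = Mean, colour = Treatment)) +
  geom_pointrange(aes(ymin = lower, ymax = upper),
                  position = position_dodge(width = 1/2)) +
  coord_flip() +
  theme_bw()
```

The result has one row per treatment and variable, which is the same layout as df_tot, so the plotting code carries over unchanged.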