
Descriptive statistics and boxplot for repeated measurements?


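For an assignment, I have the following data from a small crossover study comparing the effects of two drugs, A and B, on diastolic blood pressure (DBP). Each patient in the study received both treatments in random order, separated in time by a "wash-out" period, so that one treatment would not affect the blood pressure measured after the other treatment was given (i.e., to rule out carry-over effects). The data look like this: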


mydata <- structure(list(pt_id = c(1, 1, 2, 2, 3, 3, 4, 5, 5, 6, 6, 7, 
7, 8, 8, 9, 9, 10, 10, 11, 11, 12, 12, 13, 13, 14, 14, 15, 15, 
16, 17, 17, 18, 18, 19, 19), timepoint = structure(c(1L, 2L, 
1L, 2L, 1L, 2L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 
2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 2L, 1L, 2L, 1L, 2L, 
1L, 2L), .Label = c("Timepoint 1", "Timepoint 2"), class = "factor"), 
    drug = structure(c(1L, 2L, 2L, 1L, 1L, 2L, 1L, 2L, 1L, 1L, 
    2L, 2L, 1L, 1L, 2L, 1L, 2L, 2L, 1L, 2L, 1L, 1L, 2L, 1L, 2L, 
    1L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 1L, 2L, 1L), .Label = c("Drug A", 
    "Drug B"), class = "factor"), diastolic_bp = c(100, 112, 
    116, 114, 108, 110, 104, 114, 114, 98, 116, 102, 100, 96, 
    103, 92, 89, 103, 96, 116, 78, 127, 131, 129, 124, 106, 128, 
    133, 118, 108, 91, 109, 113, 98, 118, 112)), row.names = c(NA, 
-36L), class = "data.frame")
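Before summarising, it can help to check the repeated-measures structure of the data. The sketch below assumes dplyr is installed; it rebuilds the same values with data.frame() so it runs on its own, and the object names (cell_counts, incomplete, both_drugs) are illustrative choices, not part of the question. It counts readings per timepoint × drug cell, lists patients with only one reading (patients 4 and 16 in this data), and confirms that every patient with two readings received both drugs.

```r
library(dplyr)

# Same values as the dput() output above, rebuilt so the sketch is standalone.
mydata <- data.frame(
  pt_id = c(1, 1, 2, 2, 3, 3, 4,                          # patients 1-4
            5, 5, 6, 6, 7, 7, 8, 8, 9, 9, 10, 10,         # patients 5-10
            11, 11, 12, 12, 13, 13, 14, 14, 15, 15, 16,   # patients 11-16
            17, 17, 18, 18, 19, 19),                      # patients 17-19
  timepoint = factor(c(1, 2, 1, 2, 1, 2, 2,
                       1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2,
                       1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 2,
                       1, 2, 1, 2, 1, 2),
                     levels = 1:2, labels = c("Timepoint 1", "Timepoint 2")),
  drug = factor(c("A", "B", "B", "A", "A", "B", "A",
                  "B", "A", "A", "B", "B", "A", "A", "B", "A", "B", "B", "A",
                  "B", "A", "A", "B", "A", "B", "A", "B", "B", "A", "B",
                  "A", "B", "B", "A", "B", "A"),
                levels = c("A", "B"), labels = c("Drug A", "Drug B")),
  diastolic_bp = c(100, 112, 116, 114, 108, 110, 104,
                   114, 114, 98, 116, 102, 100, 96, 103, 92, 89, 103, 96,
                   116, 78, 127, 131, 129, 124, 106, 128, 133, 118, 108,
                   91, 109, 113, 98, 118, 112)
)

# Object names below are illustrative.
# Readings per timepoint x drug cell; sizes reflect the randomised order
# and any missing periods.
cell_counts <- mydata %>% count(timepoint, drug)
print(cell_counts)

# Patients with fewer than two readings (only one period observed)
incomplete <- mydata %>% count(pt_id) %>% filter(n < 2)
print(incomplete)

# In a crossover, each patient with two readings should have had both drugs
both_drugs <- mydata %>%
  group_by(pt_id) %>%
  filter(n() == 2) %>%
  summarise(got_both = n_distinct(drug) == 2)
print(all(both_drugs$got_both))
```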


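library(dplyr)

# Descriptive statistics of DBP for each timepoint x drug combination
mydata %>% 
  group_by(timepoint, drug) %>% 
  summarise(mean_dbp=mean(diastolic_bp, na.rm=TRUE), 
            sd_dbp=sd(diastolic_bp, na.rm=TRUE), 
            p25_dbp=quantile(diastolic_bp, probs=0.25, na.rm=TRUE), 
            p75_dbp=quantile(diastolic_bp, probs=0.75, na.rm=TRUE))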

# This returns one line per row of the data (36 rows), not one per group:
# A tibble: 36 x 7
# Groups:   timepoint, drug [4]
   timepoint   drug   mean_dbp sd_dbp median_dbp p25_dbp p75_dbp
   <fct>       <fct>     <dbl>  <dbl>      <dbl>   <dbl>   <dbl>
 1 Timepoint 1 Drug A     105.  14.1         100     96     108 
 2 Timepoint 1 Drug A     105.  14.1         108     96     108 
 3 Timepoint 1 Drug A     105.  14.1          98     96     108 
 4 Timepoint 1 Drug A     105.  14.1          96     96     108 
 5 Timepoint 1 Drug A     105.  14.1          92     96     108 
 6 Timepoint 1 Drug A     105.  14.1         127     96     108 
 7 Timepoint 1 Drug A     105.  14.1         129     96     108 
 8 Timepoint 1 Drug A     105.  14.1         106     96     108 
 9 Timepoint 1 Drug A     105.  14.1          91     96     108 
10 Timepoint 1 Drug B     114.   9.64        116    110.    116.
# ... with 26 more rows

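But this produces the calculation for every row in the dataset. What I was expecting was one summary row for each combination of drug and timepoint...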
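The target is one row per drug × timepoint combination, i.e. four rows. The sketch below assumes dplyr 1.0.0 or later (needed for the `.groups` argument); `dbp_summary` is an illustrative name. `summarise()` collapses each group to one row only when every expression returns exactly one value. Since dplyr 1.0.0, an expression that returns the whole column (for example a bare `diastolic_bp`) yields one row per observation instead, and dplyr 1.1.0 and later warn about this and point to `reframe()`. The `median_dbp` column in the output above holds the individual readings, so an expression like that is a plausible cause, though this is only an inference because `median_dbp` is not in the posted code. Writing `dplyr::summarise()` explicitly also rules out `plyr::summarise()` masking it when plyr is loaded after dplyr.

```r
library(dplyr)

# Same values as the dput() output in the question, rebuilt so the sketch is standalone.
mydata <- data.frame(
  pt_id = c(1, 1, 2, 2, 3, 3, 4,                          # patients 1-4
            5, 5, 6, 6, 7, 7, 8, 8, 9, 9, 10, 10,         # patients 5-10
            11, 11, 12, 12, 13, 13, 14, 14, 15, 15, 16,   # patients 11-16
            17, 17, 18, 18, 19, 19),                      # patients 17-19
  timepoint = factor(c(1, 2, 1, 2, 1, 2, 2,
                       1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2,
                       1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 2,
                       1, 2, 1, 2, 1, 2),
                     levels = 1:2, labels = c("Timepoint 1", "Timepoint 2")),
  drug = factor(c("A", "B", "B", "A", "A", "B", "A",
                  "B", "A", "A", "B", "B", "A", "A", "B", "A", "B", "B", "A",
                  "B", "A", "A", "B", "A", "B", "A", "B", "B", "A", "B",
                  "A", "B", "B", "A", "B", "A"),
                levels = c("A", "B"), labels = c("Drug A", "Drug B")),
  diastolic_bp = c(100, 112, 116, 114, 108, 110, 104,
                   114, 114, 98, 116, 102, 100, 96, 103, 92, 89, 103, 96,
                   116, 78, 127, 131, 129, 124, 106, 128, 133, 118, 108,
                   91, 109, 113, 98, 118, 112)
)

# Every expression reduces its group to a single value, so the result has
# exactly one row per timepoint x drug group (dbp_summary is an illustrative name).
dbp_summary <- mydata %>%
  group_by(timepoint, drug) %>%
  dplyr::summarise(
    n_obs      = n(),
    mean_dbp   = mean(diastolic_bp, na.rm = TRUE),
    sd_dbp     = sd(diastolic_bp, na.rm = TRUE),
    median_dbp = median(diastolic_bp, na.rm = TRUE),
    p25_dbp    = quantile(diastolic_bp, probs = 0.25, na.rm = TRUE),
    p75_dbp    = quantile(diastolic_bp, probs = 0.75, na.rm = TRUE),
    .groups    = "drop"   # return an ungrouped tibble
  )
print(dbp_summary)
```

The quartile columns use `quantile()`'s default method (type 7), so they can differ slightly from quartiles reported by other software.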



library(ggplot2)

ggplot(data=mydata, aes(x=timepoint, y=diastolic_bp), fill=drug) + geom_boxplot()



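Maybe this is what you want. drug needs to go inside aes(): outside it, fill=drug is not a mapping, so the plot has no fill grouping; inside it, drug is mapped to the fill colour and geom_boxplot() draws a separate box for each drug at each timepoint.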

library(ggplot2)

ggplot(data=mydata, aes(x=timepoint, y=diastolic_bp, fill=drug)) + geom_boxplot()
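To check that the corrected plot and a grouped summary describe the same four groups, here is a sketch (assuming ggplot2 is installed; `p` and `box_stats` are illustrative names) that builds the plot above and reads back the per-box statistics with `layer_data()`. For unweighted data, `stat_boxplot()` computes the hinges and median with `stats::quantile()` at its default type, so `lower`, `middle` and `upper` should match the 25th percentile, median and 75th percentile from a grouped `summarise()`.

```r
library(ggplot2)

# Same values as the dput() output in the question, rebuilt so the sketch is standalone.
mydata <- data.frame(
  pt_id = c(1, 1, 2, 2, 3, 3, 4,                          # patients 1-4
            5, 5, 6, 6, 7, 7, 8, 8, 9, 9, 10, 10,         # patients 5-10
            11, 11, 12, 12, 13, 13, 14, 14, 15, 15, 16,   # patients 11-16
            17, 17, 18, 18, 19, 19),                      # patients 17-19
  timepoint = factor(c(1, 2, 1, 2, 1, 2, 2,
                       1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 1, 2,
                       1, 2, 1, 2, 1, 2, 1, 2, 1, 2, 2,
                       1, 2, 1, 2, 1, 2),
                     levels = 1:2, labels = c("Timepoint 1", "Timepoint 2")),
  drug = factor(c("A", "B", "B", "A", "A", "B", "A",
                  "B", "A", "A", "B", "B", "A", "A", "B", "A", "B", "B", "A",
                  "B", "A", "A", "B", "A", "B", "A", "B", "B", "A", "B",
                  "A", "B", "B", "A", "B", "A"),
                levels = c("A", "B"), labels = c("Drug A", "Drug B")),
  diastolic_bp = c(100, 112, 116, 114, 108, 110, 104,
                   114, 114, 98, 116, 102, 100, 96, 103, 92, 89, 103, 96,
                   116, 78, 127, 131, 129, 124, 106, 128, 133, 118, 108,
                   91, 109, 113, 98, 118, 112)
)

# fill = drug inside aes() gives one dodged box per drug at each timepoint
# (p and box_stats are illustrative names).
p <- ggplot(data = mydata, aes(x = timepoint, y = diastolic_bp, fill = drug)) +
  geom_boxplot()

# layer_data() returns the statistics ggplot2 computed for each box;
# x is the dodged position and group identifies the timepoint x drug box.
box_stats <- layer_data(p)
print(box_stats[, c("x", "group", "lower", "middle", "upper")])
print(p)
```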