Descriptive statistics and boxplot for repeated measurements?


Update: the problem was caused by a typo.


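For an assignment, I have the following data from a small crossover study comparing the effects of two drugs, A and B, on diastolic blood pressure (DBP). Each patient in the study received both treatments in random order, separated in time by a "wash-out" period so that one treatment would not affect the blood pressure measurements taken after the other treatment (i.e. to rule out a carry-over effect). The data look like this: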

library(tidyverse)
library(dplyr)
library(lubridate)
library(magrittr)

mydata <- structure(list(pt_id = c(1, 1, 2, 2, 3, 3, 4, 5, 5, 6, 6, 7, 
7, 8, 8, 9, 9, 10, 10, 11, 11, 12, 12, 13, 13, 14, 14, 15, 15, 
16, 17, 17, 18, 18, 19, 19), timepoint = structure(c(1L, 2L, 
1L, 2L, 1L, 2L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 
2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 2L, 1L, 2L, 1L, 2L, 
1L, 2L), .Label = c("Timepoint 1", "Timepoint 2"), class = "factor"), 
    drug = structure(c(1L, 2L, 2L, 1L, 1L, 2L, 1L, 2L, 1L, 1L, 
    2L, 2L, 1L, 1L, 2L, 1L, 2L, 2L, 1L, 2L, 1L, 1L, 2L, 1L, 2L, 
    1L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 1L, 2L, 1L), .Label = c("Drug A", 
    "Drug B"), class = "factor"), diastolic_bp = c(100, 112, 
    116, 114, 108, 110, 104, 114, 114, 98, 116, 102, 100, 96, 
    103, 92, 89, 103, 96, 116, 78, 127, 131, 129, 124, 106, 128, 
    133, 118, 108, 91, 109, 113, 98, 118, 112)), row.names = c(NA, 
-36L), class = "data.frame")

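My first question is about obtaining the mean and standard deviation (as well as the median and percentiles) for each treatment group at each timepoint. My code: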

mydata %>% 
  group_by(timepoint, drug) %>% 
  summarise(mean_dbp=mean(diastolic_bp, na.rm=TRUE), 
            sd_dbp=sd(diastolic_bp, na.rm=TRUE), 
            median_dbp=(diastolic_bp), 
            p25_dbp=quantile(diastolic_bp, probs=0.25), 
            p75_dbp=quantile(diastolic_bp, probs=0.75))

# This returns a line per patient:
# A tibble: 36 x 7
# Groups:   timepoint, drug [4]
   timepoint   drug   mean_dbp sd_dbp median_dbp p25_dbp p75_dbp
   <fct>       <fct>     <dbl>  <dbl>      <dbl>   <dbl>   <dbl>
 1 Timepoint 1 Drug A     105.  14.1         100     96     108 
 2 Timepoint 1 Drug A     105.  14.1         108     96     108 
 3 Timepoint 1 Drug A     105.  14.1          98     96     108 
 4 Timepoint 1 Drug A     105.  14.1          96     96     108 
 5 Timepoint 1 Drug A     105.  14.1          92     96     108 
 6 Timepoint 1 Drug A     105.  14.1         127     96     108 
 7 Timepoint 1 Drug A     105.  14.1         129     96     108 
 8 Timepoint 1 Drug A     105.  14.1         106     96     108 
 9 Timepoint 1 Drug A     105.  14.1          91     96     108 
10 Timepoint 1 Drug B     114.   9.64        116    110.    116.
# ... with 26 more rows

But this produces a result for every row in the dataset. What I expected was a single set of figures for each combination of drug and timepoint.
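For reference, a minimal sketch of the corrected summary, assuming the typo mentioned in the update is the missing median() call on the median_dbp line. Without it, that expression returns every value in the group, so summarise() emits one row per observation. The demo_data name and the n column are illustrative additions; the values are the first 12 rows of the data above, retyped so the example runs on its own.

```r
library(dplyr)

# First 12 rows of the question's data, retyped so the sketch is standalone.
# In practice, use the full `mydata` instead of this illustrative subset.
demo_data <- data.frame(
  pt_id        = c(1, 1, 2, 2, 3, 3, 4, 5, 5, 6, 6, 7),
  timepoint    = factor(paste("Timepoint", c(1, 2, 1, 2, 1, 2, 2, 1, 2, 1, 2, 1))),
  drug         = factor(paste("Drug", c("A", "B", "B", "A", "A", "B",
                                        "A", "B", "A", "A", "B", "B"))),
  diastolic_bp = c(100, 112, 116, 114, 108, 110, 104, 114, 114, 98, 116, 102)
)

dbp_summary <- demo_data %>%
  group_by(timepoint, drug) %>%
  summarise(
    n          = n(),                                  # observations per group
    mean_dbp   = mean(diastolic_bp, na.rm = TRUE),
    sd_dbp     = sd(diastolic_bp, na.rm = TRUE),
    median_dbp = median(diastolic_bp, na.rm = TRUE),   # median() was missing
    p25_dbp    = quantile(diastolic_bp, probs = 0.25, na.rm = TRUE),
    p75_dbp    = quantile(diastolic_bp, probs = 0.75, na.rm = TRUE),
    .groups    = "drop"                                # return an ungrouped tibble
  )

dbp_summary  # one row per timepoint/drug combination
```

Because every expression inside summarise() now returns a single value per group, the result has one row for each timepoint and drug combination rather than one per observation.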

Then I tried to make a boxplot by timepoint and group as follows:

ggplot(data=mydata, aes(x=timepoint, y=diastolic_bp), fill=drug) + geom_boxplot()

But this does not take the grouping variable drug into account.

Any help?

Maybe this is what you are looking for. drug needs to go inside aes().

ggplot(data=mydata, aes(x=timepoint, y=diastolic_bp, fill=drug)) + geom_boxplot()
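To see why this works: in the question's call, fill=drug sits outside aes(), so ggplot2 never treats it as an aesthetic mapping and draws one box per timepoint. Once it is inside aes(), the boxes are split by drug. The sketch below is one way to check this; demo_data and p_grouped are illustrative names, and the values are the first 12 rows of the question's data, retyped so the example runs on its own.

```r
library(ggplot2)

# First 12 rows of the question's data (plotting columns only), retyped so
# the sketch is standalone. In practice, use the full `mydata`.
demo_data <- data.frame(
  timepoint    = factor(paste("Timepoint", c(1, 2, 1, 2, 1, 2, 2, 1, 2, 1, 2, 1))),
  drug         = factor(paste("Drug", c("A", "B", "B", "A", "A", "B",
                                        "A", "B", "A", "A", "B", "B"))),
  diastolic_bp = c(100, 112, 116, 114, 108, 110, 104, 114, 114, 98, 116, 102)
)

# fill = drug inside aes(): drug is mapped, so each timepoint gets one box per drug
p_grouped <- ggplot(demo_data, aes(x = timepoint, y = diastolic_bp, fill = drug)) +
  geom_boxplot()

# layer_data() returns the computed boxplot statistics, one row per box drawn
box_stats <- layer_data(p_grouped)
nrow(box_stats)

p_grouped
```

With two timepoints and two drugs, the computed layer data should contain one row for each of the four timepoint and drug combinations.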