Descriptive statistics and boxplot for repeated measurements?
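Update: the problems were caused by typos.
- Problem 1: summarize did not produce output by group because of a typo in the third line (median_dbp=(diastolic_bp) should be median_dbp=median(diastolic_bp)).
- Problem 2: the boxplot was not grouped by drug because fill=drug was placed outside the aes mapping, whereas it should be inside it (correct code: ggplot(data=mydata, aes(x=timepoint, y=diastolic_bp, fill=drug))).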
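For anyone landing here with the same symptom, here is a minimal sketch of the fix for problem 1. It uses a small made-up data frame (toy, with hypothetical values) in place of mydata so that it runs on its own. The .groups = "drop" argument and the extra na.rm = TRUE arguments are my additions, not part of the original code, and assume dplyr 1.0.0 or later.

```r
library(dplyr)

# Hypothetical stand-in for mydata: 2 timepoints x 2 drugs x 2 measurements
toy <- data.frame(
  timepoint    = factor(rep(c("Timepoint 1", "Timepoint 2"), each = 4)),
  drug         = factor(rep(c("Drug A", "Drug A", "Drug B", "Drug B"), times = 2)),
  diastolic_bp = c(100, 108, 112, 116, 104, 96, 110, 114)
)

summary_tbl <- toy %>%
  group_by(timepoint, drug) %>%
  summarise(
    mean_dbp   = mean(diastolic_bp, na.rm = TRUE),
    sd_dbp     = sd(diastolic_bp, na.rm = TRUE),
    # median() collapses each group to one value; a bare (diastolic_bp)
    # hands back the whole column for the group instead
    median_dbp = median(diastolic_bp, na.rm = TRUE),
    p25_dbp    = quantile(diastolic_bp, probs = 0.25, na.rm = TRUE),
    p75_dbp    = quantile(diastolic_bp, probs = 0.75, na.rm = TRUE),
    .groups    = "drop"  # assumption: return an ungrouped tibble
  )

nrow(summary_tbl)  # 4 rows: one per timepoint/drug combination
```

With the typo in place, every expression except median_dbp still returns a single value per group, which would explain why mean_dbp, sd_dbp and the percentiles were simply repeated on every row of the original output.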
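For an assignment, I have the following data from a small crossover study comparing the effect of two drugs, A and B, on diastolic blood pressure (DBP). Each patient in the study received both treatments in random order, separated in time by a "wash-out" period so that one treatment would not affect the blood pressure measurements obtained after the other (i.e. to rule out a carry-over effect). The data look like this: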
library(tidyverse)
library(dplyr)
library(lubridate)
library(magrittr)
mydata <- structure(list(pt_id = c(1, 1, 2, 2, 3, 3, 4, 5, 5, 6, 6, 7,
7, 8, 8, 9, 9, 10, 10, 11, 11, 12, 12, 13, 13, 14, 14, 15, 15,
16, 17, 17, 18, 18, 19, 19), timepoint = structure(c(1L, 2L,
1L, 2L, 1L, 2L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L,
2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 2L, 1L, 2L, 1L, 2L,
1L, 2L), .Label = c("Timepoint 1", "Timepoint 2"), class = "factor"),
drug = structure(c(1L, 2L, 2L, 1L, 1L, 2L, 1L, 2L, 1L, 1L,
2L, 2L, 1L, 1L, 2L, 1L, 2L, 2L, 1L, 2L, 1L, 1L, 2L, 1L, 2L,
1L, 2L, 2L, 1L, 2L, 1L, 2L, 2L, 1L, 2L, 1L), .Label = c("Drug A",
"Drug B"), class = "factor"), diastolic_bp = c(100, 112,
116, 114, 108, 110, 104, 114, 114, 98, 116, 102, 100, 96,
103, 92, 89, 103, 96, 116, 78, 127, 131, 129, 124, 106, 128,
133, 118, 108, 91, 109, 113, 98, 118, 112)), row.names = c(NA,
-36L), class = "data.frame")
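My first question is about obtaining the mean and standard deviation (as well as the median and percentiles) for each treatment group at each timepoint. My code: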
mydata %>%
group_by(timepoint, drug) %>%
summarise(mean_dbp=mean(diastolic_bp, na.rm=TRUE),
sd_dbp=sd(diastolic_bp, na.rm=TRUE),
median_dbp=(diastolic_bp),
p25_dbp=quantile(diastolic_bp, probs=0.25),
p75_dbp=quantile(diastolic_bp, probs=0.75))
# This returns a line per patient:
# A tibble: 36 x 7
# Groups: timepoint, drug [4]
timepoint drug mean_dbp sd_dbp median_dbp p25_dbp p75_dbp
<fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Timepoint 1 Drug A 105. 14.1 100 96 108
2 Timepoint 1 Drug A 105. 14.1 108 96 108
3 Timepoint 1 Drug A 105. 14.1 98 96 108
4 Timepoint 1 Drug A 105. 14.1 96 96 108
5 Timepoint 1 Drug A 105. 14.1 92 96 108
6 Timepoint 1 Drug A 105. 14.1 127 96 108
7 Timepoint 1 Drug A 105. 14.1 129 96 108
8 Timepoint 1 Drug A 105. 14.1 106 96 108
9 Timepoint 1 Drug A 105. 14.1 91 96 108
10 Timepoint 1 Drug B 114. 9.64 116 110. 116.
# ... with 26 more rows
But this produces the calculations for every row in the dataset. What I was expecting was one set of figures for each combination of drug and timepoint ...
I then tried to make a boxplot by timepoint and group as follows:
ggplot(data=mydata, aes(x=timepoint, y=diastolic_bp), fill=drug) + geom_boxplot()
But this does not include the grouping variable drug.
Any help?
Maybe this is what you are looking for. drug needs to go inside aes.
ggplot(data=mydata, aes(x=timepoint, y=diastolic_bp, fill=drug)) + geom_boxplot()
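If you want to check the grouping without looking at the picture, one option is to count the boxes ggplot2 actually computes. The sketch below assumes a small made-up data frame (toy, hypothetical values) instead of mydata, and compares a plot with no fill mapping, which is effectively what the original call produced, against the corrected one.

```r
library(ggplot2)

# Hypothetical stand-in for mydata: 2 timepoints x 2 drugs x 2 measurements
toy <- data.frame(
  timepoint    = factor(rep(c("Timepoint 1", "Timepoint 2"), each = 4)),
  drug         = factor(rep(c("Drug A", "Drug A", "Drug B", "Drug B"), times = 2)),
  diastolic_bp = c(100, 108, 112, 116, 104, 96, 110, 114)
)

# No fill mapping: boxes are split by timepoint only
p_plain <- ggplot(data = toy, aes(x = timepoint, y = diastolic_bp)) +
  geom_boxplot()

# fill mapped inside aes(): boxes are split by timepoint and drug
p_grouped <- ggplot(data = toy, aes(x = timepoint, y = diastolic_bp, fill = drug)) +
  geom_boxplot()

# layer_data() returns the computed boxplot statistics, one row per box
n_plain   <- nrow(layer_data(p_plain))
n_grouped <- nrow(layer_data(p_grouped))
c(plain = n_plain, grouped = n_grouped)
```

Mapping a discrete variable such as drug to fill inside aes() also makes it part of the default grouping, which is why the second plot gets one box per timepoint and drug.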