Calculate time interval to first measurement within subjects using group_by()
I have a data frame in 'long format', in which each subject is observed multiple times:
library(tibble)  # provides tribble()
dat1 <- tribble(
~CODE, ~V1, ~V2, ~session, ~date,
"1111P11", 2, 3, 1, "2020-09-01",
"1111P11", 3, 2, 2, "2020-09-08",
"1111P11", 1, 3, 3, "2020-09-15",
"1111P11", 3, 4, 4, "2020-09-25",
"2222P22", 5, 1, 1, "2020-05-15",
"2222P22", 3, 2, 2, "2020-05-22",
"2222P22", 1, 4, 3, "2020-05-30",
"3333P33", 3, 4, 1, "2020-06-10",
"3333P33", 4, 1, 2, "2020-06-17",
"3333P33", 3, 5, 3, "2020-06-24",
"3333P33", 4, 2, 4, "2020-07-01",
"3333P33", 3, 4, 5, "2020-07-10"
)
# Convert the character column to Date class
dat1$date <- as.Date(dat1$date)
For each subject, I want to calculate the time interval between each session and that subject's first session. The result should be:
dat2 <- tribble(
~CODE, ~V1, ~V2, ~session, ~date, ~interv.1st.sess,
"1111P11", 2, 3, 1, "2020-09-01", 0,
"1111P11", 3, 2, 2, "2020-09-08", 7,
"1111P11", 1, 3, 3, "2020-09-15", 14,
"1111P11", 3, 4, 4, "2020-09-25", 24,
"2222P22", 5, 1, 1, "2020-05-15", 0,
"2222P22", 3, 2, 2, "2020-05-22", 7,
"2222P22", 1, 4, 3, "2020-05-30", 15,
"3333P33", 3, 4, 1, "2020-06-10", 0,
"3333P33", 4, 1, 2, "2020-06-17", 7,
"3333P33", 3, 5, 3, "2020-06-24", 14,
"3333P33", 4, 2, 4, "2020-07-01", 21,
"3333P33", 3, 4, 5, "2020-07-10", 30
)
I have been trying to solve this with group_by(), but without success. Is there a tidy way (or any other way) to do this?
Try dplyr and lubridate. The date1 column is included to convert the date format explicitly.
library(dplyr)
library(lubridate)
dat1 %>%
group_by(CODE) %>%
mutate(date1 = ymd(date),
diff = date1 - first(date1))
#> # A tibble: 12 x 7
#> # Groups: CODE [3]
#> CODE V1 V2 session date date1 diff
#> <chr> <dbl> <dbl> <dbl> <chr> <date> <drtn>
#> 1 1111P11 2 3 1 2020-09-01 2020-09-01 0 days
#> 2 1111P11 3 2 2 2020-09-08 2020-09-08 7 days
#> 3 1111P11 1 3 3 2020-09-15 2020-09-15 14 days
#> 4 1111P11 3 4 4 2020-09-25 2020-09-25 24 days
#> 5 2222P22 5 1 1 2020-05-15 2020-05-15 0 days
#> 6 2222P22 3 2 2 2020-05-22 2020-05-22 7 days
#> 7 2222P22 1 4 3 2020-05-30 2020-05-30 15 days
#> 8 3333P33 3 4 1 2020-06-10 2020-06-10 0 days
#> 9 3333P33 4 1 2 2020-06-17 2020-06-17 7 days
#> 10 3333P33 3 5 3 2020-06-24 2020-06-24 14 days
#> 11 3333P33 4 2 4 2020-07-01 2020-07-01 21 days
#> 12 3333P33 3 4 5 2020-07-10 2020-07-10 30 days
Created on 2021-12-19 by the reprex package (v2.0.1)
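The diff column above is a difftime (shown as <drtn>), whereas the question asks for a plain number in a column called interv.1st.sess. A minimal sketch of one way to get that, assuming dplyr and tibble are installed; the object names dat and res and the shortened data are made up for illustration:

```r
library(dplyr)

# Hypothetical stand-in for dat1: two subjects, dates stored as character.
dat <- tibble::tribble(
  ~CODE,     ~session, ~date,
  "1111P11", 1,        "2020-09-01",
  "1111P11", 2,        "2020-09-08",
  "2222P22", 1,        "2020-05-15",
  "2222P22", 2,        "2020-05-30"
)

res <- dat %>%
  mutate(date = as.Date(date)) %>%   # character -> Date
  group_by(CODE) %>%
  # Date - Date gives a difftime; as.numeric(units = "days") drops the class
  mutate(interv.1st.sess = as.numeric(date - first(date), units = "days")) %>%
  ungroup()

res$interv.1st.sess
```

Note that first(date) relies on the rows already being in session order within each CODE, as they are in the question's data.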
Here is a base R option using ave:
transform(dat1, diff_in_days = as.integer(date - ave(date, CODE,
FUN = function(x) x[1])))
# CODE V1 V2 session date diff_in_days
#1 1111P11 2 3 1 2020-09-01 0
#2 1111P11 3 2 2 2020-09-08 7
#3 1111P11 1 3 3 2020-09-15 14
#4 1111P11 3 4 4 2020-09-25 24
#5 2222P22 5 1 1 2020-05-15 0
#6 2222P22 3 2 2 2020-05-22 7
#7 2222P22 1 4 3 2020-05-30 15
#8 3333P33 3 4 1 2020-06-10 0
#9 3333P33 4 1 2 2020-06-17 7
#10 3333P33 3 5 3 2020-06-24 14
#11 3333P33 4 2 4 2020-07-01 21
#12 3333P33 3 4 5 2020-07-10 30
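Taking x[1] as the reference assumes that each subject's rows are already sorted by date. If that is not guaranteed, the group minimum can be used instead. A rough base R sketch on made-up data (the toy data frame and its values are hypothetical, not from the question):

```r
# Hypothetical toy data: rows deliberately NOT in date order within subject.
toy <- data.frame(
  CODE = c("A", "A", "A", "B", "B"),
  date = as.Date(c("2020-09-08", "2020-09-01", "2020-09-15",
                   "2020-05-22", "2020-05-15"))
)

# Work on day counts (days since 1970-01-01) so ave() handles plain numbers.
day_num <- as.numeric(toy$date)

# ave() returns, for every row, FUN applied to that row's group.
first_day <- ave(day_num, toy$CODE, FUN = min)

toy$diff_in_days <- as.integer(day_num - first_day)
toy
```

Here the earliest date in each group gets 0 regardless of where its row sits.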
Using data.table:
library(data.table)
setDT(dat1)[, diff := date - first(date), CODE]
Output:
> dat1
CODE V1 V2 session date diff
1: 1111P11 2 3 1 2020-09-01 0 days
2: 1111P11 3 2 2 2020-09-08 7 days
3: 1111P11 1 3 3 2020-09-15 14 days
4: 1111P11 3 4 4 2020-09-25 24 days
5: 2222P22 5 1 1 2020-05-15 0 days
6: 2222P22 3 2 2 2020-05-22 7 days
7: 2222P22 1 4 3 2020-05-30 15 days
8: 3333P33 3 4 1 2020-06-10 0 days
9: 3333P33 4 1 2 2020-06-17 7 days
10: 3333P33 3 5 3 2020-06-24 14 days
11: 3333P33 4 2 4 2020-07-01 21 days
12: 3333P33 3 4 5 2020-07-10 30 days