How can I find the dates that events are happening concurrently in R by group?
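In R, I need to find which treatments are happening concurrently and work out what the total dosage is for each day. I need to do this per patient, so presumably with a group_by statement in dplyr.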
user_id | treatment | dosage | treatment_start | treatment_end |
---|---|---|---|---|
1 | 1 | 3 | 01/28/2019 | 07/30/2019 |
1 | 1 | 2 | 05/26/2019 | 11/25/2019 |
1 | 2 | 1 | 08/13/2019 | 02/12/2020 |
1 | 1 | 2 | 12/06/2019 | 04/07/2020 |
1 | 2 | 1 | 12/09/2019 | 06/10/2020 |
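For anyone who wants to reproduce this, here is a minimal sketch of the table as a data frame. The name `dat` is an assumption, and the dates are left as month/day/year strings so they still need parsing:

```r
# Example data copied from the table above.
# `dat` is an assumed name; dates are plain strings in month/day/year order.
dat <- data.frame(
  user_id         = c(1L, 1L, 1L, 1L, 1L),
  treatment       = c(1L, 1L, 2L, 1L, 2L),
  dosage          = c(3L, 2L, 1L, 2L, 1L),
  treatment_start = c("01/28/2019", "05/26/2019", "08/13/2019",
                      "12/06/2019", "12/09/2019"),
  treatment_end   = c("07/30/2019", "11/25/2019", "02/12/2020",
                      "04/07/2020", "06/10/2020"),
  stringsAsFactors = FALSE
)
```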
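Ideally, the final form would be the user ID, the treatments they are on, the sum of the dosage across all of those treatments, and the dates during which they were on that combination. I have made an example results table with a few rows below.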
user_id | treatments | total_dosage | treatment_start | treatment_end |
---|---|---|---|---|
1 | 1 | 3 | 01/28/2019 | 05/25/2019 |
1 | 1 | 5 | 05/26/2019 | 07/30/2019 |
1 | 1 | 2 | 07/31/2019 | 08/12/2019 |
1 | 1,2 | 3 | 08/13/2019 | 11/25/2019 |
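I have worked out how to find whether one event overlaps with another, but it doesn't give the resulting dates, nor does it sum the dosage, so I don't know whether it is usable. In this case, course is a combination of the treatment and dosage columns.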
DF %>% group_by(user_id) %>%
  mutate(overlap = purrr::map2_chr(treatment_start, treatment_end,
    ~toString(course[.x >= treatment_start & .x < treatment_end | .y > treatment_start & .y < treatment_end]))) %>%
  ungroup()
This is an interesting problem. One way is to expand the data frame to one row per day, and then summarise the data by date:
library(tidyverse)
library(lubridate)

dat %>%
  # Convert dates to date format
  mutate(across(treatment_start:treatment_end, ~ mdy(.x))) %>%
  # Expand the dataframe
  group_by(user_id, treatment_start, treatment_end) %>%
  mutate(date = list(seq(treatment_start, treatment_end, by = "day"))) %>%
  unnest(date) %>%
  # Summarise by day
  group_by(user_id, date) %>%
  summarise(dosage = sum(dosage),
            treatment = toString(unique(treatment))) %>%
  # Summarise by different dosage (and create periods)
  group_by(user_id, treatment, dosage) %>%
  summarise(treatment_start = min(date),
            treatment_ends = max(date)) %>%
  arrange(treatment_start)
Output:

  user_id treatment dosage treatment_start treatment_ends
    <int> <chr>      <int> <date>          <date>
1       1 1              3 2019-01-28      2019-05-25
2       1 1              5 2019-05-26      2019-07-30
3       1 1              2 2019-07-31      2019-08-12
4       1 1, 2           3 2019-08-13      2020-04-07
5       1 2              1 2019-11-26      2020-06-10
6       1 2, 1           3 2019-12-06      2019-12-08
7       1 2, 1           4 2019-12-09      2020-02-12
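One caveat with the last grouping step: it groups only by treatment and dosage, so separate periods that share the same combination are merged into one row. That is why row 4 runs from 2019-08-13 to 2020-04-07 even though the combination changes in between. A possible variant, sketched here under a few assumptions, numbers consecutive runs of days before summarising. The data frame `dat` is rebuilt from the question's table, and the names `daily`, `periods`, `new_run` and `run` are hypothetical. `rowwise()` replaces the grouping used for the expansion, and `sort()` is added so that "1, 2" and "2, 1" count as the same set.

```r
library(dplyr)
library(tidyr)
library(lubridate)

# Example data rebuilt from the question's table (assumed name: dat)
dat <- data.frame(
  user_id         = c(1L, 1L, 1L, 1L, 1L),
  treatment       = c(1L, 1L, 2L, 1L, 2L),
  dosage          = c(3L, 2L, 1L, 2L, 1L),
  treatment_start = c("01/28/2019", "05/26/2019", "08/13/2019",
                      "12/06/2019", "12/09/2019"),
  treatment_end   = c("07/30/2019", "11/25/2019", "02/12/2020",
                      "04/07/2020", "06/10/2020"),
  stringsAsFactors = FALSE
)

# One row per user per day, with the total dosage and the set of treatments
daily <- dat %>%
  mutate(across(treatment_start:treatment_end, mdy)) %>%
  rowwise() %>%
  mutate(date = list(seq(treatment_start, treatment_end, by = "day"))) %>%
  ungroup() %>%
  unnest(date) %>%
  group_by(user_id, date) %>%
  summarise(dosage = sum(dosage),
            treatment = toString(sort(unique(treatment))),
            .groups = "drop")

# Start a new run whenever the treatment set or the dosage changes,
# or when there is a gap of more than one day between rows
periods <- daily %>%
  arrange(user_id, date) %>%
  group_by(user_id) %>%
  mutate(
    new_run = treatment != lag(treatment, default = first(treatment)) |
      dosage != lag(dosage, default = first(dosage)) |
      as.numeric(date - lag(date, default = first(date))) > 1,
    run = cumsum(new_run)
  ) %>%
  group_by(user_id, run, treatment, dosage) %>%
  summarise(treatment_start = min(date),
            treatment_end = max(date),
            .groups = "drop") %>%
  select(-run)

print(periods)
```

On the example data this should give nine periods in date order, with the first four matching the desired table in the question. The trade-off is one extra pass over the daily rows to compute the run index.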