Group by id and drug, splitting where dates are more than 100 days apart, and take the earliest and latest date
Here is my dataset:
mydata = data.frame(Id = c(1,1,1,1,1,1,1,1,1,1),
                    Date = c("2000-01-01","2000-01-05","2000-02-02", "2000-02-12",
                             "2000-02-14","2000-05-13", "2000-05-15", "2000-05-17",
                             "2000-05-16", "2000-05-20"),
                    drug = c("A","A","B","B","B","A","A","A","C","C"))
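The code below gives me the difference between dosing dates, grouped by Id and drug. As you can see, for drug A there is a gap of >100 days between dosing dates.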
library(dplyr)
mydata <- mydata %>% group_by(Id, drug) %>% mutate(Diff = difftime(Date, lag(Date), units = 'days'))
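The task is to group by id and drug and get the earliest and latest dosing date for each drug. However, if the gap between dates for the same drug is >100 days, that drug needs a separate row for each run of doses, each with its own earliest and latest date.
The code below lets me get the earliest and latest dates, but I am not sure how to add the 100-day gap here.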
mydata %>% group_by(Id, drug) %>%
  summarise(startDate = min(as.Date(Date), na.rm = TRUE),
            endDate = max(as.Date(Date), na.rm = TRUE))
Below is the output I would like to get:
mydata1 = data.frame(Id = c(1,1,1,1),
                     startDate = c("2000-01-01","2000-02-02","2000-05-13", "2000-05-16"),
                     endDate = c("2000-01-05", "2000-02-14", "2000-05-17", "2000-05-20"),
                     drug = c("A","B","A","C"))
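As you can see, drug A has two rows: one with the first start and end date, and one with the second start and end date that follow the gap of more than 100 days between doses.
Any help would be greatly appreciated! Thanks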
You can use cumsum to create a new grouping:
library(dplyr)
mydata %>%
  group_by(Id, drug) %>%
  mutate(Diff = difftime(Date, lag(Date), units = 'days')) %>%
  group_by(Id, drug, grp = cumsum(coalesce(Diff, as.difftime(0, units = 'days')) > 100)) %>%
  summarise(startDate = min(as.Date(Date), na.rm = TRUE),
            endDate = max(as.Date(Date), na.rm = TRUE),
            .groups = "drop") %>%
  select(-grp)
This returns
# A tibble: 4 x 4
     Id drug  startDate  endDate
  <dbl> <chr> <date>     <date>
1     1 A     2000-01-01 2000-01-05
2     1 A     2000-05-13 2000-05-17
3     1 B     2000-02-02 2000-02-14
4     1 C     2000-05-16 2000-05-20
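To see why cumsum yields a usable grouping, it may help to trace the idea on the drug A dates alone. The following is only a minimal base R sketch, not the code from the answer above: the names gap, new_episode, grp and episodes are made up for illustration, and it assumes the dates are already sorted in ascending order (as they are in the question's data).

```r
# Dosing dates for drug A from the question, converted to Date.
# Assumption: dates are sorted ascending; otherwise sort them first.
dates <- as.Date(c("2000-01-01", "2000-01-05", "2000-05-13",
                   "2000-05-15", "2000-05-17"))

# Gap in days to the previous dose; the first dose has no predecessor, so use 0
gap <- c(0, as.numeric(diff(dates)))

# TRUE marks the first dose that follows a gap of more than 100 days
new_episode <- gap > 100

# The running count of TRUE values acts as an episode id (hypothetical name: grp)
grp <- cumsum(new_episode)

# Because the dates are sorted, the first and last row of each episode
# hold its earliest and latest date
episodes <- data.frame(
  grp       = unique(grp),
  startDate = dates[!duplicated(grp)],
  endDate   = dates[!duplicated(grp, fromLast = TRUE)]
)
print(episodes)
```

Every time the flag is TRUE the running sum increases by one, so all doses up to the next long gap share the same grp value. Grouping by Id, drug and grp, as in the dplyr pipeline above, then gives one row per run of doses.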