Group by date difference in R
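I would like to group by date difference within a group.
For example, if there are 7 cases in Facility A, but the first 5 cases occurred more than 14 days before the last 2 cases, I want them to fall into two different groups (see the example below).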
location | address | start_date | start_date_diff | Group |
---|---|---|---|---|
Facility A | 123 main st | 2/7/2022 | 0 | 1 |
Facility A | 123 main st | 2/11/2022 | 4 | 1 |
Facility A | 123 main st | 2/11/2022 | 0 | 1 |
Facility A | 123 main st | 2/11/2022 | 0 | 1 |
Facility A | 123 main st | 2/12/2022 | 1 | 1 |
Facility A | 123 main st | 3/12/2022 | 28 | 2 |
Facility A | 123 main st | 3/17/2022 | 5 | 2 |
Facility B | 55 ford rd | 3/16/2022 | 0 | 3 |
Facility B | 55 ford rd | 3/16/2022 | 0 | 3 |
Facility C | 1 step ave | 3/16/2022 | 0 | 4 |
Facility C | 1 step ave | 3/20/2022 | 4 | 4 |
Facility C | 1 step ave | 3/22/2022 | 2 | 4 |
This is my code so far:
I am stuck on how to further group by the date difference between individual observations.
Assuming we do not already have the diff calculated, and that we need to convert start_date into something usable for arithmetic.
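To see why the cumsum(c(0, diff(...)) > 14) expression used in the answers below works, here is a minimal base R sketch on one facility's dates. The vector names dates, gaps, and sub_group are illustrative and not from the original post; the dates are Facility A's from the question, and the sketch assumes they are already sorted.

```r
# Facility A's dates from the question, parsed from m/d/Y strings.
dates <- as.Date(
  c("2/7/2022", "2/11/2022", "2/11/2022", "2/11/2022",
    "2/12/2022", "3/12/2022", "3/17/2022"),
  format = "%m/%d/%Y"
)

# Gap in days to the previous row; the first row has no predecessor, so use 0.
gaps <- c(0, as.numeric(diff(dates)))
# gaps: 0 4 0 0 1 28 5

# gaps > 14 is TRUE only where a new run starts; cumsum() turns those
# marks into a running counter, so every later row inherits the new value.
sub_group <- cumsum(gaps > 14)
# sub_group: 0 0 0 0 0 1 1
```

The counter restarts at 0 for each location when the same expression is evaluated per group, which is why the answers combine it with location to get a unique id.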
data.table
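library(data.table)
# 1) parse start_date, 2) per location, count gaps of more than 14 days,
# 3) number each consecutive (location, diff14) run
as.data.table(dat)[, start_date := as.Date(start_date, format = "%m/%d/%Y")
  ][, diff14 := cumsum(c(0, diff(start_date)) > 14), by = location
  ][, Group2 := rleid(location, diff14)][]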
#       location     address start_date start_date_diff Group diff14 Group2
#         <char>      <char>     <Date>           <int> <int>  <int>  <int>
#  1: Facility A 123 main st 2022-02-07               0     1      0      1
#  2: Facility A 123 main st 2022-02-11               4     1      0      1
#  3: Facility A 123 main st 2022-02-11               0     1      0      1
#  4: Facility A 123 main st 2022-02-11               0     1      0      1
#  5: Facility A 123 main st 2022-02-12               1     1      0      1
#  6: Facility A 123 main st 2022-03-12              28     2      1      2
#  7: Facility A 123 main st 2022-03-17               5     2      1      2
#  8: Facility B  55 ford rd 2022-03-16               0     3      0      3
#  9: Facility B  55 ford rd 2022-03-16               0     3      0      3
# 10: Facility C  1 step ave 2022-03-16               0     4      0      4
# 11: Facility C  1 step ave 2022-03-20               4     4      0      4
# 12: Facility C  1 step ave 2022-03-22               2     4      0      4
dplyr
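library(dplyr)
dat %>%
  # parse the m/d/Y strings so date arithmetic works
  mutate(start_date = as.Date(start_date, format = "%m/%d/%Y")) %>%
  group_by(location) %>%
  # within each location, bump the counter whenever the gap exceeds 14 days
  mutate(diff14 = cumsum(c(0, diff(start_date)) > 14)) %>%
  # one id per (location, diff14) pair
  group_by(location, diff14) %>%
  mutate(Group2 = cur_group_id()) %>%
  ungroup()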
# # A tibble: 12 x 7
#    location   address     start_date start_date_diff Group diff14 Group2
#    <chr>      <chr>       <date>               <int> <int>  <int>  <int>
#  1 Facility A 123 main st 2022-02-07               0     1      0      1
#  2 Facility A 123 main st 2022-02-11               4     1      0      1
#  3 Facility A 123 main st 2022-02-11               0     1      0      1
#  4 Facility A 123 main st 2022-02-11               0     1      0      1
#  5 Facility A 123 main st 2022-02-12               1     1      0      1
#  6 Facility A 123 main st 2022-03-12              28     2      1      2
#  7 Facility A 123 main st 2022-03-17               5     2      1      2
#  8 Facility B 55 ford rd  2022-03-16               0     3      0      3
#  9 Facility B 55 ford rd  2022-03-16               0     3      0      3
# 10 Facility C 1 step ave  2022-03-16               0     4      0      4
# 11 Facility C 1 step ave  2022-03-20               4     4      0      4
# 12 Facility C 1 step ave  2022-03-22               2     4      0      4
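Both versions above compare each row with the one directly before it, so they rely on the rows being ordered by date within each location, as they are in the sample data. If that ordering is not guaranteed, sorting first seems prudent. The following is a rough base R sketch of the same idea with an explicit sort; dat_small is a made-up subset of the question's data, and the use of ave() and match() is my own substitution, not something from the original answer.

```r
# Hypothetical stand-in for the question's data (Facility A and B only).
dat_small <- data.frame(
  location   = c("Facility A", "Facility A", "Facility A",
                 "Facility B", "Facility B"),
  start_date = c("2/7/2022", "2/12/2022", "3/12/2022",
                 "3/16/2022", "3/16/2022")
)
dat_small$start_date <- as.Date(dat_small$start_date, format = "%m/%d/%Y")

# Sort first: the gap logic only looks at the immediately preceding row.
dat_small <- dat_small[order(dat_small$location, dat_small$start_date), ]

# Per-location counter of gaps longer than 14 days (same expression as above,
# applied to the dates as plain day counts).
dat_small$diff14 <- ave(
  as.numeric(dat_small$start_date), dat_small$location,
  FUN = function(d) cumsum(c(0, diff(d)) > 14)
)

# One id per distinct (location, diff14) pair, numbered in order of appearance.
key <- paste(dat_small$location, dat_small$diff14)
dat_small$Group2 <- match(key, unique(key))
```

Under these assumptions the three Facility A rows get Group2 values 1, 1, 2 and the two Facility B rows get 3, 3.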
Data
dat <- structure(list(
  location = c("Facility A", "Facility A", "Facility A", "Facility A", "Facility A", "Facility A", "Facility A", "Facility B", "Facility B", "Facility C", "Facility C", "Facility C"),
  address = c("123 main st", "123 main st", "123 main st", "123 main st", "123 main st", "123 main st", "123 main st", "55 ford rd", "55 ford rd", "1 step ave", "1 step ave", "1 step ave"),
  start_date = c("2/7/2022", "2/11/2022", "2/11/2022", "2/11/2022", "2/12/2022", "3/12/2022", "3/17/2022", "3/16/2022", "3/16/2022", "3/16/2022", "3/20/2022", "3/22/2022"),
  start_date_diff = c(0L, 4L, 0L, 0L, 1L, 28L, 5L, 0L, 0L, 0L, 4L, 2L),
  Group = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 3L, 3L, 4L, 4L, 4L)
), class = "data.frame", row.names = c(NA, -12L))