如何根据开始日期和结束日期计算季度内的观察次数?
How to count the number of observation that fall in quarter based on start and end date?
patient_id dt_diag_init enroll
<int64> <date> <date>
1 10401 2018-01-04 2020-09-30
2 60701 2019-05-31 2019-09-30
3 343702 2018-12-05 2020-09-30
4 472202 2019-12-30 2020-09-30
5 489502 2019-09-17 2019-11-30
6 557401 2019-10-15 2020-09-30
7 857901 2018-01-02 2020-09-30
8 874201 2018-01-01 2020-09-30
9 1309102 2019-03-11 2020-09-30
10 1317601 2018-08-14 2020-09-30
我正在尝试根据日期范围(dt_diag_init 和注册)统计每季度的患者人数。 patient_id 10401 从 2018-01-04 开始到 2020-09-30,因此由于日期范围重叠,患者将被计入 Q1'18、Q2'18 .... 到 Q3'20。我正在尝试获得仅包含季度和每季度患者人数的输出 table。
样本输出table
Qtr year total
<chr> <dbl> <dbl>
1 Q1'18 2018 485
2 Q2'18 2018 516
3 Q3'18 2018 560
这就是我在进行计数之前尝试创建带有四分之一列的 table 的内容,但出现错误:
df_1 <- df %>%
mutate(quarter = map2(
as.numeric(dt_diag_init),
as.numeric(enroll),
~ format(seq(.x, .y, by="quarter"), "Q%q'%y")
))
采样日期:
df <- structure(list(patient_id = structure(c(5.13877678239481e-320,
2.99902787682095e-319, 1.69811350606928e-318, 2.33298786097528e-318,
2.41846121770582e-318, 2.75392685057557e-318, 4.23859411632851e-318,
4.31912681660064e-318, 6.46782325102068e-318, 6.50981389026072e-318
), class = "integer64"), dt_diag_init = structure(c(17535, 18047,
17870, 18260, 18156, 18184, 17533, 17532, 17966, 17757), class = "Date"),
enroll = structure(c(18535, 18169, 18535, 18535, 18230, 18535,
18535, 18535, 18535, 18535), class = "Date")), row.names = c(NA,
-10L), class = c("tbl_df", "tbl", "data.frame"))
也许这有帮助
library(dplyr)
library(zoo)
library(tidyr)
library(purrr)
library(lubridate)
df %>%
mutate(quarter = map2(dt_diag_init, enroll,
~ as.yearqtr(seq(.x, .y, by = 'quarter')))) %>%
unnest(quarter) %>%
mutate(year = year(quarter)) %>%
count(quarter, year, name = 'total')
-输出
# A tibble: 11 × 3
quarter year total
<yearqtr> <dbl> <int>
1 2018 Q1 2018 3
2 2018 Q2 2018 3
3 2018 Q3 2018 4
4 2018 Q4 2018 5
5 2019 Q1 2019 6
6 2019 Q2 2019 7
7 2019 Q3 2019 8
8 2019 Q4 2019 8
9 2020 Q1 2020 8
10 2020 Q2 2020 8
11 2020 Q3 2020 8
patient_id dt_diag_init enroll
<int64> <date> <date>
1 10401 2018-01-04 2020-09-30
2 60701 2019-05-31 2019-09-30
3 343702 2018-12-05 2020-09-30
4 472202 2019-12-30 2020-09-30
5 489502 2019-09-17 2019-11-30
6 557401 2019-10-15 2020-09-30
7 857901 2018-01-02 2020-09-30
8 874201 2018-01-01 2020-09-30
9 1309102 2019-03-11 2020-09-30
10 1317601 2018-08-14 2020-09-30
我正在尝试根据日期范围(dt_diag_init 和注册)统计每季度的患者人数。 patient_id 10401 从 2018-01-04 开始到 2020-09-30,因此由于日期范围重叠,患者将被计入 Q1'18、Q2'18 .... 到 Q3'20。我正在尝试获得仅包含季度和每季度患者人数的输出 table。
样本输出table
Qtr year total
<chr> <dbl> <dbl>
1 Q1'18 2018 485
2 Q2'18 2018 516
3 Q3'18 2018 560
这就是我在进行计数之前尝试创建带有四分之一列的 table 的内容,但出现错误:
df_1 <- df %>%
mutate(quarter = map2(
as.numeric(dt_diag_init),
as.numeric(enroll),
~ format(seq(.x, .y, by="quarter"), "Q%q'%y")
))
采样日期:
df <- structure(list(patient_id = structure(c(5.13877678239481e-320,
2.99902787682095e-319, 1.69811350606928e-318, 2.33298786097528e-318,
2.41846121770582e-318, 2.75392685057557e-318, 4.23859411632851e-318,
4.31912681660064e-318, 6.46782325102068e-318, 6.50981389026072e-318
), class = "integer64"), dt_diag_init = structure(c(17535, 18047,
17870, 18260, 18156, 18184, 17533, 17532, 17966, 17757), class = "Date"),
enroll = structure(c(18535, 18169, 18535, 18535, 18230, 18535,
18535, 18535, 18535, 18535), class = "Date")), row.names = c(NA,
-10L), class = c("tbl_df", "tbl", "data.frame"))
也许这有帮助
library(dplyr)
library(zoo)
library(tidyr)
library(purrr)
library(lubridate)
df %>%
mutate(quarter = map2(dt_diag_init, enroll,
~ as.yearqtr(seq(.x, .y, by = 'quarter')))) %>%
unnest(quarter) %>%
mutate(year = year(quarter)) %>%
count(quarter, year, name = 'total')
-输出
# A tibble: 11 × 3
quarter year total
<yearqtr> <dbl> <int>
1 2018 Q1 2018 3
2 2018 Q2 2018 3
3 2018 Q3 2018 4
4 2018 Q4 2018 5
5 2019 Q1 2019 6
6 2019 Q2 2019 7
7 2019 Q3 2019 8
8 2019 Q4 2019 8
9 2020 Q1 2020 8
10 2020 Q2 2020 8
11 2020 Q3 2020 8