R: How to calculate the 2-day incidence from cumulative data after extracting the maximum value for each day?
I have cumulative data like this:
df1 <- data.frame(code=c(1,1,1,1,1,2,2,2,2,3,3,3,3,3,3,4,4,4,4,5,5,5,5),
date=c("2020-01-01", "2020-01-01","2020-01-02","2020-01-03","2020-01-04","2020-01-01","2020-01-02","2020-01-03",
"2020-01-04","2020-01-01","2020-01-01","2020-01-02","2020-01-02","2020-01-03","2020-01-04","2020-01-01",
"2020-01-02","2020-01-04","2020-01-03","2020-01-01","2020-01-02","2020-01-03","2020-01-04"),
cumulative=c(2,3,3,4,4,4,4,6,6,7,8,10,13,14,16,1,2,3,5,1,2,3,5))
From this, I want to extract the maximum cumulative count for each code and each date, like this:
df2 <- data.frame(code=c(1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4,5,5,5,5),
date=c("2020-01-01","2020-01-02","2020-01-03","2020-01-04","2020-01-01","2020-01-02","2020-01-03",
"2020-01-04","2020-01-01","2020-01-02","2020-01-03","2020-01-04","2020-01-01",
"2020-01-02","2020-01-03","2020-01-04","2020-01-01","2020-01-02","2020-01-03","2020-01-04"),
cumulative=c(3,3,4,4,4,4,6,6,8,13,14,16,1,2,3,5,1,2,3,5))
Now I have the cumulative count for each code on each day.
From this, I want to calculate the incidence over a 2-day interval.
df3 <- data.frame(code=c(1,2,3,4,5),
incidence1=c(1,2,6,2,2),incidence2=c(1,2,3,3,3))
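incidence1 is the difference between 2020-01-01 and 2020-01-03, and
incidence2 is the difference between 2020-01-02 and 2020-01-04.
What I would like to know is:
1) how to extract the maximum value for each day
2) how to calculate the difference between days that are 2 days apart
Please advise. Thank you.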
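Here is one approach: within each code, create a group for every alternate row (rows 1 and 3, rows 2 and 4) and take the difference of the cumulative values within each group. To get the expected output in the same format as shown, we can use pivot_wider from tidyr.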
library(dplyr)
library(tidyr)
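# Pair rows 1 & 3 and rows 2 & 4 within each code, then take the difference per pair
df2 %>%
  group_by(code) %>%
  # `.add = TRUE` (dplyr >= 1.0.0) keeps the grouping by code; older versions use `add = TRUE`
  group_by(gr = rep(seq(1, n()/2), 2), .add = TRUE) %>%
  summarise(incidence = diff(cumulative)) %>%
  pivot_wider(names_from = gr, values_from = incidence, names_prefix = "incidence")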
# code incidence1 incidence2
# <dbl> <dbl> <dbl>
#1 1 1 1
#2 2 2 2
#3 3 6 3
#4 4 2 3
#5 5 2 3
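The code above starts from df2, so it does not cover the first part of the question (taking the maximum per code and date from df1). A minimal sketch of both steps follows, assuming dplyr >= 1.0.0 and one row per consecutive day for every code after step 1. The names `daily_max` and `result` are illustrative, and `lag()` is swapped in for the alternating-row grouping so the difference does not depend on there being exactly four rows per code.

```r
library(dplyr)
library(tidyr)

df1 <- data.frame(
  code = c(1,1,1,1,1,2,2,2,2,3,3,3,3,3,3,4,4,4,4,5,5,5,5),
  date = c("2020-01-01","2020-01-01","2020-01-02","2020-01-03","2020-01-04",
           "2020-01-01","2020-01-02","2020-01-03","2020-01-04",
           "2020-01-01","2020-01-01","2020-01-02","2020-01-02","2020-01-03","2020-01-04",
           "2020-01-01","2020-01-02","2020-01-04","2020-01-03",
           "2020-01-01","2020-01-02","2020-01-03","2020-01-04"),
  cumulative = c(2,3,3,4,4,4,4,6,6,7,8,10,13,14,16,1,2,3,5,1,2,3,5)
)

# Step 1: keep the largest cumulative value for each code and date.
daily_max <- df1 %>%
  mutate(date = as.Date(date)) %>%
  group_by(code, date) %>%
  summarise(cumulative = max(cumulative), .groups = "drop") %>%
  arrange(code, date)

# Step 2: subtract the value from two rows earlier within each code.
# Assumption: rows are consecutive days, so "two rows back" means "two days back".
result <- daily_max %>%
  group_by(code) %>%
  mutate(incidence = cumulative - lag(cumulative, 2)) %>%
  filter(!is.na(incidence)) %>%
  mutate(name = paste0("incidence", row_number())) %>%
  ungroup() %>%
  select(code, name, incidence) %>%
  pivot_wider(names_from = name, values_from = incidence)

print(result)
```

With df1 exactly as posted, code 4 lists 2020-01-04 (value 3) before 2020-01-03 (value 5). After sorting by date, its daily maxima are 1, 2, 5, 3 rather than the 1, 2, 3, 5 shown in df2, so this sketch gives incidence1 = 4 and incidence2 = 1 for code 4. The other codes match df3.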