Obtaining incidence data back from cumulative data?

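I have a data frame containing date data and cumulative counts. I am trying to reverse the cumsum to get the daily counts, and to do this separately for each group. In other words, I am trying to go from data frame A to data frame B. I am using R and tidyr.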

Here is the code:


df <- data.frame(cum_count = c(5, 14, 50, 5, 14, 50),
                 state = c("Alabama", "Alabama", "Alabama", "NY", "NY", "NY"),
                 Year = c(2012:2014, 2012:2014))

Dataframe A
  cum_count   state Year
1         5 Alabama 2012
2        14 Alabama 2013
3        50 Alabama 2014
4         5      NY 2012
5        14      NY 2013
6        50      NY 2014
Dataframe B
  cum_count   state Year
1         5 Alabama 2012
2         9 Alabama 2013
3        36 Alabama 2014
4         5      NY 2012
5         9      NY 2013
6        36      NY 2014

I tried using the diff function:

df <- df %>%group_by(state)%>%
      mutate(daily_count = diff(cum_count))

But I get:

Error: Column daily_count must be length 3 (the number of rows) or one, not 2

Let me know what you think.

Thanks!

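diff returns a vector that is one element shorter than its input, whereas mutate requires the new column to have the same length as the group (or length 1, which is recycled). We can prepend a value, either NA or the first value of 'cum_count', to make up the difference: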
library(dplyr)
df %>%
  group_by(state)%>%
  mutate(daily_count = c(first(cum_count), diff(cum_count)))
# A tibble: 6 x 4
# Groups:   state [2]
#  cum_count state    Year daily_count
#      <dbl> <fct>   <int>       <dbl>
#1         5 Alabama  2012           5
#2        14 Alabama  2013           9
#3        50 Alabama  2014          36
#4         5 NY       2012           5
#5        14 NY       2013           9
#6        50 NY       2014          36
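
To make the length mismatch concrete, here is a minimal base R sketch on a single group's values (taken from the question). The variable names are only illustrative, and the final check assumes the cumulative series starts counting from zero:

```r
# Cumulative counts for a single group (values from the question)
cum_count <- c(5, 14, 50)

# diff() returns one element fewer than its input, hence the mutate() error
n_diff <- length(diff(cum_count))

# Prepend the first cumulative value so the result matches the input length
daily_count <- c(cum_count[1], diff(cum_count))

# Sanity check: re-accumulating the increments recovers the original series
stopifnot(isTRUE(all.equal(cumsum(daily_count), cum_count)))
```

If the first period's increment should be treated as unknown rather than equal to the first cumulative value, `NA` can be prepended instead of `cum_count[1]`.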

Alternatively, use lag and subtract the lagged values from the column itself:

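library(tidyr)  # provides replace_na()

df %>%
    group_by(state) %>%
    # the first row of each group has no previous value, so the difference is NA;
    # fill it with the first cumulative count
    mutate(daily_count = replace_na(cum_count - lag(cum_count), first(cum_count)))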
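
A possible variant, sketched here under the assumption that each group's cumulative series starts from zero, is to pass `default = 0` to dplyr's `lag()` so that no NA is produced in the first place. The `arrange()` step and the name `res` are additions for illustration; sorting matters because both `diff` and `lag` depend on row order within each group:

```r
library(dplyr)

df <- data.frame(cum_count = c(5, 14, 50, 5, 14, 50),
                 state = c("Alabama", "Alabama", "Alabama", "NY", "NY", "NY"),
                 Year = c(2012:2014, 2012:2014))

res <- df %>%
  arrange(state, Year) %>%   # make sure rows are in time order within each state
  group_by(state) %>%
  # lag() fills the missing "previous" value of the first row with 0
  mutate(daily_count = cum_count - lag(cum_count, default = 0)) %>%
  ungroup()
```

This avoids the tidyr dependency, at the cost of hard-coding the assumption that the count before the first observed period was zero.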