R cumulative sum using dplyr with reset
I am trying to build a table that counts consecutive years within groups defined by the columns "state" and "p", like this:
data_right <- data.table(
  state = c("NY", "NY", "NY", "NY", "NY", "NY", "PA", "PA", "PA", "PA", "PA", "PA"),
  p = c("n", "n", "n", "n", "p", "p", "n", "n", "n", "p", "p", "p"),
  Year = c("1973", "1974", "1977", "1978", "1988", "1989", "1991", "1992", "1993", "1920", "1929", "1931"),
  Consecutive_Yrs = c(1, 2, 1, 2, 1, 2, 1, 2, 3, 1, 1, 1)
)
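The code I am using now does not work. I tried mutate and group_by statements in dplyr, but had no luck. I also cannot use the data.table package, because my R version is not up to date.
Any help getting this output would be greatly appreciated!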
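library(dplyr)
data_right %>%
  group_by(state, p) %>%
  # start a new run at the first row of each group and wherever the gap to the previous Year exceeds 1
  mutate(grp = cumsum(c(TRUE, diff(as.integer(Year)) > 1))) %>%
  group_by(state, p, grp) %>%
  # number the rows within each run; the count resets when a new run starts
  mutate(cy = row_number()) %>%
  ungroup() %>%
  select(-grp)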
# # A tibble: 12 x 5
#    state p     Year  Consecutive_Yrs    cy
#    <chr> <chr> <chr>           <dbl> <int>
#  1 NY    n     1973                1     1
#  2 NY    n     1974                2     2
#  3 NY    n     1977                1     1
#  4 NY    n     1978                2     2
#  5 NY    p     1988                1     1
#  6 NY    p     1989                2     2
#  7 PA    n     1991                1     1
#  8 PA    n     1992                2     2
#  9 PA    n     1993                3     3
# 10 PA    p     1920                1     1
# 11 PA    p     1929                1     1
# 12 PA    p     1931                1     1
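To see why the grp column resets the count, it may help to run the same steps on a single group in base R. The following is only a sketch: the vector names (years, starts) are made up for illustration, and ave() stands in for the grouped row_number() call.

```r
# Years of one (state, p) group: NY / n from the example data
years <- as.integer(c("1973", "1974", "1977", "1978"))

# TRUE marks the first row of a run: the first row always starts one,
# and so does any row whose gap to the previous year is larger than 1
starts <- c(TRUE, diff(years) > 1)

# The cumulative sum of the start flags gives each run its own id
grp <- cumsum(starts)

# Numbering the rows within each run id gives the consecutive-year count
cy <- ave(seq_along(years), grp, FUN = seq_along)

print(data.frame(years, starts, grp, cy))
# cy is 1 2 1 2, the same as the first four rows of the expected output
```

Because the run id only increases when a gap appears, every gap opens a fresh group and the row counter starts again from 1.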
This assumes the data are already sorted by Year within each state and p group.
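If the rows are not sorted, one option is to sort before computing the runs. The sketch below is a variation on the pipeline, not part of the original answer: the unsorted data frame is hypothetical, and Year is converted to integer up front, so the result holds integer years rather than character strings.

```r
library(dplyr)

# Hypothetical input: same columns as the question, rows in shuffled order
unsorted <- data.frame(
  state = c("NY", "NY", "NY", "NY"),
  p     = c("n", "n", "n", "n"),
  Year  = c("1978", "1973", "1977", "1974"),
  stringsAsFactors = FALSE
)

result <- unsorted %>%
  mutate(Year = as.integer(Year)) %>%   # compare years as numbers
  arrange(state, p, Year) %>%           # sort so diff() sees ascending years
  group_by(state, p) %>%
  mutate(grp = cumsum(c(TRUE, diff(Year) > 1))) %>%
  group_by(state, p, grp) %>%
  mutate(cy = row_number()) %>%
  ungroup() %>%
  select(-grp)

print(result)
```

Sorting on state and p as well as Year keeps each group's rows together, which makes the printed result easier to check by eye.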
Data:
# data.table() comes from the data.table package; data.frame(..., stringsAsFactors = FALSE) accepts the same columns
data_right <- data.table(
  state = c("NY", "NY", "NY", "NY", "NY", "NY", "PA", "PA", "PA", "PA", "PA", "PA"),
  p = c("n", "n", "n", "n", "p", "p", "n", "n", "n", "p", "p", "p"),
  Year = c("1973", "1974", "1977", "1978", "1988", "1989", "1991", "1992", "1993", "1920", "1929", "1931"),
  Consecutive_Yrs = c(1, 2, 1, 2, 1, 2, 1, 2, 3, 1, 1, 1)
)