Change Count on every change but reset on encountering 0 in R

I have a data set DF:

DF <- structure(list(Company = c("ABC", "ABC", "ABC", "ABC", "ABC",
                                 "ABC", "ABC", "XYZ", "XYZ", "XYZ"),
                     year = 1951:1960,
                     dummyconflict = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L),
                                               .Label = c("0", "1"), class = "factor")),
                row.names = 2:11, class = "data.frame")

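I want to add another column that counts upward. That is, if a company's level changes from "0" to "1" in a given year, the count starts at 1, and if its level is still "1" in the following years, the count continues: 2, 3, 4, 5, 6, and so on. However, if it goes back to "0", the count resets to zero.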
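The rule can be sketched as a small base-R loop over one company's 0/1 values. The function name `count_since_switch` is hypothetical, and the sketch assumes, as the answer's output does, that a company's first year always gets a count of 0 because there is no earlier year to change from.

```r
# Hypothetical helper: count consecutive "1" years, resetting to 0 on a "0".
# Assumption: a company's first year is always 0 (no earlier year to compare).
count_since_switch <- function(x) {
  out <- numeric(length(x))            # out[1] stays 0
  for (i in seq_along(x)[-1]) {
    out[i] <- if (x[i] == 1) out[i - 1] + 1 else 0
  }
  out
}

count_since_switch(c(0, 0, 1, 1, 1, 0, 1))
#> [1] 0 0 1 2 3 0 1
```

Applied to each company separately (for example with `ave()` or `split()`), this gives one count per row.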

Please help me add this column based on the conditions above.

Expected result (shown in an image):


# Sample data containing both 0s and 1s (factor codes 1L = "0", 2L = "1")
df <- structure(list(Company = c("ABC", "ABC", "ABC", "ABC", "ABC", "ABC", "ABC", "XYZ", "XYZ", "XYZ"),
                     year = 1951:1960,
                     dummyconflict = structure(c(1L, 1L, 2L, 2L, 2L, 1L, 2L, 2L, 2L, 2L), .Label = c("0", "1"), class = "factor")),
                row.names = 2:11, class = "data.frame")
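Note that `dummyconflict` is a factor: its labels are "0" and "1", but its internal codes are 1 and 2, so converting it to numbers has to go through the labels. A quick base-R check on a small stand-in factor `f` (a name introduced only for this illustration):

```r
# Stand-in factor built like dummyconflict: labels "0"/"1", codes 1/2
f <- factor(c("0", "0", "1"))

as.numeric(f)                # internal codes, not the labels
#> [1] 1 1 2
as.numeric(as.character(f))  # labels converted to numbers
#> [1] 0 0 1
as.numeric(levels(f))[f]     # same result, converting each level only once
#> [1] 0 0 1
```

The last form is the idiom suggested in R FAQ 7.10; for a two-level factor either of the last two lines works.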

library(dplyr)       # mutate(), group_by(), select(), %>%
library(data.table)  # rleid()

df %>%
  mutate(dummyconflict = as.numeric(as.character(dummyconflict))) %>% # factor labels "0"/"1" -> numeric 0/1
  group_by(Company) %>%                                               # work within each company
  mutate(dummy2 = ifelse(row_number() == 1, 0, dummyconflict)) %>%    # first year has no previous year, so treat it as 0
  group_by(Company, flag = rleid(dummy2)) %>%                         # flag = run ID of consecutive identical dummy2 values
  mutate(NewVar = cumsum(dummy2)) %>%                                 # running count within each run (runs of 0 stay 0)
  ungroup() %>%                                                       # drop the grouping
  select(Company, year, dummyconflict, NewVar)                        # keep the relevant columns

# # A tibble: 10 x 4
#   Company  year dummyconflict NewVar
#   <chr>   <int>         <dbl>  <dbl>
# 1 ABC      1951             0      0
# 2 ABC      1952             0      0
# 3 ABC      1953             1      1
# 4 ABC      1954             1      2
# 5 ABC      1955             1      3
# 6 ABC      1956             0      0
# 7 ABC      1957             1      1
# 8 XYZ      1958             1      0
# 9 XYZ      1959             1      1
#10 XYZ      1960             1      2
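If you prefer to avoid the dplyr and data.table dependencies, here is a base-R sketch of the same idea using `ave()`. It is an alternative, not the answer's code: run IDs come from `cumsum()` over the positions where the value changes (standing in for `rleid()`), the sample data is rebuilt with plain numeric 0/1 instead of a factor, and `count_runs` is a hypothetical helper name.

```r
# Same sample data, with dummyconflict as plain numeric 0/1
df <- data.frame(
  Company = rep(c("ABC", "XYZ"), times = c(7, 3)),
  year = 1951:1960,
  dummyconflict = c(0, 0, 1, 1, 1, 0, 1, 1, 1, 1)
)

# Hypothetical helper applied per company:
# zero out the first year, then take a cumulative sum within each run
# of identical values (runs of 0 stay 0, runs of 1 count 1, 2, 3, ...)
count_runs <- function(x) {
  x[1] <- 0                                  # first year cannot be a switch
  run_id <- cumsum(c(TRUE, diff(x) != 0))    # base-R stand-in for rleid()
  ave(x, run_id, FUN = cumsum)
}

df$NewVar <- ave(df$dummyconflict, df$Company, FUN = count_runs)
df$NewVar
#> [1] 0 0 1 2 3 0 1 0 1 2
```

This reproduces the `NewVar` column of the tibble above for the sample data.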

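It's best to run this process step by step to make sure you understand how it works, so that you can easily spot any errors when you apply it to your large data set.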
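One way to follow that advice is to keep each intermediate column and compare them side by side. This base-R sketch mirrors the pipeline's `dummy2` and `flag` columns; it assumes the rows of each company are contiguous and sorted by year, because `!duplicated()` is used to find each company's first row.

```r
# Sample data as plain numeric 0/1, rows sorted by company and year
df <- data.frame(
  Company = rep(c("ABC", "XYZ"), times = c(7, 3)),
  year = 1951:1960,
  dummyconflict = c(0, 0, 1, 1, 1, 0, 1, 1, 1, 1)
)

# Step 1: a company's first year has no previous year, so set it to 0
first_row <- !duplicated(df$Company)
df$dummy2 <- ifelse(first_row, 0, df$dummyconflict)

# Step 2: start a new run ID at each company start or each change in dummy2
df$flag <- cumsum(first_row | c(TRUE, diff(df$dummy2) != 0))
df$flag
#> [1] 1 1 2 2 2 3 4 5 6 6

# Step 3: cumulative sum within each run (runs of 0 stay 0)
df$NewVar <- ave(df$dummy2, df$flag, FUN = cumsum)
df$NewVar
#> [1] 0 0 1 2 3 0 1 0 1 2
```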