Assigning sequentially increasing group numbers to chunks of data with the same indicator value in R

Given the following data frame:

set.seed(123)
# Stored as `dat`, the name the answers below refer to
dat <- data.frame("var1" = runif(10),
                  "indicator" = c(rep(1, 2), rep(0, 2), rep(1, 2), rep(0, 2), rep(1, 2)))
       
        var1 indicator
1  0.8895393         1
2  0.6928034         1
3  0.6405068         0
4  0.9942698         0
5  0.6557058         1
6  0.7085305         1
7  0.5440660         0
8  0.5941420         0
9  0.2891597         1
10 0.1471136         1

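How can I make it so that the first group of 1s in the "indicator" column is assigned 1, the second group is assigned 2, and so on?

The resulting data frame should look like this: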

         var1 indicator new_col
1  0.96302423         1       1
2  0.90229905         1       1
3  0.69070528         0       0
4  0.79546742         0       0
5  0.02461368         1       2
6  0.47779597         1       2
7  0.75845954         0       0
8  0.21640794         0       0
9  0.31818101         1       3
10 0.23162579         1       3

I'm looking for a tidyverse solution.

In base R, this can be done with rle:
dat$new_col <- inverse.rle(within.list(rle(dat$indicator), 
     {values[values == 1] <- seq_len(sum(values))}))

Output:

> dat
        var1 indicator new_col
1  0.2875775         1       1
2  0.7883051         1       1
3  0.4089769         0       0
4  0.8830174         0       0
5  0.9404673         1       2
6  0.0455565         1       2
7  0.5281055         0       0
8  0.8924190         0       0
9  0.5514350         1       3
10 0.4566147         1       3
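
The rle one-liner above can be unpacked into separate steps. The following is only a sketch of the same run-length idea in base R; the standalone `indicator` vector and the helper name `is_one` are assumptions made for illustration and are not part of the original answer.

```r
# Standalone copy of the indicator column from the question
indicator <- c(1, 1, 0, 0, 1, 1, 0, 0, 1, 1)

# Run-length encode: one entry per run of identical values
r <- rle(indicator)

# Number the runs of 1s in order of appearance; runs of 0s keep the value 0
is_one <- r$values == 1
r$values[is_one] <- seq_len(sum(is_one))

# Expand the runs back to the original length
new_col <- inverse.rle(r)
print(new_col)
```

Because the numbering happens on the encoded runs rather than on individual rows, every row inside a run receives the same group number.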

Or with dplyr, using rleid from data.table:

library(dplyr)
library(data.table)
dat %>%
   mutate(new_col = rleid(indicator) * indicator,
   new_col = match(new_col, unique(new_col[new_col != 0]), nomatch = 0))

Output:

        var1 indicator new_col
1  0.2875775         1       1
2  0.7883051         1       1
3  0.4089769         0       0
4  0.8830174         0       0
5  0.9404673         1       2
6  0.0455565         1       2
7  0.5281055         0       0
8  0.8924190         0       0
9  0.5514350         1       3
10 0.4566147         1       3

Using data.table:

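library(data.table)
# NA^!indicator is NA where indicator is 0 and 1 otherwise, so only runs of 1s get a factor level
setDT(dat)[, new_col := fcoalesce(as.integer(factor(rleid(indicator) * 
            NA^!indicator)), 0L)]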

Using cumsum:

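# Count each row where a run of 1s starts (a 1 preceded by a 0, or the first row)
dat$v <- with(dat, cumsum(indicator == 1 & dplyr::lag(indicator == 0, default = TRUE)))
# Rows where indicator is 0 do not belong to any group
dat$v[dat$indicator == 0] <- 0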

        var1 indicator v
1  0.2875775         1 1
2  0.7883051         1 1
3  0.4089769         0 0
4  0.8830174         0 0
5  0.9404673         1 2
6  0.0455565         1 2
7  0.5281055         0 0
8  0.8924190         0 0
9  0.5514350         1 3
10 0.4566147         1 3
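
The same cumsum idea can also be written without dplyr::lag. This is a hedged variant rather than the answer's own code: it assumes the indicator contains only 0 and 1, and the names `indicator`, `starts`, and `v` are chosen here for illustration.

```r
# Standalone copy of the indicator column from the question
indicator <- c(1, 1, 0, 0, 1, 1, 0, 0, 1, 1)

# A run of 1s starts wherever the value steps up from 0 to 1;
# prepending 0 makes a leading 1 count as a start as well
starts <- diff(c(0, indicator)) == 1

# The running count of starts gives the group number;
# multiplying by the 0/1 indicator resets the 0 rows to 0
v <- cumsum(starts) * indicator
print(v)
```

Multiplying by the indicator replaces the separate step that zeroes out the rows where the indicator is 0.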