Assigning sequentially increasing group numbers to chunks of data with the same indicator value in R
Given a data frame:
set.seed(123)
dat <- data.frame("var1" = runif(10),
                  "indicator" = c(rep(1, 2), rep(0, 2), rep(1, 2), rep(0, 2), rep(1, 2)))
dat
var1 indicator
1 0.8895393 1
2 0.6928034 1
3 0.6405068 0
4 0.9942698 0
5 0.6557058 1
6 0.7085305 1
7 0.5440660 0
8 0.5941420 0
9 0.2891597 1
10 0.1471136 1
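How can the first group of 1s in the "indicator" column be assigned "1", the second group "2", and so on?
The resulting data frame should look like this: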
var1 indicator new_col
1 0.96302423 1 1
2 0.90229905 1 1
3 0.69070528 0 0
4 0.79546742 0 0
5 0.02461368 1 2
6 0.47779597 1 2
7 0.75845954 0 0
8 0.21640794 0 0
9 0.31818101 1 3
10 0.23162579 1 3
Looking for a tidyverse solution.
In base R, this can be done with rle:
# replace the value of each run of 1s with its sequence number, then expand back
dat$new_col <- inverse.rle(within.list(rle(dat$indicator),
    {values[values == 1] <- seq_len(sum(values))}))
-output
> dat
var1 indicator new_col
1 0.2875775 1 1
2 0.7883051 1 1
3 0.4089769 0 0
4 0.8830174 0 0
5 0.9404673 1 2
6 0.0455565 1 2
7 0.5281055 0 0
8 0.8924190 0 0
9 0.5514350 1 3
10 0.4566147 1 3
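To make the rle one-liner easier to follow, here is a minimal step-by-step sketch of the same idea. It uses a bare vector named `indicator` instead of the `dat` column, and the intermediate name `r` is only illustrative.

```r
indicator <- c(1, 1, 0, 0, 1, 1, 0, 0, 1, 1)

# Run-length encode: one entry per run of identical values
r <- rle(indicator)

# Number only the runs whose value is 1; runs of 0 keep the value 0
r$values[r$values == 1] <- seq_len(sum(r$values == 1))

# Expand the runs back to the original length
new_col <- inverse.rle(r)
print(new_col)
```

Because the numbering happens on the run values rather than on the rows, each run of 1s gets a single number no matter how long it is.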
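Or using dplyr, with rleid() from data.table:
library(dplyr)
library(data.table)  # provides rleid()
dat %>%
  # number every run, zeroing out the runs where indicator is 0
  mutate(new_col = rleid(indicator) * indicator,
         # renumber the remaining run ids as 1, 2, 3, ...; zeros stay 0
         new_col = match(new_col, unique(new_col[new_col != 0]), nomatch = 0))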
-output
var1 indicator new_col
1 0.2875775 1 1
2 0.7883051 1 1
3 0.4089769 0 0
4 0.8830174 0 0
5 0.9404673 1 2
6 0.0455565 1 2
7 0.5281055 0 0
8 0.8924190 0 0
9 0.5514350 1 3
10 0.4566147 1 3
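Since the question asks for a tidyverse solution, a variant that needs only dplyr might look like the sketch below. It assumes dplyr 1.1.0 or later, where `consecutive_id()` is available as a counterpart to `rleid()`; the helper column `run_id` and the result name `res` are illustrative and not part of the original answers.

```r
library(dplyr)

dat <- data.frame(indicator = c(1, 1, 0, 0, 1, 1, 0, 0, 1, 1))

res <- dat %>%
  mutate(
    # id of each run of identical values: 1 1 2 2 3 3 ...
    run_id = consecutive_id(indicator),
    # rank only the runs of 1s (runs of 0 become NA), then turn NA into 0
    new_col = coalesce(dense_rank(if_else(indicator == 1, run_id, NA_integer_)), 0L)
  ) %>%
  select(-run_id)

print(res)
```

This avoids loading data.table alongside dplyr, at the cost of requiring a recent dplyr release.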
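Or with data.table:
library(data.table)
# NA^!indicator is 1 where indicator is 1 and NA where it is 0,
# so only the runs of 1s are turned into factor codes; fcoalesce fills the rest with 0
setDT(dat)[, new_col := fcoalesce(as.integer(factor(rleid(indicator) *
    NA^!indicator)), 0L)]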
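The `NA^!indicator` trick can be hard to read, so here is a base R sketch of what each piece evaluates to. The run ids are built with `rle()` as a stand-in for `rleid()`, and the names `mask`, `run_id`, and `codes` are illustrative only.

```r
indicator <- c(1, 1, 0, 0, 1, 1, 0, 0, 1, 1)

# NA^0 is 1 and NA^1 is NA, so this is 1 where indicator is 1 and NA where it is 0
mask <- NA^!indicator

# Base R stand-in for rleid(): 1 1 2 2 3 3 4 4 5 5
r <- rle(indicator)
run_id <- rep(seq_along(r$lengths), r$lengths)

# factor() drops the NAs and codes the remaining run ids as 1, 2, 3, ...
codes <- as.integer(factor(run_id * mask))

# Fill the NA positions (runs of 0) with 0, as fcoalesce(..., 0L) does
codes[is.na(codes)] <- 0L
print(codes)
```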
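Using cumsum:
# df is the data frame from the question (called dat in the other answers)
# start a new group each time a 1 follows a 0 or opens the column
df$v <- with(df, cumsum(indicator == 1 & dplyr::lag(indicator == 0, default = TRUE)))
df$v[df$indicator == 0] <- 0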
var1 indicator v
1 0.2875775 1 1
2 0.7883051 1 1
3 0.4089769 0 0
4 0.8830174 0 0
5 0.9404673 1 2
6 0.0455565 1 2
7 0.5281055 0 0
8 0.8924190 0 0
9 0.5514350 1 3
10 0.4566147 1 3
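The cumsum approach can also be written without dplyr. The sketch below spells out the intermediate vectors on a bare `indicator` vector; `prev` and `run_start` are illustrative names, and the shift via `c(0, head(...))` is a base R substitute for `dplyr::lag()`.

```r
indicator <- c(1, 1, 0, 0, 1, 1, 0, 0, 1, 1)

# Previous value of each element, treating the position before the first as 0
prev <- c(0, head(indicator, -1))

# TRUE exactly where a run of 1s begins
run_start <- indicator == 1 & prev == 0

# Running count of run starts, then reset the rows where indicator is 0
v <- cumsum(run_start)
v[indicator == 0] <- 0
print(v)
```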