Count and Assign Consecutive Occurrences of Variable

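I want to count the consecutive occurrences of each value and assign that count to the value in the next column. Here is an example of the input and the desired output: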

dataset <- data.frame(input = c("a","b","b","a","a","c","a","a","a","a","b","c"))
dataset$count <- c(1,2,2,2,2,1,4,4,4,4,1,1)

dataset  
   input   count
     a       1
     b       2
     b       2
     a       2
     a       2
     c       1
     a       4
     a       4
     a       4
     a       4
     b       1
     c       1

Using rle(dataset$input) I can get the number of occurrences of each run of values, but I want the result in the format shown above.

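My question is similar to "R: count consecutive occurrences of values in a single column", but there the output is a running sequence within each run, whereas I want to assign the run's total count to each of its values.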

You can repeat the lengths component of the rle result lengths times:

with(rle(dataset$input), rep(lengths, lengths))
#[1] 1 2 2 2 2 1 4 4 4 4 1 1
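
To see why this works, it may help to look at the two steps separately. The following is a minimal sketch in base R using the example vector from the question; the names x, runs, and count are only illustrative and not part of the original answer.

```r
x <- c("a","b","b","a","a","c","a","a","a","a","b","c")

# rle() collapses the vector into runs: one length and one value per run
runs <- rle(x)
runs$lengths   # 1 2 2 1 4 1 1
runs$values    # "a" "b" "a" "c" "a" "b" "c"

# Repeating each run length by itself expands it back to one entry per row
count <- rep(runs$lengths, times = runs$lengths)
count          # 1 2 2 2 2 1 4 4 4 4 1 1
```

The result has the same length as x, so it can be assigned directly as a new column, for example dataset$count <- count.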

Using dplyr, we can create the groups with lag and then count the number of rows in each group:

library(dplyr)

dataset %>%
  group_by(gr = cumsum(input != lag(input, default = first(input)))) %>%
  mutate(count = n())
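
The grouping idea can also be sketched without any packages, which makes the intermediate run ids visible. This is only an illustration in base R, assuming a non-empty character vector; the names is_start, run_id, and count are hypothetical. Note that the ids here start at 1, whereas the cumsum in the dplyr code starts at 0; only the grouping matters.

```r
x <- c("a","b","b","a","a","c","a","a","a","a","b","c")

# TRUE wherever a value differs from its predecessor; the first element always starts a run
is_start <- c(TRUE, x[-1] != x[-length(x)])

# The cumulative sum of run starts gives every run its own id
run_id <- cumsum(is_start)   # 1 2 2 3 3 4 5 5 5 5 6 7

# ave() applies length() within each run id and returns one value per element
count <- ave(seq_along(x), run_id, FUN = length)
count                        # 1 2 2 2 2 1 4 4 4 4 1 1
```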

Or with data.table:

library(data.table)
setDT(dataset)[, count:= .N, rleid(input)]

Data

Make sure that the input column is character and not factor:

dataset <- data.frame(input = c("a","b","b","a","a","c","a","a","a","a","b","c"),
           stringsAsFactors = FALSE)
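
If the column has already been read in as a factor, one option is to convert it before calling rle. A small sketch, assuming a hypothetical factor f; the conversion with as.character is a suggestion and not part of the original answer.

```r
f <- factor(c("a","b","b","a"))

# rle() stops with an error when given a factor
res <- try(rle(f), silent = TRUE)
failed <- inherits(res, "try-error")

# Converting to character first makes the rle approach work
counts <- with(rle(as.character(f)), rep(lengths, lengths))
counts   # 1 2 2 1
```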

We can use rleid (from data.table) together with dplyr:

library(dplyr)
library(data.table)  # provides rleid()
dataset %>%
   group_by(grp = rleid(input)) %>%
   mutate(count = n())