Label consecutive runs of a sequence in a column with consecutive letters

I have the following data:

df <- data.frame(week = rep(seq(1, 4, by=1), times = 3) )

   week
1     1
2     2
3     3
4     4
5     1
6     2
7     3
8     4
9     1
10    2
11    3
12    4

I want to label each consecutive run of 1:4 with a letter, so that the result looks like this:

   week episode
1     1       a
2     2       a
3     3       a
4     4       a
5     1       b
6     2       b
7     3       b
8     4       b
9     1       c
10    2       c
11    3       c
12    4       c

I tried the following, but it does not distinguish the separate consecutive runs of the sequence 1:4:

data.frame(df, episode = letters[cumsum(c(1L, diff(df$week) > 1L))]) 
   week episode
1     1       a
2     2       a
3     3       a
4     4       a
5     1       a
6     2       a
7     3       a
8     4       a
9     1       a
10    2       a
11    3       a
12    4       a
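
For what it's worth, the attempt above never increments the counter because diff(df$week) is 1 inside a run and -3 where the sequence restarts, so the test `> 1L` is never TRUE. A minimal sketch of one possible fix, assuming a new run should start wherever the value does not continue the previous one by exactly 1 (`df1` and `new_run` are names made up for this illustration):

```r
# Same data as in the question, rebuilt so the snippet is self-contained
df1 <- data.frame(week = rep(1:4, times = 3))

# TRUE on the first row and on every row that does not continue
# the previous value by exactly +1 (here: each restart at 1)
new_run <- c(TRUE, diff(df1$week) != 1)

# Cumulative count of run starts gives the run index 1, 2, 3, ...
df1$episode <- letters[cumsum(new_run)]
df1
```

This only looks at the step between neighbouring rows, so it should also split runs that restart at a value other than 1.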

If the data is already in sequence, so that every run starts at 1, take the cumulative sum of the logical vector (week == 1):

library(dplyr)
df %>% 
    mutate(episode =  letters[cumsum(week == 1)])
#   week episode
#1     1       a
#2     2       a
#3     3       a
#4     4       a
#5     1       b
#6     2       b
#7     3       b
#8     4       b
#9     1       c
#10    2       c
#11    3       c
#12    4       c

Or use base R (without any additional packages):

df$episode <- letters[cumsum(df$week == 1)]
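
Because cumsum(week == 1) only counts restarts, it does not depend on every run having the same length. A small sketch on made-up data (`df2` is hypothetical, not from the question), assuming each run still begins at 1:

```r
# Hypothetical data: three runs of lengths 3, 2 and 4, each starting at 1
df2 <- data.frame(week = c(1, 2, 3, 1, 2, 1, 2, 3, 4))

# Each occurrence of 1 bumps the counter, so rows are grouped by run
df2$episode <- letters[cumsum(df2$week == 1)]
df2$episode
# "a" "a" "a" "b" "b" "c" "c" "c" "c"
```

Note that letters has only 26 elements, so indexing past the 26th run returns NA.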

Another dplyr possibility is:

df %>%
 mutate(episode = letters[gl(n()/4, 4)])

   week episode
1     1       a
2     2       a
3     3       a
4     4       a
5     1       b
6     2       b
7     3       b
8     4       b
9     1       c
10    2       c
11    3       c
12    4       c

Or the same with base R:

df$episode <- letters[gl(length(df$week)/4, 4)]

Or:

df %>%
 mutate(episode = letters[ceiling(seq_along(week)/4)])

Or the same with base R:

df$episode <- letters[ceiling(seq_along(df$week)/4)]
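
The gl() and ceiling() variants label rows purely by position and never look at the values in week, so they are only right when every run has exactly 4 rows. A sketch of a sanity check one might run first, assuming the column is expected to be 1:4 repeated (`df3`, `run_len` and `is_regular` are names invented here):

```r
df3 <- data.frame(week = rep(1:4, times = 3))
run_len <- 4L  # assumed fixed length of every run

# TRUE only if the column is exactly 1, 2, ..., run_len repeated
is_regular <- all(df3$week == rep_len(seq_len(run_len), nrow(df3)))
stopifnot(is_regular)

# Safe to label by position once the check has passed
df3$episode <- letters[ceiling(seq_along(df3$week) / run_len)]
```

If the check fails, a value-based approach such as cumsum(week == 1) is probably the safer choice.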

You can use rowid from the data.table package:

library(data.table)
setDT(df)

df[, episode := letters[rowid(week)]]

#     week episode
#  1:    1       a
#  2:    2       a
#  3:    3       a
#  4:    4       a
#  5:    1       b
#  6:    2       b
#  7:    3       b
#  8:    4       b
#  9:    1       c
# 10:    2       c
# 11:    3       c
# 12:    4       c
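
rowid(week) numbers the occurrences of each distinct week value, which matches the run index as long as every value appears once per run. A rough base R counterpart, sketched with ave() (`df4` and `occurrence` are illustrative names, not part of the answer above):

```r
df4 <- data.frame(week = rep(1:4, times = 3))

# For each distinct week value, number its occurrences 1, 2, 3, ...
occurrence <- ave(df4$week, df4$week, FUN = seq_along)

df4$episode <- letters[occurrence]
df4
```

Like rowid, this counts repeats per value rather than detecting restarts, so it may give surprising labels if some runs skip a value.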