How to reset a numerical sequence after a new suffix in an R vector

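I created a data frame with a group column and an individual identifier that combines the group name with a number formatted as a standardized three-digit code: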

library(stringr)
group = rep(c("A", "B", "C"), each = 3)
df <- data.frame(group, indiv = paste(group, str_pad(1:9, pad = 0, width = 3 , "left"), sep = ""))

This works fine, but how would I reset the individual identifier each time there is a new prefix, so that I get this desired result:

df2 <- data.frame(group, indiv = c("A001", "A002", "A003", 
                                   "B001", "B002", "B003", 
                                   "C001", "C002", "C003"))

We can group by 'group', use substr to extract the first character of 'indiv', and format the within-group sequence (row_number()) with sprintf:

library(dplyr)
df %>% 
  group_by(group) %>% 
  mutate(indiv = sprintf('%s%03d', substr(indiv, 1, 1), row_number())) %>%
  ungroup

-output

# A tibble: 9 × 2
  group indiv
  <chr> <chr>
1 A     A001 
2 A     A002 
3 A     A003 
4 B     B001 
5 B     B002 
6 B     B003 
7 C     C001 
8 C     C002 
9 C     C003 

Or, more compactly, with data.table:

library(data.table)
setDT(df)[, indiv := sprintf('%s%03d', group, rowid(group))]

Or using base R:

df$indiv <-  with(df, sprintf('%s%03d', group, 
       ave(seq_along(group), group, FUN = seq_along)))
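
To see what the ave() call contributes, here is a minimal sketch on a hypothetical, deliberately unsorted group vector (the data is made up for illustration and is not from the question). ave() applies FUN within each group and returns the results in the original row order, so the counter restarts per group even when the groups are interleaved:

```r
# Hypothetical unsorted input, used only to illustrate the per-group counter
group <- c("A", "B", "A", "C", "B", "A")

# ave() splits the row positions by group, applies seq_along() to each piece,
# and puts the results back in the original row order
idx <- ave(seq_along(group), group, FUN = seq_along)

# Prefix each counter with its group and zero-pad it to three digits
indiv <- sprintf("%s%03d", group, idx)
print(indiv)
```

This should print A001, B001, A002, C001, B002, A003, i.e. each group gets its own 1, 2, 3, ... sequence regardless of row order.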

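Another solution, which builds the identifiers directly when the data frame is created (it still relies on stringr's str_pad, and rep(1:3, 3) assumes three groups of three rows each):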

df <- data.frame(group, 
            indiv = paste(group, str_pad(rep(1:3, 3), 
                    pad = 0, width = 3 , "left"), sep = ""))
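
If the groups are not all the same size, the hard-coded rep(1:3, 3) no longer fits. A possible base R variation, sketched here under the assumption that the rows of each group are contiguous (the unequal group sizes and the name df_uneven are hypothetical), derives the counter from the run lengths instead:

```r
# Hypothetical groups of unequal size; rows of each group are contiguous
group <- rep(c("A", "B", "C"), times = c(2, 4, 3))

# rle() returns the length of each run of identical values (2, 4, 3);
# sequence() expands those lengths into 1:2, 1:4, 1:3
counter <- sequence(rle(group)$lengths)

df_uneven <- data.frame(group, indiv = sprintf("%s%03d", group, counter))
print(df_uneven)
```

Because rle() only looks at consecutive values, this sketch would restart the counter in the wrong place if the same group appeared in two separate blocks; the ave() or group_by() approaches do not have that restriction.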

Here is an alternative to akrun's sprintf approach, using paste0 for the prefix:

library(dplyr)

df %>% 
  group_by(group) %>% 
  mutate(indiv = paste0(group, sprintf("%03d", row_number())))

Output:

  group indiv
  <chr> <chr>
1 A     A001 
2 A     A002 
3 A     A003 
4 B     B001 
5 B     B002 
6 B     B003 
7 C     C001 
8 C     C002 
9 C     C003

You can use sprintf() on its own inside mutate():

library(dplyr)

df |> 
  group_by(group) |> 
  mutate(indiv = sprintf("%s%03d", group, 1:n()))

%s: a string, in this case group.

%03d: an integer (%d) padded with leading zeros to a minimum width of 3, in this case the row number within the group.
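
The two format specifiers can be tried in isolation with base R's sprintf(); the values below are made up purely for illustration:

```r
# %03d pads with zeros up to a minimum width of 3
padded_short <- sprintf("%03d", 7L)     # "007"
padded_mid   <- sprintf("%03d", 42L)    # "042"

# The width is a minimum, so longer numbers are not truncated
padded_long  <- sprintf("%03d", 1234L)  # "1234"

# %s and %03d combined; sprintf() is vectorised, so "A" is recycled
ids <- sprintf("%s%03d", "A", 1:3)      # "A001" "A002" "A003"
print(ids)
```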