How to reset a numerical sequence after a new prefix in an R vector
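I created a data frame with a group column and an individual identifier. The identifier combines the group name with a number formatted as a standardized three-digit code: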
library(stringr)
group = rep(c("A", "B", "C"), each = 3)
df <- data.frame(group, indiv = paste(group, str_pad(1:9, pad = 0, width = 3 , "left"), sep = ""))
This works, but how can I reset the individual identifier each time a new prefix starts, so that I get this desired result:
df2 <- data.frame(group, indiv = c("A001", "A002", "A003",
"B001", "B002", "B003",
"C001", "C002", "C003"))
We can group by 'group', use substr to extract the first character from 'indiv', and use sprintf to format the within-group sequence (row_number()):
library(dplyr)
df %>%
group_by(group) %>%
mutate(indiv = sprintf('%s%03d', substr(indiv, 1, 1), row_number())) %>%
ungroup
Output:
# A tibble: 9 × 2
group indiv
<chr> <chr>
1 A A001
2 A A002
3 A A003
4 B B001
5 B B002
6 B B003
7 C C001
8 C C002
9 C C003
Or, more compactly, with data.table:
library(data.table)
setDT(df)[, indiv := sprintf('%s%03d', group, rowid(group))]
Or using base R:
df$indiv <- with(df, sprintf('%s%03d', group,
ave(seq_along(group), group, FUN = seq_along)))
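The base R line above relies on ave() restarting seq_along() within each group. A minimal sketch of that intermediate step, assuming the same group vector as in the question (the names idx and indiv_new are introduced here only for illustration):

```r
# Same grouping vector as in the question
group <- rep(c("A", "B", "C"), each = 3)

# ave() applies FUN to the subsets of its first argument defined by `group`,
# so the counter starts again at 1 for every group
idx <- ave(seq_along(group), group, FUN = seq_along)
print(idx)

# Paste the prefix onto the zero-padded counter
indiv_new <- sprintf("%s%03d", group, idx)
print(indiv_new)
```

Because ave() returns its result in the original row order, this should also work when the rows of one group are not adjacent.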
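Another solution, which rebuilds the data frame with the sequence repeated for each group (note that str_pad comes from stringr, so this is not purely base R):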
df <- data.frame(group,
indiv = paste(group, str_pad(rep(1:3, 3),
pad = 0, width = 3 , "left"), sep = ""))
Here is a variation on akrun's sprintf approach, using paste0 to attach the prefix:
library(dplyr)
df %>%
group_by(group) %>%
mutate(indiv = paste0(group, sprintf("%03d", row_number())))
Output:
group indiv
<chr> <chr>
1 A A001
2 A A002
3 A A003
4 B B001
5 B B002
6 B B003
7 C C001
8 C C002
9 C C003
You can use sprintf() on its own inside mutate:
library(dplyr)
df |>
group_by(group) |>
mutate(indiv = sprintf("%s%03d", group, 1:n()))
%s: a string, in this case group.
%03d: an integer (%d) padded with leading zeros to a width of 3, in this case the row number within the group.
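To see what the two format specifiers do outside of a pipeline, here is a small base R sketch. The vectors prefix and n are made up for illustration and are not part of the question.

```r
# Hypothetical inputs, for illustration only
prefix <- c("A", "A", "B")
n <- c(1L, 2L, 1L)

# %s inserts the string; %03d pads the integer with zeros to a width of 3
ids <- sprintf("%s%03d", prefix, n)
print(ids)

# The width is a minimum: integers with more than 3 digits are not truncated
wide <- sprintf("%03d", 1234L)
print(wide)
```

Since the width is only a minimum, a group with more than 999 rows would produce identifiers longer than the others; a wider format such as %04d may be preferable in that case.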