Left padding with 0 in a string
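I am trying to clean up some data. This should be simple, but I am struggling with it. I want to left-pad the numbers 1-9 in a string with a zero, but leave the string unchanged if the number is 10 or greater. I have been using gsub(), but I cannot find a way to tell R to ignore any digit that follows the 1 in the pattern I am replacing.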
df = data.frame("col1" = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
"col2" = c("test 1", "test 2", "test 3", "test 14", "test 15", "test 16", "test 17", "test 18", "test 19", "test 20" ))
> df
   col1    col2
1     1  test 1
2     2  test 2
3     3  test 3
4     4 test 14
5     5 test 15
6     6 test 16
7     7 test 17
8     8 test 18
9     9 test 19
10   10 test 20
# This is what I've been trying without much luck
library(dplyr)  # provides %>% and mutate()
test <- df %>%
  mutate(col2 = gsub("test 1", "test 01", col2))
# My result
> test
   col1     col2
1     1  test 01
2     2   test 2
3     3   test 3
4     4 test 014
5     5 test 015
6     6 test 016
7     7 test 017
8     8 test 018
9     9 test 019
10   10  test 20
----------------
> desired
   col1    col2
1     1 test 01
2     2 test 02
3     3 test 03
4     4 test 14
5     5 test 15
6     6 test 16
7     7 test 17
8     8 test 18
9     9 test 19
10   10 test 20
We can extract the numeric part with parse_number, then use sprintf to zero-pad it to two digits while pasting the prefix 'test' back on.
library(dplyr)
df %>%
mutate(col2 = sprintf('test %02d', readr::parse_number(col2)))
-output
#   col1    col2
#1     1 test 01
#2     2 test 02
#3     3 test 03
#4     4 test 14
#5     5 test 15
#6     6 test 16
#7     7 test 17
#8     8 test 18
#9     9 test 19
#10   10 test 20
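To make the two steps visible, here is a minimal base R sketch of the same idea. It is an illustration only: the vector x is a hypothetical stand-in for df$col2, and gsub("\\D", "", x) is swapped in for readr::parse_number, which is only equivalent for simple inputs like these.

```r
# Hypothetical input mirroring col2 from the question
x <- c("test 1", "test 2", "test 14", "test 20")

# Step 1: drop every non-digit character and convert what is left to integer
# (a base R stand-in for readr::parse_number on this simple input)
num <- as.integer(gsub("\\D", "", x))

# Step 2: %02d pads with zeros up to a minimum width of two digits,
# so numbers that already have two digits pass through unchanged
padded <- sprintf("test %02d", num)
print(padded)
```

Because the width in %02d is a minimum, this approach rebuilds every string from the number rather than patching only the single-digit ones.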
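Or use sub: match a whitespace character (\s) followed by a single captured digit (\d) at the end ($) of the string, and in the replacement add a space followed by 0 and the backreference (\1) to the captured group. Inside an R string literal each backslash has to be doubled.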
with(df, sub("\\s(\\d)$", " 0\\1", col2))
#[1] "test 01" "test 02" "test 03" "test 14" "test 15"
#[6] "test 16" "test 17" "test 18" "test 19" "test 20"
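The reason this pattern skips two-digit numbers is that it needs a space, exactly one digit, and then the end of the string. A small sketch on a hypothetical vector x (not part of the original data) shows which strings match:

```r
# Hypothetical inputs: single digit, two digits, and a digit not at the end
x <- c("test 1", "test 9", "test 14", "test 1 extra")

# Space + one captured digit + end of string; backslashes doubled for R
res <- sub("\\s(\\d)$", " 0\\1", x)
print(res)
```

Only "test 1" and "test 9" satisfy the pattern. In "test 14" the digit after the space is not the last character, and "test 1 extra" does not end in a digit, so both are returned unchanged.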
Another solution, using str_pad and a negative lookahead (?!\d) to restrict the padding to single-digit numbers:
library(stringr)
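# The lookahead requires perl = TRUE. str_pad() with width = 2 leaves these strings
# unchanged (all are longer than two characters), so sub() does the actual padding.
str_pad(sub("test (\\d)(?!\\d)", "test 0\\1", df$col2, perl = TRUE), width = 2, side = "left", pad = "0")
 [1] "test 01" "test 02" "test 03" "test 14" "test 15" "test 16"
 [7] "test 17" "test 18" "test 19" "test 20"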
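One practical difference of the lookahead version is that it does not depend on the number being at the end of the string. The following base R sketch, on a hypothetical vector x with trailing text, illustrates this under the assumption that the prefix is always "test ":

```r
# Hypothetical inputs with text after the number
x <- c("test 1 ok", "test 14 ok", "test 7")

# (?!\\d) succeeds only when the captured digit is NOT followed by another digit;
# lookaheads are a PCRE feature, so perl = TRUE is required
res <- sub("test (\\d)(?!\\d)", "test 0\\1", x, perl = TRUE)
print(res)
```

Here "test 14 ok" is left alone because the 1 is followed by another digit, while the single-digit cases are padded whether or not more text follows.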