Inserting hyphen or en dash in a string vector depending on location of specific elements
vecA <- c("Population 1222",
"Population 90over",
"population under78",
"population 99101",
"Population 1254",
"Population 78 92")
Problem
I would like to arrive at vecB, which corresponds to:
vecB <- c("Population 12 - 22",
"Population 90 over",
"population under 78",
"population 99 - 101",
"Population 12 - 54",
"Population 78 - 92")
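Key features
vecB has the following characteristics:
- After the first two digits, a space, a hyphen, and a space are inserted ( - )
- If a space already exists, only the hyphen is inserted (-)
- For combinations like underDigitDigit, only a space is inserted: under DigitDigit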
Attempt
I was thinking of using groups in gsub, along the lines of:
gsub("^([[:alpha:]]*[[:blank:]])(\\d{2})(.*)$", "\\2", vecA)
But this does not work for all cases:
> t(t(gsub("^([[:alpha:]]*[[:blank:]])(\\d{2})(.*)$", "\\2", vecA)))
[,1]
[1,] "12"
[2,] "90"
[3,] "population under78"
[4,] "99"
[5,] "12"
[6,] "78"
t() is used for presentation purposes only; regex101 link.
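To see why the attempt misses some elements, it may help to look at the capture groups directly. The following is only a minimal sketch using base R's regexec() and regmatches(); the names pattern and groups are illustrative, and \d is written as [[:digit:]] so that no extra escaping is involved.

```r
vecA <- c("Population 1222", "Population 90over", "population under78",
          "population 99101", "Population 1254", "Population 78 92")

# Same pattern as in the attempt, with \d spelled as [[:digit:]]
pattern <- "^([[:alpha:]]*[[:blank:]])([[:digit:]]{2})(.*)$"

# One character vector per element: the full match, then groups 1 to 3;
# character(0) when the pattern does not match at all
groups <- regmatches(vecA, regexec(pattern, vecA))

groups[[1]]  # "Population 1222" "Population " "12" "22"
groups[[3]]  # character(0): "under" sits between the blank and the digits
```

Because the replacement keeps only the second group, every matching element collapses to its first two digits, while the non-matching element is returned unchanged.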
Here is my suggestion, done in two steps: 1) first add the hyphen between the digits, then 2) add a space between the word "over"/"under" and the number:
vecA <- c("Population 1222",
"Population 90over",
"population under78",
"population 99101",
"Population 1254",
"Population 78 92")
v <- gsub("^([[:alpha:]]+[[:blank:]]+)([[:digit:]]{2})\\s*([[:digit:]])", "\\1\\2 - \\3", vecA)
gsub("^([[:alpha:]]+[[:blank:]]+)(?|(over|under)(\\d+)|(\\d+)(over|under))", "\\1\\2 \\3", v, perl = TRUE)
Output of the code demo:
[1] "Population 12 - 22" "Population 90 over" "population under 78"
[4] "population 99 - 101" "Population 12 - 54" "Population 78 - 92"
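The second regex contains a branch reset pattern, (?|...|...), which keeps the same group IDs across the alternative subpatterns. Branch reset is a PCRE feature, which is why perl = TRUE is required.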
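If branch reset feels too heavy, a looser variant with plain POSIX classes might also do for this particular input. This is only a sketch under the assumption that every letter/digit boundary should receive a space; unlike the patterns above, it is not anchored and does not check for the words "over"/"under" specifically. The names v and res are illustrative.

```r
vecA <- c("Population 1222", "Population 90over", "population under78",
          "population 99101", "Population 1254", "Population 78 92")

# Step 1: two digits, optional whitespace, then another digit -> "NN - N..."
# sub() replaces only the first match in each element
v <- sub("([[:digit:]]{2})[[:space:]]*([[:digit:]])", "\\1 - \\2", vecA)

# Step 2a: a letter directly followed by a digit ("under78" -> "under 78")
v <- gsub("([[:alpha:]])([[:digit:]])", "\\1 \\2", v)

# Step 2b: a digit directly followed by a letter ("90over" -> "90 over")
res <- gsub("([[:digit:]])([[:alpha:]])", "\\1 \\2", v)
res
```

The trade-off is strictness: the branch reset version only touches strings shaped like the examples, whereas this sketch would also split something like "abc12def".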