How to replace the first occurrences of a string only if it appears more than once in R?
I have a character vector that looks like this:
problem <- c("GROUP 1", "GROUP 1 & GROUP 2", "GROUP 1 & GROUP 2 & GROUP 3", "GROUP 1 & GROUP 2 & GROUP 3 & GROUP 4")
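There is an "&" between each group. When there is more than one "&", I want to use R (sub() or something from the stringr package) to replace each "&" with ",". However, I don't want to change the last "&". How can I do that so the result looks like this: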
#Note: Only the 3rd and 4th strings should be changed
solution <- c("GROUP 1", "GROUP 1 & GROUP 2", "GROUP 1, GROUP 2 & GROUP 3", "GROUP 1, GROUP 2, GROUP 3 & GROUP 4")
In the actual strings there could be any number of "&", so I would like to avoid hard-coding a limit if possible.
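We can use a regular expression with a lookahead (see "Regex lookahead, lookbehind and atomic groups"). The pattern already consumes the space before "&", so the replacement is just ",":
library(stringr)
str_replace_all(problem, " &(?=.*?&)", ",")
Output: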
[1] "GROUP 1"
[2] "GROUP 1 & GROUP 2"
[3] "GROUP 1, GROUP 2 & GROUP 3"
[4] "GROUP 1, GROUP 2, GROUP 3 & GROUP 4"
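If you would rather not depend on stringr, the same lookahead should also work in base R, assuming perl = TRUE is set so that gsub() uses the PCRE engine (the default engine does not support lookaheads). A minimal sketch; the variable name result is only for illustration:

```r
problem <- c("GROUP 1", "GROUP 1 & GROUP 2", "GROUP 1 & GROUP 2 & GROUP 3",
             "GROUP 1 & GROUP 2 & GROUP 3 & GROUP 4")

# " &" is replaced only when another "&" appears later in the same string,
# so the last "&" (or the only one) is left untouched.
result <- gsub(" &(?=.*?&)", ",", problem, perl = TRUE)
print(result)
```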
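Another solution: replace every " &" with ",", then turn the last comma back into " &". Note that the backreference has to be written as \\1 inside an R string:
str_replace_all(problem, " &", ",") %>%
  str_replace(", (GROUP [0-9])$", " & \\1")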
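This can be done with a Perl-style pattern and the \G anchor.
Make sure there are 2 or more &, then match any & that has another one downstream.
(?m)(?:^(?=.*&.*&)|(?!^)\G)[^&\n]*\K&(?=.*&)
Replace with a comma ,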
https://regex101.com/r/Mtvopf/1
(?m)
(?:
    ^
    (?= .* & .* & )
  | (?! ^ )
    \G
)
[^&\n]* \K &
(?= .* & )
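Using strsplit: split on "&", join the pieces with ", " via toString(), then turn the last comma back into " &". The backslashes must be doubled inside R strings, and the captured group already starts with a space:
sapply(strsplit(problem, "\\s+&\\s+"),
  function(x) sub(",([^,]+$)", " &\\1", toString(x)))
Output: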
[1] "GROUP 1" "GROUP 1 & GROUP 2" "GROUP 1, GROUP 2 & GROUP 3" "GROUP 1, GROUP 2, GROUP 3 & GROUP 4"
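The split-and-rejoin idea can also be written without the second regex, by joining everything except the last piece with ", " and then attaching the last piece with " & ". This is only a sketch: join_groups is a hypothetical helper name, not something from the answer above.

```r
problem <- c("GROUP 1", "GROUP 1 & GROUP 2", "GROUP 1 & GROUP 2 & GROUP 3",
             "GROUP 1 & GROUP 2 & GROUP 3 & GROUP 4")

# Hypothetical helper: rebuild one string from its "&"-separated pieces.
join_groups <- function(x) {
  parts <- strsplit(x, "\\s*&\\s*")[[1]]
  n <- length(parts)
  # With one or two pieces there is nothing to turn into a comma.
  if (n <= 2) return(paste(parts, collapse = " & "))
  # Comma-join all but the last piece, then attach the last one with " & ".
  paste(paste(parts[-n], collapse = ", "), parts[n], sep = " & ")
}

result <- vapply(problem, join_groups, character(1), USE.NAMES = FALSE)
print(result)
```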
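You can use the following pattern (note the leading space):
 \K&(?= .* & )
The pattern matches:
 \K
Match a space, then reset the start of the reported match (forget what has been matched so far)
&
Match literally
(?= .* & )
Positive lookahead: assert that to the right there is a space, followed later by another & with a space on each side
For example (in an R string, \K has to be written as \\K):
problem <- c("GROUP 1", "GROUP 1 & GROUP 2", "GROUP 1 & GROUP 2 & GROUP 3", "GROUP 1 & GROUP 2 & GROUP 3 & GROUP 4")
gsub(" \\K&(?= .* & )", ",", problem, perl = TRUE)
Output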
[1] "GROUP 1"
[2] "GROUP 1 & GROUP 2"
[3] "GROUP 1 , GROUP 2 & GROUP 3"
[4] "GROUP 1 , GROUP 2 , GROUP 3 & GROUP 4"
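Because \K keeps the preceding space out of the match, the output above has a space before each comma. If the goal is the exact format from the question, one option (a sketch, not part of the answer above) is to drop \K so that the space is replaced together with the &:

```r
problem <- c("GROUP 1", "GROUP 1 & GROUP 2", "GROUP 1 & GROUP 2 & GROUP 3",
             "GROUP 1 & GROUP 2 & GROUP 3 & GROUP 4")

# Without \K the leading space is part of the match, so " &" becomes ","
# and no space is left in front of the comma.
result <- gsub(" &(?= .* & )", ",", problem, perl = TRUE)
print(result)
```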
Use
problem <- c("GROUP 1", "GROUP 1 & GROUP 2", "GROUP 1 & GROUP 2 & GROUP 3", "GROUP 1 & GROUP 2 & GROUP 3 & GROUP 4")
library(stringr)
str_replace_all(problem, "\\s*&\\s*(?=[^&]*&)", ", ")
Result:
[1] "GROUP 1" "GROUP 1 & GROUP 2"
[3] "GROUP 1, GROUP 2 & GROUP 3" "GROUP 1, GROUP 2, GROUP 3 & GROUP 4"
See R proof.
Explanation
--------------------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  &                        '&'
--------------------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  (?=                      look ahead to see if there is:
--------------------------------------------------------------------------------
    [^&]*                    any character except: '&' (0 or more
                             times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
    &                        '&'
--------------------------------------------------------------------------------
  )                        end of look-ahead
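To illustrate why the \s* parts matter, here is a small check on inputs with irregular spacing around &. The vector messy is made up for this example, and base R's gsub() with perl = TRUE is used instead of stringr so the snippet has no package dependency:

```r
# Made-up inputs: no spaces at all, and uneven spacing around "&".
messy <- c("A&B&C", "A  &  B & C")

# \s* on both sides absorbs whatever whitespace surrounds a non-final "&",
# so each replacement yields exactly ", ".
result <- gsub("\\s*&\\s*(?=[^&]*&)", ", ", messy, perl = TRUE)
print(result)
# [1] "A, B&C"   "A, B & C"
```

The last & is never matched, so its surrounding spacing is left exactly as it was in the input.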