How do I replace the first occurrences of a string in R, but only if it appears more than once?

I have a character vector that looks like this:

problem <- c("GROUP 1", "GROUP 1 & GROUP 2", "GROUP 1 & GROUP 2 & GROUP 3", "GROUP 1 & GROUP 2 & GROUP 3 & GROUP 4")

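There is an "&" between each group. When a string contains more than one "&", I want to use R (sub() or something from the stringr package) to replace every "&" with ",". However, I don't want to change the last "&". How can I do that, so the result looks like: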

#Note: Only the 3rd and 4th strings should be changed
solution <- c("GROUP 1", "GROUP 1 & GROUP 2", "GROUP 1, GROUP 2 & GROUP 3", "GROUP 1, GROUP 2, GROUP 3 & GROUP 4")

In the real strings there could be any number of "&", so I'd like to avoid hard-coding a limit if possible.

We can use a regex with a lookahead (see "Regex lookahead, lookbehind and atomic groups"):

library(stringr)
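# " &" is replaced only when another "&" follows; the space after "&" is kept,
# so the replacement is "," rather than ", " to avoid a double space
str_replace_all(problem, " &(?=.*?&)", ",")

Output:

[1] "GROUP 1"                             "GROUP 1 & GROUP 2"                  
[3] "GROUP 1, GROUP 2 & GROUP 3"          "GROUP 1, GROUP 2, GROUP 3 & GROUP 4"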
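
For readers who would rather not load a package, the same lookahead idea can probably be expressed in base R. The sketch below assumes perl = TRUE, since lookarounds are a PCRE feature that base R's default regex engine does not support; the name base_result is only illustrative.

```r
problem <- c("GROUP 1", "GROUP 1 & GROUP 2",
             "GROUP 1 & GROUP 2 & GROUP 3",
             "GROUP 1 & GROUP 2 & GROUP 3 & GROUP 4")

# Replace " &" with "," only when another "&" appears later in the string.
# The lookahead (?=.*&) checks for that later "&" without consuming it.
base_result <- gsub(" &(?=.*&)", ",", problem, perl = TRUE)
print(base_result)
```

Strings with zero or one "&" are returned unchanged, because the lookahead never succeeds for the last "&".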

Another solution:

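# Turn every " &" into ",", then restore " & " before the last group.
# The backreference must be written "\\1" in an R string; [0-9]+ allows GROUP 10 and above.
str_replace_all(problem, " &", ",") %>%
  str_replace(", (GROUP [0-9]+)$", " & \\1")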

This can be done in Perl mode with the \G anchor.

First make sure the string contains 2 or more &, then match any & that has another & downstream.

(?m)(?:^(?=.*&.*&)|(?!^)\G)[^&\n]*\K&(?=.*&)

Replace each match with a comma.

https://regex101.com/r/Mtvopf/1

 (?m)
 (?:
    ^ 
    (?= .* & .* & )
  | (?! ^ )
    \G 
 )
 [^&\n]* \K &
 (?= .* & )
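
As a rough sketch of how this pattern might be applied from R: it is passed to gsub() with perl = TRUE, and every backslash is doubled inside the R string. The variable names pattern and g_result are illustrative and not part of the answer above.

```r
problem <- c("GROUP 1", "GROUP 1 & GROUP 2",
             "GROUP 1 & GROUP 2 & GROUP 3",
             "GROUP 1 & GROUP 2 & GROUP 3 & GROUP 4")

# Same regex as above, written as an R string (backslashes doubled).
pattern <- "(?m)(?:^(?=.*&.*&)|(?!^)\\G)[^&\\n]*\\K&(?=.*&)"

# Only the "&" itself is matched (everything before \K is dropped from the
# match), so the space in front of it stays in the result.
g_result <- gsub(pattern, ",", problem, perl = TRUE)
print(g_result)

# Optional clean-up if "GROUP 1, GROUP 2" is wanted instead of "GROUP 1 , GROUP 2".
g_clean <- gsub(" ,", ",", g_result, fixed = TRUE)
print(g_clean)
```

The clean-up step is an addition for illustration; the regex on its own leaves a space before each inserted comma.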

Using strsplit:

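 # Split on "&", rejoin with ", ", then put " & " back before the last piece.
 # Backslashes are doubled in R strings; the pattern consumes ", " to avoid a double space.
 sapply(strsplit(problem, "\\s+&\\s+"),
    function(x) sub(", ([^,]+)$", " & \\1", toString(x)))

Output

[1] "GROUP 1"                             "GROUP 1 & GROUP 2"                  
[3] "GROUP 1, GROUP 2 & GROUP 3"          "GROUP 1, GROUP 2, GROUP 3 & GROUP 4"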
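
The split-and-rejoin idea can also be written without a second regex. The following is a minimal sketch, assuming the pieces themselves never contain "&"; the helper name join_with_ampersand is hypothetical.

```r
problem <- c("GROUP 1", "GROUP 1 & GROUP 2",
             "GROUP 1 & GROUP 2 & GROUP 3",
             "GROUP 1 & GROUP 2 & GROUP 3 & GROUP 4")

# Hypothetical helper: join all but the last piece with ", ",
# then attach the last piece with " & ".
join_with_ampersand <- function(parts) {
  n <- length(parts)
  if (n <= 1) return(paste(parts, collapse = ""))  # nothing to join
  paste(paste(parts[-n], collapse = ", "), parts[n], sep = " & ")
}

pieces <- strsplit(problem, "\\s+&\\s+")
joined <- vapply(pieces, join_with_ampersand, character(1))
print(joined)
```

vapply() is used instead of sapply() here so that the result is guaranteed to be a character vector with one element per input string.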

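You can use the following pattern (note that it starts with a space):

 \K&(?= .* & )

The pattern matches:

  • A space followed by \K matches the space, then resets the start of the reported match (it forgets what has been matched so far)
  • & matches literally
  • (?= .* & ) is a positive lookahead that asserts, to the right, a space followed later by another & surrounded by spaces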

For example:

problem <- c("GROUP 1", "GROUP 1 & GROUP 2", "GROUP 1 & GROUP 2 & GROUP 3", "GROUP 1 & GROUP 2 & GROUP 3 & GROUP 4")
# "\\K" is how \K is written inside an R string; perl = TRUE enables PCRE
gsub(" \\K&(?= .* & )", ",", problem, perl = TRUE)

Output:

[1] "GROUP 1"                              
[2] "GROUP 1 & GROUP 2"                    
[3] "GROUP 1 , GROUP 2 & GROUP 3"          
[4] "GROUP 1 , GROUP 2 , GROUP 3 & GROUP 4"

Use

problem <- c("GROUP 1", "GROUP 1 & GROUP 2", "GROUP 1 & GROUP 2 & GROUP 3", "GROUP 1 & GROUP 2 & GROUP 3 & GROUP 4")
library(stringr)
# "\\s" is how \s is written inside an R string
str_replace_all(problem, "\\s*&\\s*(?=[^&]*&)", ", ")

Result:

[1] "GROUP 1"                             "GROUP 1 & GROUP 2"                  
[3] "GROUP 1, GROUP 2 & GROUP 3"          "GROUP 1, GROUP 2, GROUP 3 & GROUP 4"

Explanation

--------------------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  &                        '&'
--------------------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  (?=                      look ahead to see if there is:
--------------------------------------------------------------------------------
    [^&]*                    any character except: '&' (0 or more
                             times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
    &                        '&'
--------------------------------------------------------------------------------
  )                        end of look-ahead
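
One way to check a pattern like this is to compare its result with the expected vector from the question. The sketch below uses base R's gsub() with perl = TRUE so that it runs without extra packages (str_replace_all() accepts the same pattern); the names pattern, result and messy are illustrative.

```r
problem  <- c("GROUP 1", "GROUP 1 & GROUP 2",
              "GROUP 1 & GROUP 2 & GROUP 3",
              "GROUP 1 & GROUP 2 & GROUP 3 & GROUP 4")
solution <- c("GROUP 1", "GROUP 1 & GROUP 2",
              "GROUP 1, GROUP 2 & GROUP 3",
              "GROUP 1, GROUP 2, GROUP 3 & GROUP 4")

pattern <- "\\s*&\\s*(?=[^&]*&)"

# Each "&" (with its surrounding whitespace) that is followed by another "&"
# becomes ", "; the last "&" is left alone.
result <- gsub(pattern, ", ", problem, perl = TRUE)
stopifnot(identical(result, solution))

# Because of \s*, irregular spacing around a replaced "&" is normalised too;
# the spacing around the last "&" is not touched.
messy <- gsub(pattern, ", ", "A&B  &  C", perl = TRUE)
print(messy)
```

stopifnot() raises an error if the result ever differs from the expected vector, which makes this convenient to rerun after tweaking the pattern.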