Regex strsplit expression in R so it only applies once, to the first occurrence of a specific character in each string?

Question

I have a character vector of strings: string <- c("SPG_L_subgenual_ACC_R", "SPG_R_MTG_L_pole", "MTG_L_pole_CerebellumGM_L")

I need to split the strings so that they look like this:

"SPG_L", "subgenual_ACC_R", "SPG_R", "MTG_L_pole", "MTG_L_pole", "CerebellumGM_L"

I tried splitting the strings with the following regex:

str_split(string,'(?<=[[RL]|pole])_')

But this results in:

"SPG_L", "subgenual", "ACC_R", "SPG_R", "MTG_L", "pole", "MTG_L", "pole", "CerebellumGM_L"

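How can I edit the regex so that each string element is split at the "_" that follows the first occurrence of "R" or "L", unless that first "R" or "L" is followed by "pole", in which case the element is split at the "_" after the first occurrence of "pole"? Each string element should be split only once.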

Answer 1

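library(stringr)  # str_split() and the %>% pipe (re-exported by stringr)

# Second pass: for elements the "pole" pass left whole,
# split once at the first "_" preceded by R or L.
split_again <- function(x) {
  if (length(x) > 1) {
    return(x)
  }
  str_split(string = x, pattern = "(?<=[RL])_", n = 2)[[1]]
}

# First pass: split once at the first "_" that follows "pole".
str_split(string = string, pattern = "(?<=pole)_", n = 2) %>%
  lapply(split_again) %>%
  unlist()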
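For comparison, here is a minimal base-R sketch of the same two-pass idea, assuming stringr is not available. Base strsplit() has no equivalent of str_split()'s n argument, so the hypothetical helper split_once() (not part of the answer) finds the first match with regexpr() and cuts the string around it with substr(). Unlike the answer, it keeps each string's two pieces together in a list rather than flattening them.

```r
# Hypothetical helper: split each string at the first match of `pattern` only,
# similar to str_split(..., n = 2). Strings without a match are left whole.
split_once <- function(x, pattern) {
  pos <- regexpr(pattern, x, perl = TRUE)
  len <- attr(pos, "match.length")
  lapply(seq_along(x), function(i) {
    if (pos[i] == -1L) return(x[i])  # no match: keep the string intact
    c(substr(x[i], 1L, pos[i] - 1L),             # text before the match
      substr(x[i], pos[i] + len[i], nchar(x[i])))  # text after the match
  })
}

string <- c("SPG_L_subgenual_ACC_R", "SPG_R_MTG_L_pole", "MTG_L_pole_CerebellumGM_L")

# Pass 1: split after "pole". Pass 2 runs only on strings pass 1 left whole.
first_pass <- split_once(string, "(?<=pole)_")
res <- lapply(first_pass, function(parts) {
  if (length(parts) > 1) parts else split_once(parts, "(?<=[RL])_")[[1]]
})
res
```

Calling unlist(res) gives the same flat character vector that the stringr version returns.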

Answer 2

You can use sub() to replace the target underscore with ":", then split on ":" with strsplit(), as shown:

strsplit(sub("^.*?[LR](?:_pole)?\\K_", ":", string, perl = TRUE), ":")
[[1]]
[1] "SPG_L"           "subgenual_ACC_R"

[[2]]
[1] "SPG_R"      "MTG_L_pole"

[[3]]
[1] "MTG_L_pole"     "CerebellumGM_L"
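The sub()/strsplit() combination assumes ":" never appears in the data. A placeholder-free sketch, reusing the same pattern: with perl = TRUE, regexpr() reports the match as starting at the "_" because \K discards the text matched before it, and regmatches(..., invert = TRUE) returns the pieces on either side of that single match.

```r
string <- c("SPG_L_subgenual_ACC_R", "SPG_R_MTG_L_pole", "MTG_L_pole_CerebellumGM_L")

# Locate the one "_" to split on; \K moves the reported match start to it.
m <- regexpr("^.*?[LR](?:_pole)?\\K_", string, perl = TRUE)

# invert = TRUE returns the non-matched text, i.e. the parts before and after the "_".
res <- regmatches(string, m, invert = TRUE)
res
```

Strings with no qualifying "_" come back unsplit, as a single-element character vector.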

Answer 3

I'd suggest matching instead of splitting, using

^(.*?[RL](?:_pole)?)_(.*)

See the regex demo.

Details

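^ - start of the string
(.*?[RL](?:_pole)?) - Group 1:
- .*? - zero or more characters other than line break characters, as few as possible
- [RL](?:_pole)? - an R or L, optionally followed by _pole
_ - an underscore
(.*) - Group 2: zero or more characters other than line break characters, as many as possible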
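The same pattern also works with base R's sub() and back-references, without stringr. The sketch below uses a hypothetical helper, split_first_side(), that returns a two-column matrix with Group 1 and Group 2. Because sub() returns non-matching strings unchanged, it checks with grepl() first and fills NA for those rows; that NA handling is an assumption, not part of the answer.

```r
# Hypothetical helper: Group 1 and Group 2 of ^(.*?[RL](?:_pole)?)_(.*) as columns.
split_first_side <- function(x) {
  re <- "^(.*?[RL](?:_pole)?)_(.*)"
  matched <- grepl(re, x, perl = TRUE)
  # The pattern consumes the whole string, so replacing it with \\1 or \\2
  # leaves just that group; non-matching strings become NA.
  first <- ifelse(matched, sub(re, "\\1", x, perl = TRUE), NA_character_)
  rest  <- ifelse(matched, sub(re, "\\2", x, perl = TRUE), NA_character_)
  cbind(first, rest)
}

res <- split_first_side(c("SPG_L_subgenual_ACC_R", "SPG_R_MTG_L_pole", "MTG_L_pole_CerebellumGM_L"))
res
```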

See the R demo:

library(stringr)
x <- c("SPG_L_subgenual_ACC_R", "SPG_R_MTG_L_pole", "MTG_L_pole_CerebellumGM_L", "SFG_pole_R_IFG_triangularis_L", "SFG_pole_R_IFG_opercularis_L" )

res <- str_match_all(x, "^(.*?[RL](?:_pole)?)_(.*)")
lapply(res, function(x) x[-1])

Output:

[[1]]
[1] "SPG_L"           "subgenual_ACC_R"

[[2]]
[1] "SPG_R"      "MTG_L_pole"

[[3]]
[1] "MTG_L_pole"     "CerebellumGM_L"

[[4]]
[1] "SFG_pole_R"         "IFG_triangularis_L"

[[5]]
[1] "SFG_pole_R"        "IFG_opercularis_L"
