Regex for strsplit in R that splits only once, at the first occurrence of a specific character in each string?
I have a list of strings:
string<- c("SPG_L_subgenual_ACC_R", "SPG_R_MTG_L_pole", "MTG_L_pole_CerebellumGM_L")
I need to split the strings so that they look like this:
"SPG_L", "subgenual_ACC_R", "SPG_R", "MTG_L_pole", "MTG_L_pole", "CerebellumGM_L"
I tried splitting the strings with the following regular expression:
str_split(string,'(?<=[[RL]|pole])_')
But this results in:
"SPG_L", "subgenual", "ACC_R", "SPG_R", "MTG_L", "pole", "MTG_L", "pole", "CerebellumGM_L"
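How can I edit the regex so that each string element is split at the "_" after the first occurrence of "R" or "L", unless that first "R" or "L" is followed by "pole", in which case the element is split after the first occurrence of "pole", and so that each string element is split only once?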
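library(stringr)  # provides str_split() and re-exports %>%

# Second pass: split at the first "_" after R or L, but only for
# elements that the "pole" pass below left unsplit.
split_again <- function(x) {
  if (length(x) > 1) {
    return(x)
  }
  str_split(
    string = x,
    pattern = '(?<=[RL])_',  # [R|L] would also match a literal "|"
    n = 2)
}

# First pass: split each string once at the "_" that follows "pole".
str_split(
  string = string,
  pattern = '(?<=pole)_',
  n = 2) %>%
  lapply(split_again) %>%
  unlist()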
You can use sub and then strsplit, as shown:
strsplit(sub("^.*?[LR](?:_pole)?\\K_", ":", string, perl = TRUE), ":")
[[1]]
[1] "SPG_L" "subgenual_ACC_R"
[[2]]
[1] "SPG_R" "MTG_L_pole"
[[3]]
[1] "MTG_L_pole" "CerebellumGM_L"
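The sub-then-strsplit approach relies on ":" never appearing in the input. If that cannot be guaranteed, the same split can be done without a placeholder character. The following is a minimal sketch, not part of the original answer: the helper name split_once is hypothetical, and it uses the length of the regexpr match to find the underscore to cut at.

```r
# Hypothetical helper: split each string once, at the "_" that follows the
# first R or L (or the "_pole" right after it), without a placeholder.
split_once <- function(x) {
  # The match starts at position 1 and ends on the underscore to cut at,
  # so match.length is the position of that underscore.
  m <- regexpr("^.*?[LR](?:_pole)?_", x, perl = TRUE)
  len <- attr(m, "match.length")
  lapply(seq_along(x), function(i) {
    if (m[i] == -1) {
      return(x[i])  # no match: leave the string unsplit
    }
    c(substr(x[i], 1, len[i] - 1),
      substr(x[i], len[i] + 1, nchar(x[i])))
  })
}

string <- c("SPG_L_subgenual_ACC_R", "SPG_R_MTG_L_pole",
            "MTG_L_pole_CerebellumGM_L")
result <- split_once(string)
print(result)
```

Strings with no match come back as a single element, which mirrors what strsplit does when the placeholder was never inserted.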
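I would suggest a matching approach using
^(.*?[RL](?:_pole)?)_(.*)
Details
^ - start of string
(.*?[RL](?:_pole)?) - Group 1:
  .*? - any zero or more characters other than line break characters, as few as possible
  [RL](?:_pole)? - R or L, optionally followed by _pole
_ - an underscore
(.*) - Group 2: any zero or more characters other than line break characters, as many as possible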
library(stringr)
x <- c("SPG_L_subgenual_ACC_R", "SPG_R_MTG_L_pole", "MTG_L_pole_CerebellumGM_L", "SFG_pole_R_IFG_triangularis_L", "SFG_pole_R_IFG_opercularis_L" )
res <- str_match_all(x, "^(.*?[RL](?:_pole)?)_(.*)")
lapply(res, function(x) x[-1])
Output:
[[1]]
[1] "SPG_L" "subgenual_ACC_R"
[[2]]
[1] "SPG_R" "MTG_L_pole"
[[3]]
[1] "MTG_L_pole" "CerebellumGM_L"
[[4]]
[1] "SFG_pole_R" "IFG_triangularis_L"
[[5]]
[1] "SFG_pole_R" "IFG_opercularis_L"
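The same two-group match can also be done in base R, which may be convenient if stringr is not available. This is a sketch rather than part of the original answer; it assumes R 3.3.0 or later, where regexec accepts perl = TRUE, and the variable names are illustrative.

```r
x <- c("SPG_L_subgenual_ACC_R", "SPG_R_MTG_L_pole",
       "MTG_L_pole_CerebellumGM_L", "SFG_pole_R_IFG_triangularis_L")

pattern <- "^(.*?[RL](?:_pole)?)_(.*)"

# regexec() records the full match plus each capture group;
# regmatches() extracts them as one character vector per input string.
m <- regmatches(x, regexec(pattern, x, perl = TRUE))

# Drop the first element (the full match) to keep only the two groups.
parts <- lapply(m, function(v) v[-1])
print(parts)
```

For a string that does not match, regmatches returns an empty character vector, so the corresponding element of parts is empty as well.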