R: strsplit based on two conditions, keeping the delimiter

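I am trying to split strings based on different criteria. Some should be split at "traction" and others at "ramasse", with that word ending up as its own piece. I looked up the regex syntax used by grepl but could not make sense of it.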

A data frame named export has a column ref whose string values end in either "traction" or "ramasse".

>export$ref
                        ref
[1] "62133130_074_traction"
[2]  "62156438_074_ramasse"
[3]  "62153874_070_ramasse"
[4] "62138861_074_traction"

I would like to split each string value in the ref column in two.

                 ref        R&T
[1] "62133130_074_" "traction"
[2] "62156438_074_"  "ramasse"
[3] "62153874_070_"  "ramasse"
[4] "62138861_074_" "traction"

What I have tried (none of these work):

strsplit(export$ref, c("traction", "ramasse"))
strsplit(export$ref, "\_(?<=\btraction)|\_(?<=\bramasse)", perl = TRUE)
strsplit(export$ref, "(?=['traction''ramasse'])", perl = TRUE)

Any help would be greatly appreciated!

Here is a different approach:

strsplit(x, "_(?=[^_]+$)", perl = TRUE)

[[1]]
[1] "62133130_074" "traction"    

[[2]]
[1] "62156438_074" "ramasse"     

[[3]]
[1] "62153874_070" "ramasse"     

[[4]]
[1] "62138861_074" "traction"

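This splits each element of the column/vector at its last underscore ("_"), that is, an underscore followed only by non-underscore characters up to the end of the string. The underscore itself is consumed by the split, so it appears in neither piece.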
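If the trailing underscore should stay attached to the first piece, as in the desired output in the question, one possible sketch is to split at a zero-width position instead: a lookbehind for "_" combined with a lookahead for the final run of non-underscore characters. The vector x below is an assumption; it is taken to hold the same four strings as export$ref.

```r
# Assumption: x holds the same strings as export$ref in the question.
x <- c("62133130_074_traction", "62156438_074_ramasse",
       "62153874_070_ramasse", "62138861_074_traction")

# Split at the empty position that is preceded by "_" and followed only by
# non-underscore characters up to the end of the string. No character is
# consumed, so the underscore stays at the end of the first piece.
parts <- strsplit(x, "(?<=_)(?=[^_]+$)", perl = TRUE)

# Bind the pieces row-wise into a two-column character matrix.
res <- do.call(rbind, parts)
colnames(res) <- c("ref", "R&T")
res
```

The lookbehind matters here: strsplit treats a split pattern made of a lookahead alone differently, so pairing it with a lookbehind is what makes the split fall in exactly one place.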

Here is another option using stringr::str_split:

library(stringr)
str_split(ref, pattern = "_(?=[A-Za-z]+)", simplify = TRUE)
#    [,1]           [,2]
#[1,] "62133130_074" "traction"
#[2,] "62156438_074" "ramasse"
#[3,] "62153874_070" "ramasse"
#[4,] "62138861_074" "traction"

Sample data

ref <- c(
    "62133130_074_traction",
    "62156438_074_ramasse",
    "62153874_070_ramasse",
    "62138861_074_traction")
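
If only the two known suffixes should trigger a split, rather than any trailing word, a minimal base R sketch is to capture the prefix and the suffix with sub() and assemble the two columns asked for in the question. The names pattern and out are illustrative, and ref is redefined only so that the snippet runs on its own.

```r
# Same sample data as above, repeated so the snippet is self-contained.
ref <- c("62133130_074_traction", "62156438_074_ramasse",
         "62153874_070_ramasse", "62138861_074_traction")

# Group 1: everything up to and including the last underscore.
# Group 2: one of the two allowed suffixes at the end of the string.
pattern <- "^(.*_)(traction|ramasse)$"

# sub() returns a string unchanged when the pattern does not match, so values
# with any other suffix would show up unsplit in both columns.
out <- data.frame(
  ref   = sub(pattern, "\\1", ref),
  `R&T` = sub(pattern, "\\2", ref),
  check.names = FALSE,       # keep the column name "R&T" as written
  stringsAsFactors = FALSE
)
out
```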