R: strsplit based on two conditions, keeping the delimiter
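I am trying to split strings based on different criteria: some should be split at "traction" and others at "ramasse". I looked up the regex syntax used by grepl but could not make sense of it.
A data frame named export has a column ref whose string values end in "traction" or "ramasse".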
>export$ref
ref
[1] "62133130_074_traction"
[2] "62156438_074_ramasse"
[3] "62153874_070_ramasse"
[4] "62138861_074_traction"
I want to split each string value in the ref column in two:
ref R&T
[1] "62133130_074_" "traction"
[2] "62156438_074_" "ramasse"
[3] "62153874_070_" "ramasse"
[4] "62138861_074_" "traction"
What I have tried (none of these work):
strsplit(export$ref, c("traction", "ramasse"))
strsplit(export$ref, "\_(?<=\btraction)|\_(?<=\bramasse)", perl = TRUE)
strsplit(export$ref, "(?=['traction''ramasse'])", perl = TRUE)
Any help would be greatly appreciated!
Here is a different approach:
strsplit(ref, "_(?=[^_]+$)", perl = TRUE)
[[1]]
[1] "62133130_074" "traction"
[[2]]
[1] "62156438_074" "ramasse"
[[3]]
[1] "62153874_070" "ramasse"
[[4]]
[1] "62138861_074" "traction"
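This splits the column/vector at an underscore ("_") that is followed, up to the end of the string, by one or more characters that are not underscores. In other words, only the last underscore is a split point.
Here is another option using stringr::str_split: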
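The desired output in the question keeps the trailing underscore on the first piece, while strsplit() consumes the matched underscore. A minimal sketch of one way to get that layout, assuming the sample values from the question: split at the last underscore, bind the pieces into a matrix, and paste the underscore back. The column names `prefix` and `suffix` are hypothetical, chosen only for this illustration.

```r
# Sample values copied from the question.
ref <- c("62133130_074_traction", "62156438_074_ramasse",
         "62153874_070_ramasse", "62138861_074_traction")

# Split at the last underscore only; the lookahead itself consumes nothing.
parts <- strsplit(ref, "_(?=[^_]+$)", perl = TRUE)

# Bind the list of length-2 vectors into a 4 x 2 character matrix.
m <- do.call(rbind, parts)

# `prefix` and `suffix` are hypothetical names used for this sketch.
# paste0() restores the underscore that strsplit() removed.
out <- data.frame(prefix = paste0(m[, 1], "_"),
                  suffix = m[, 2],
                  stringsAsFactors = FALSE)
out
```

This assumes every value contains at least one underscore; a value without one would come back as a single piece and the rbind step would no longer yield two clean columns.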
library(stringr)
str_split(ref, pattern = "_(?=[A-Za-z]+)", simplify = TRUE)
#     [,1]           [,2]
#[1,] "62133130_074" "traction"
#[2,] "62156438_074" "ramasse"
#[3,] "62153874_070" "ramasse"
#[4,] "62138861_074" "traction"
Sample data
ref <- c(
"62133130_074_traction",
"62156438_074_ramasse",
"62153874_070_ramasse",
"62138861_074_traction")
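Since the question asks for a split based on two specific endings, a variant that names them explicitly may also be useful. This is only a sketch in base R, assuming the sample data above: the lookahead uses an alternation so that the underscore is a split point only when "traction" or "ramasse" follows it at the end of the string. The value ending in "other" is made up to show the non-matching case.

```r
# Sample values copied from the question.
ref <- c("62133130_074_traction", "62156438_074_ramasse",
         "62153874_070_ramasse", "62138861_074_traction")

# Split only at an underscore that is immediately followed by one of the
# two known suffixes, which must also end the string.
pattern <- "_(?=(?:traction|ramasse)$)"
parts <- strsplit(ref, pattern, perl = TRUE)
parts

# A made-up value with a different ending has no split point,
# so strsplit() returns it whole.
whole <- strsplit("62133130_074_other", pattern, perl = TRUE)
whole
```

Compared with splitting at the last underscore, this is stricter: values with an unexpected ending are left unsplit instead of being cut silently.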