R: Split a string and keep the substring to the right of the match?

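How can I do this string split() in R? Stop splitting when there are no more first names separated by slashes, and keep the given right-hand substring in each result.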

a <- c("tim/tom meyer XY900 123kncjd", "sepp/max/peter moser VK123 456xyz")

# result: 
c("tim meyer XY900 123kncjd", "tom meyer XY900 123kncjd", "sepp moser VK123 456xyz", "max moser VK123 456xyz", "peter moser VK123 456xyz")

I would do it like this (with stringi):

library("stringi")

a <- c("tim/tom meyer XY900 123kncjd", "sepp/max/peter moser VK123 456xyz")

stri_split_fixed(stri_match_first_regex(a, "(.+?)[ ]")[,2], "/") -> start
stri_match_first_regex(a, "[ ](.+)")[,2] -> end


for (i in seq_along(end)) {
    start[[i]] <- paste(start[[i]], end[i])
}

unlist(start)

## [1] "tim meyer XY900 123kncjd" "tom meyer XY900 123kncjd" "sepp moser VK123 456xyz" 
## [4] "max moser VK123 456xyz"   "peter moser VK123 456xyz"
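The for loop only pastes each vector of first names onto its matching remainder, so Map() can do the same pairing without an explicit index. A minimal sketch, assuming base R's sub() and strsplit() as stand-ins for the stringi calls (they produce a list and a character vector with the same shape as start and end):

```r
a <- c("tim/tom meyer XY900 123kncjd", "sepp/max/peter moser VK123 456xyz")

# base-R stand-ins for the stringi objects (an assumption, not the answer's code):
# start: list of first-name vectors; end: the shared right-hand part of each string
start <- strsplit(sub(" .*", "", a), "/", fixed = TRUE)
end <- sub("^[^ ]+ ", "", a)

# Map() calls paste(start[[i]], end[i]) for each i, replacing the explicit loop
result <- unlist(Map(paste, start, end))
result
```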

Here is one possibility using a few different base string functions.

## get the lengths of the output for each first name
len <- lengths(gregexpr("/", sub(" .*", "", a), fixed = TRUE)) + 1L
## extract all the first names 
## using the fact that they all end at the first space character
fn <- scan(text = a, sep = "/", what = "", comment.char = " ")
## paste them together
paste0(fn, rep(regmatches(a, regexpr(" .*", a)), len))
# [1] "tim meyer XY900 123kncjd" "tom meyer XY900 123kncjd"
# [3] "sepp moser VK123 456xyz"  "max moser VK123 456xyz"  
# [5] "peter moser VK123 456xyz"
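The length computation relies on gregexpr() reporting one match per slash. A small check, using "anna" as a hypothetical first-name part with no slash, shows that gregexpr() returns -1 (still length 1) when nothing matches, so lengths(...) + 1L would give 2 for such a string; counting the pieces from strsplit() gives 1:

```r
firsts <- c("tim/tom", "sepp/max/peter", "anna")  # "anna" is a hypothetical no-slash case

# gregexpr() returns -1 (a length-1 vector) when there is no match,
# so lengths() cannot tell "no slash" apart from "one slash"
n_gregexpr <- lengths(gregexpr("/", firsts, fixed = TRUE))
n_gregexpr  # 1 2 1

# counting the pieces produced by strsplit() gives the number of names directly
n_names <- lengths(strsplit(firsts, "/", fixed = TRUE))
n_names  # 2 3 1
```

For the inputs in the question every string contains at least one slash, so both counts agree there.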

Addition: here is a second possibility that uses a little less code. It may also be a bit faster.

s <- strsplit(a, "/|( .*)")
paste0(unlist(s), rep(regmatches(a, regexpr(" .*", a)), lengths(s)))
# [1] "tim meyer XY900 123kncjd" "tom meyer XY900 123kncjd"
# [3] "sepp moser VK123 456xyz"  "max moser VK123 456xyz"  
# [5] "peter moser VK123 456xyz"
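The pattern "/|( .*)" works because a match that runs to the end of the string does not leave a trailing empty element in the strsplit() result, so only the first names survive and lengths(s) counts them. A step-by-step look at one string:

```r
x <- "tim/tom meyer XY900 123kncjd"

# "/" splits between first names; " .*" swallows the space and everything after it.
# A match at the end of the string produces no trailing "" element.
pieces <- strsplit(x, "/|( .*)")[[1]]
pieces  # "tim" "tom"

# the suffix that gets pasted back on (leading space included)
suffix <- regmatches(x, regexpr(" .*", x))
suffix  # " meyer XY900 123kncjd"

out <- paste0(pieces, suffix)
out
```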

Here is one approach:

a <- c('tim/tom meyer XY900 123kncjd','sepp/max/peter moser VK123 456xyz');
do.call(c,lapply(strsplit(a,' '),function(w) apply(expand.grid(strsplit(w,'/')),1,paste,collapse=' ')));
## [1] "tim meyer XY900 123kncjd" "tom meyer XY900 123kncjd" "sepp moser VK123 456xyz"  "max moser VK123 456xyz"   "peter moser VK123 456xyz"
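To see what the one-liner does for a single string: it splits into words, splits each word on "/", lets expand.grid() build one row per combination, and pastes each row back together. A sketch on a shortened string; stringsAsFactors = FALSE is passed here only for clarity (the answer relies on apply() converting the factors to character):

```r
w <- strsplit("tim/tom meyer XY900", " ")[[1]]  # words of one string
w  # "tim/tom" "meyer" "XY900"

# each word becomes a vector of its "/"-separated variants
variants <- strsplit(w, "/")

# expand.grid() builds one row per combination; the first column varies fastest
g <- expand.grid(variants, stringsAsFactors = FALSE)

# pasting each row back together with spaces gives the final strings
out <- apply(g, 1, paste, collapse = " ")
out
```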

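One advantage of this solution is that it splits and recombines every word in each string, not just the first one, correctly returning the full Cartesian product of all word variants: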

a <- c('a/b/c d/e/f g/h/i','j/k/l m/n/o p/q/r');
do.call(c,lapply(strsplit(a,' '),function(w) apply(expand.grid(strsplit(w,'/')),1,paste,collapse=' ')));
## [1] "a d g" "b d g" "c d g" "a e g" "b e g" "c e g" "a f g" "b f g" "c f g" "a d h" "b d h" "c d h" "a e h" "b e h" "c e h" "a f h" "b f h" "c f h" "a d i" "b d i" "c d i" "a e i" "b e i" "c e i" "a f i" "b f i" "c f i" "j m p" "k m p" "l m p" "j n p" "k n p" "l n p" "j o p" "k o p" "l o p" "j m q" "k m q" "l m q" "j n q" "k n q" "l n q" "j o q" "k o q" "l o q" "j m r" "k m r" "l m r" "j n r" "k n r" "l n r" "j o r" "k o r" "l o r"

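Why not one more, to show how many ways R offers to solve this. Split the string on the / symbol, separate the last first name from the rest of the string, then combine everything with paste. Interesting question, by the way: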

unlist(lapply(strsplit(a, "/"), function(x) {
  len <- length(x)
  last <- gsub("^(\\w+).*", "\\1", x[len])  # last first name
  fill <- gsub("^\\w+ ", "", x[len])       # shared right-hand part
  paste(c(x[-len], last), fill)
}))
# [1] "tim meyer XY900 123kncjd" "tom meyer XY900 123kncjd" "sepp moser VK123 456xyz" 
# [4] "max moser VK123 456xyz"   "peter moser VK123 456xyz"
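The two gsub() calls carry the logic of this answer: after splitting on "/", the last piece holds both the last first name and the shared remainder, and the regular expressions pull them apart. Traced on the second example string:

```r
x <- strsplit("sepp/max/peter moser VK123 456xyz", "/")[[1]]
x  # "sepp" "max" "peter moser VK123 456xyz"

last_piece <- x[length(x)]

# leading run of word characters: the last first name
last <- gsub("^(\\w+).*", "\\1", last_piece)
# drop that word and the space after it: the shared suffix
fill <- gsub("^\\w+ ", "", last_piece)

out <- paste(c(x[-length(x)], last), fill)
out
```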