Split text with stringr::str_split in location preceding specific pattern (numeral or text)

Question

Suppose I have a table of strings:

df <- tibble::tribble(
  ~alternatives,
  " 23.32 | x232 code | This is a description| 43.11 | a341 code | some other description | optimised | v333 code | still another description"
)

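I want to split each string at the positions just before the numeric values, e.g. before 23.32 and before 43.11, and also before the word "optimised".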

I would like each cell to end up holding a vector:

c("23.32 | x232 code | This is a description|", "43.11 | a341 code | some other description |", "optimised | v333 code | still another description")

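What should the regex pattern be to split just before a specific pattern? The number of pipe characters between the relevant patterns can vary, so I cannot rely on them. I am vaguely aware of lookaheads and the like. The code below does not return what I expect, but I believe I am looking for a solution along these lines: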

library(dplyr)
library(stringr)

# Backslashes must be doubled inside an R string literal
df2 <-
  df %>%
  mutate(alternatives =
           str_split(alternatives,
                     pattern = "(?<=[a-zA-Z])\\s*(?=[0-9])"))

What would the solution be?

Answer 1

You could try splitting on the following regex pattern:

(?<=\S)\s+(?=(?:\d+\.\d+|optimised)\b)
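To make the pieces of the pattern easier to read, here is a commented sketch of the same regex. It assumes stringr's `regex()` helper with `comments = TRUE`, which lets the pattern span several lines with `#` comments; the sample string `x` and the names `split_before` and `parts` are made up for illustration.

```r
library(stringr)

# Same pattern as above, spread over several lines with comments.
# Backslashes are doubled because this is an R string literal.
split_before <- regex("
  (?<=\\S)                     # lookbehind: the gap follows a non-space character
  \\s+                         # the whitespace that the split consumes
  (?=                          # lookahead: what comes next is not consumed
    (?:\\d+\\.\\d+|optimised)  # a decimal number or the word optimised
    \\b                        # ending at a word boundary
  )
", comments = TRUE)

# Hypothetical shortened sample string, for illustration only
x <- "23.32 | x232 code | first| 43.11 | a341 code | second | optimised | v333 code | third"

parts <- str_split(x, split_before)[[1]]
print(parts)
```

Because the number and the word sit inside a lookahead, they stay at the start of the next piece rather than being removed by the split.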


Updated script:

# Backslashes are doubled because the pattern is an R string literal
df2 <- df %>%
    mutate(alternatives =
        str_split(alternatives,
                  pattern = "(?<=\\S)\\s+(?=(?:\\d+\\.\\d+|optimised)\\b)"))
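For a quick end-to-end check, the following self-contained sketch rebuilds the data frame from the question and inspects the first cell of the resulting list column. The `str_trim()` step and the name `pieces` are additions for illustration; trimming is only there to drop the leading space that the original string starts with.

```r
library(dplyr)
library(stringr)

# Data frame from the question
df <- tibble::tribble(
  ~alternatives,
  " 23.32 | x232 code | This is a description| 43.11 | a341 code | some other description | optimised | v333 code | still another description"
)

# str_split() returns a list, so alternatives becomes a list column
df2 <- df %>%
  mutate(alternatives =
           str_split(alternatives,
                     pattern = "(?<=\\S)\\s+(?=(?:\\d+\\.\\d+|optimised)\\b)"))

# Pull out the character vector stored in the first cell and trim whitespace
pieces <- str_trim(df2$alternatives[[1]])
print(pieces)
```

Each cell of `alternatives` now holds a character vector, so a row with more or fewer code groups simply yields a longer or shorter vector.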
