Regular expression to extract words that start with a pattern but end before symbols or spaces

I have the following example, where proc is the pattern:

x <- "carr proc proc_ proca select procb() procth;"
pattern <- "proc"

The expected result is

"proc" "proca" "procb" "procth"

The result can be a list or a vector.

I tried several other regular expressions with stringr::str_extract_all, but none of them returned all the words I want.

Use

pattern <- "\\bproc[[:alnum:]]*\\b"

See the regex proof.

Explanation

--------------------------------------------------------------------------------
  \b                       the boundary between a word char (\w) and
                           something that is not a word char
--------------------------------------------------------------------------------
  proc                     'proc'
--------------------------------------------------------------------------------
  [[:alnum:]]*             any character of: letters and digits (0 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  \b                       the boundary between a word char (\w) and
                           something that is not a word char
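
As a minimal sketch of how this pattern could be applied, the snippet below uses base R (gregexpr() with regmatches()) on the x from the question. The variable name words is only illustrative, and perl = TRUE is an assumption made here to pin the regex engine. Note that the backslashes are doubled because the pattern is written as an R string literal.

```r
x <- "carr proc proc_ proca select procb() procth;"

# Backslashes are doubled because this is an R string literal.
pattern <- "\\bproc[[:alnum:]]*\\b"

# gregexpr() locates every match; regmatches() extracts the matched text.
# perl = TRUE selects the PCRE engine (an assumption of this sketch).
words <- regmatches(x, gregexpr(pattern, x, perl = TRUE))[[1]]
print(words)
# [1] "proc"   "proca"  "procb"  "procth"
```

"proc_" is skipped because the underscore counts as a word character, so there is no \b boundary right after "proc" there. If stringr is installed, stringr::str_extract_all(x, pattern)[[1]] should return the same vector.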

What about this?

> unique(agrep(pattern, unlist(strsplit(x, "[^[:alpha:]]+")), value = TRUE))
[1] "proc"   "proca"  "procb"  "procth"
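
One thing to keep in mind is that agrep() does approximate matching and is not anchored to the start of a token, so on other inputs it may return words that only resemble "proc". If an exact prefix test is wanted, a possible variant (a sketch only, with the illustrative names prefix, tokens and exact) keeps the same split and filters with startsWith():

```r
x <- "carr proc proc_ proca select procb() procth;"
prefix <- "proc"

# Split on runs of non-letters, as in the answer above.
tokens <- unlist(strsplit(x, "[^[:alpha:]]+"))

# startsWith() is an exact, literal prefix test (no fuzzy matching).
exact <- unique(tokens[startsWith(tokens, prefix)])
print(exact)
# [1] "proc"   "proca"  "procb"  "procth"
```

With either version, splitting on non-letters turns "proc_" into the token "proc", so it contributes to the "proc" entry, and any digits inside a word would also act as separators.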