Regular expression to extract words that start with a pattern but end before symbols or spaces
I have the following example, with proc as the regular expression:
x <- "carr proc proc_ proca select procb() procth;"
pattern <- "proc"
The expected result is:
"proc" "proca" "procb" "procth"
It can be a list or a vector.
I have tried several other regular expressions with stringr::str_extract_all, but I can't get all the words I want.
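Use the following (in an R string literal the backslashes must be doubled, otherwise "\b" is read as a backspace character instead of the regex word boundary):
pattern <- "\\bproc[[:alnum:]]*\\b"
See the regex proof.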
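A minimal base-R sketch of applying this pattern to the example string. It assumes perl = TRUE (PCRE) for the \b boundary; with stringr, the same pattern can be passed as stringr::str_extract_all(x, pattern). Both return a list with one element per input string, so unlist() gives the character vector. Note why proc_ is skipped: [[:alnum:]] does not include the underscore, and \b cannot match between c and _ because both are word characters.

```r
x <- "carr proc proc_ proca select procb() procth;"

# Doubled backslashes: the R string "\\b" becomes the regex word boundary \b
word_regex <- "\\bproc[[:alnum:]]*\\b"

# gregexpr() finds every match; regmatches() returns a list, one element per input string
matches_list <- regmatches(x, gregexpr(word_regex, x, perl = TRUE))

# unlist() flattens the list into a plain character vector
matches_vec <- unlist(matches_list)
print(matches_vec)  # values: "proc" "proca" "procb" "procth"
```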
Explanation
--------------------------------------------------------------------------------
\b the boundary between a word char (\w) and
something that is not a word char
--------------------------------------------------------------------------------
proc 'proc'
--------------------------------------------------------------------------------
[[:alnum:]]* any character of: letters and digits (0 or
 more times, matching as many as
 possible)
--------------------------------------------------------------------------------
\b the boundary between a word char (\w) and
something that is not a word char
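The question keeps the prefix in a variable (pattern <- "proc"), while the regex above hardcodes it. A minimal sketch of building the same regex from that variable follows; the function name extract_prefixed_words is hypothetical, not part of the original answer. It assumes perl = TRUE so that \Q...\E can quote the prefix literally, and it assumes the prefix starts with a word character, since otherwise the leading \b does not behave as intended.

```r
# Hypothetical helper (not from the original answer):
# return every whole word in `text` that starts with `prefix`
extract_prefixed_words <- function(text, prefix) {
  # \Q...\E (PCRE) treats the prefix literally, so "." or "+" are not regex syntax.
  # Assumes the prefix starts with a word character so the leading \b applies.
  word_regex <- paste0("\\b\\Q", prefix, "\\E[[:alnum:]]*\\b")
  unlist(regmatches(text, gregexpr(word_regex, text, perl = TRUE)))
}

x <- "carr proc proc_ proca select procb() procth;"
pattern <- "proc"
extract_prefixed_words(x, pattern)
```

If nothing matches, the helper returns character(0). The \Q...\E quoting only fails if the prefix itself contains the sequence \E.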
How about this?
> unique(agrep(pattern, unlist(strsplit(x, "[^[:alpha:]]+")), value = TRUE))
[1] "proc" "proca" "procb" "procth"
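agrep() performs approximate (fuzzy) matching: with its default max.distance = 0.1, a token within one edit of "proc" (for example "prod") would also be returned. A minimal sketch of an exact variant of the same split-then-filter idea follows, swapping agrep() for grep() with an anchored ^ prefix. It also splits on [^[:alnum:]]+ rather than [^[:alpha:]]+, an assumed change so that digits stay inside a token (e.g. "proc2" is not cut down to "proc").

```r
x <- "carr proc proc_ proca select procb() procth;"
pattern <- "proc"

# Split on runs of characters that are neither letters nor digits
tokens <- unlist(strsplit(x, "[^[:alnum:]]+"))

# Exact prefix test; unique() drops the second "proc" left over from "proc_"
exact_hits <- unique(grep(paste0("^", pattern), tokens, value = TRUE))
print(exact_hits)
```

Unlike the \b regex, this token approach still reports "proc" for "proc_", because the split removes the underscore before filtering; here it is merged with the standalone "proc" by unique().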