R: Explode string but keep quoted text as a single word

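I came across this question: PHP explode the string, but treat words in quotes as a single word

and a similar one about using a regex to split a sentence into words, separated by spaces, while keeping quoted text intact (as a single word).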

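I would like to do the same thing in R. I tried copy-pasting the regular expression into stri_split from the stringi package as well as strsplit in base R, but I suspect the regular expression uses a format that R does not recognize. The error is: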

Error: '\S' is an unrecognized escape in character string...

The desired output would be:

mystr <- '"preceded by itself in quotation marks forms a complete sentence" preceded by itself in quotation marks forms a complete sentence'

myfoo(mystr)

[1] "preceded by itself in quotation marks forms a complete sentence" "preceded" "by" "itself" "in" "quotation" "marks" "forms" "a" "complete" "sentence"

Attempt: strsplit(mystr, '/"(?:\\.|(?!").)*%22|\S+/') gives:

Error in strsplit(mystr, "/\"(?:\\.|(?!\").)*%22|\S+/") : 
  invalid regular expression '/"(?:\.|(?!").)*%22|\S+/', reason 'Invalid regexp'

A simple option is to use scan, which by default treats quoted text as a single item:

> x <- scan(what = "", text = mystr)
Read 11 items
> x
 [1] "preceded by itself in quotation marks forms a complete sentence"
 [2] "preceded"                                                       
 [3] "by"                                                             
 [4] "itself"                                                         
 [5] "in"                                                             
 [6] "quotation"                                                      
 [7] "marks"                                                          
 [8] "forms"                                                          
 [9] "a"                                                              
[10] "complete"                                                       
[11] "sentence"
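
As for why the pasted regex failed, a few things seem to be going on: the surrounding `/.../` are PHP pattern delimiters rather than part of the pattern, `%22` looks like a URL-encoded `"`, `\S` has to be written as `\\S` inside an R string literal, and the lookahead `(?!")` needs `perl = TRUE`. In addition, strsplit removes whatever the pattern matches instead of returning it, so a pattern that matches the tokens themselves has to be used with regmatches and gregexpr. The following is a minimal sketch of both routes, not a definitive implementation. The function names `split_keep_quotes_scan` and `split_keep_quotes_regex` are hypothetical, and the regex is a simplified one that assumes double quotes are balanced and contain no escaped quotes.

```r
# Hypothetical helper names for illustration; they are not from the original post.

# Approach 1: scan() with only the double quote as the quoting character,
# so apostrophes (e.g. in "don't") are not treated as quote delimiters.
split_keep_quotes_scan <- function(x) {
  scan(text = x, what = "", quote = "\"", quiet = TRUE)
}

# Approach 2: extract the tokens with a regex instead of splitting on one.
# "[^"]*" matches a double-quoted run; \\S+ matches any other word.
# Assumption: quotes are balanced and never escaped inside a quoted run.
split_keep_quotes_regex <- function(x) {
  tokens <- regmatches(x, gregexpr('"[^"]*"|\\S+', x, perl = TRUE))[[1]]
  # Drop the surrounding double quotes from the quoted tokens.
  gsub('^"|"$', "", tokens)
}

mystr <- '"preceded by itself in quotation marks forms a complete sentence" preceded by itself in quotation marks forms a complete sentence'

res_scan  <- split_keep_quotes_scan(mystr)
res_regex <- split_keep_quotes_regex(mystr)
print(res_scan)
print(res_regex)
```

The scan version is shorter and is probably the better default. The regex version may be useful when the tokens need further pattern-based handling, but it would need extending to cope with escaped or unbalanced quotes.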