Getting more than one quotation from a text paragraph with R regex

Question

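First: find the text inside quotation marks, "I want everything inside here".

Second: extract the one sentence that comes before the quotation.

If possible, I'd like to get this desired output with a regular expression in R.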

Example:

Yoyo. He is sad. Oh no! "Don't sad!" Yeah: "Testing...  testings," Boys. Sun. Tree... 0.2% green,"LL" "WADD" HOLA.

Desired output:

[1] Oh no! "Don't sad!"
[2] Yeah: "Testing... testings"
[3] Tree... 0.2% green, "LL"
[4] Tree... 0.2% green, "LL" "WADD"

Output (how R prints the string t):

"Yoyo. He is sad. Oh no! \"Don't sad!\" Yeah: \"Testing...  testings,\" Boys. Sun. Tree... 0.2% green,\"LL\" \"WADD\" HOLA."

I tried this, but it doesn't work:

str_extract(t, "(?<=\\.\\s)[^.:]*[.:]\\s*\"[^\"]*\"")

I also tried:

regmatches(t, gregexpr('^[^\\.]+[\\.\\,\\:]\\s+(.*(?:\"[^\"]+\")).*$', t))

regmatches(t, gregexpr('\"[^\"]*\"(?<=\\s[.?][^\\.\\s])', t))

Tried your approach, @naurel:

> regmatches(t, regexpr("(?:\"? *([^\"]*))(\"[^\"]*\")", t, perl=T))
[1] " Yoyo. He is sad. Oh no! \"Don't sad!\""

Answer 1

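Since you only want the last sentence before the quotation, I've cleaned up the regex for you: result

Explanation: first, you are looking for the content between quotation marks. If several quotations follow one another, you want them returned as a single match.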

(\"[^\"]*\"(?: *\"[^\"]*\")*)
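The quotation part can be tried on its own in R. This is a sketch that assumes the example string from the question (with the "WADD" spelling) and passes perl = TRUE so the pattern, including the non-capturing group (?:...), is handled by PCRE; inside an R string literal each double quote has to be written as \".

```r
# Example text from the question ("WADD" spelling assumed)
t <- "Yoyo. He is sad. Oh no! \"Don't sad!\" Yeah: \"Testing...  testings,\" Boys. Sun. Tree... 0.2% green,\"LL\" \"WADD\" HOLA."

# One quoted chunk, then zero or more further quoted chunks separated only by spaces
quotes_pattern <- "\"[^\"]*\"(?: *\"[^\"]*\")*"

# gregexpr() locates every non-overlapping match; regmatches() pulls out their text
quotes <- regmatches(t, gregexpr(quotes_pattern, t, perl = TRUE))[[1]]
print(quotes)
```

This returns three pieces: "Don't sad!", "Testing...  testings," and "LL" "WADD", with the last two adjacent quotations joined into one match.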

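That does it. Next, you want to match the sentence that precedes this group. A sentence starts with a capital letter, so we start matching at the capital letter closest before the group defined above (that is, one that is not followed by any other capital letter):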

([A-Z](?:[a-z0-9\W\s])*)
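The reason the match starts at the closest capital letter is that the class [a-z0-9\W\s] contains no uppercase letters, so the part after [A-Z] can never run past another capital. A quick check in R, as a sketch on a few arbitrarily chosen characters:

```r
# Test single characters against the class used after the initial capital letter
chars <- c("a", "7", ".", "\"", " ", "B", "_")
allowed <- grepl("^[a-z0-9\\W\\s]$", chars, perl = TRUE)
names(allowed) <- chars
print(allowed)
```

Lowercase letters, digits, punctuation, quotes and spaces are accepted; "B" is rejected, and so is "_", because it counts as a word character. As a consequence, a capitalised word in mid-sentence (a name, for instance) moves the start of the match to that word.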

Putting it all together, you get:

([A-Z](?:[a-z0-9\W\s])*)(\"[^\"]*\"(?: *\"[^\"]*\")*)
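A minimal sketch of using the combined pattern from R, assuming the example string from the question. Every backslash in the regex is doubled and every double quote escaped so that it survives as an R string literal, and perl = TRUE selects PCRE.

```r
# Example text from the question ("WADD" spelling assumed)
t <- "Yoyo. He is sad. Oh no! \"Don't sad!\" Yeah: \"Testing...  testings,\" Boys. Sun. Tree... 0.2% green,\"LL\" \"WADD\" HOLA."

# The combined pattern: sentence from the last capital letter, then the quotation group
pattern <- "([A-Z](?:[a-z0-9\\W\\s])*)(\"[^\"]*\"(?: *\"[^\"]*\")*)"

# Extract all matches as a character vector
matches <- regmatches(t, gregexpr(pattern, t, perl = TRUE))[[1]]
print(matches)
```

Because consecutive quotations are captured together, this yields three elements rather than the four listed in the desired output, the last one being Tree... 0.2% green,"LL" "WADD". The text also comes back exactly as it appears in the input, so no space is inserted after green, and the double space in Testing...  testings, is kept.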
