Regex match every string inside double quotes and include escaped quotation marks

Question

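There are already a lot of similar questions, but none of them apply to my case. I have a string that contains multiple substrings inside double quotes, and these substrings can contain escaped double quotes.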

For example, for the string 'And then, "this is some sample text with quotes and \"escaped quotes\" inside". Not that we need more, but... "here is \"another\" one". Just in case.', the expected result is an array with two elements:

"this is some sample text with quotes and \"escaped quotes\" inside"
"here is \"another\" one"

The regex /"(?:\\"|[^"])*"/g works as expected on regex101; however, when I use String#match(), the result is different. See the snippet below:

let str = 'And then, "this is some sample text with quotes and \"escaped quotes\" inside". Not that we need more, but... "here is \"another\" one". Just in case.'
let regex = /"(?:\\"|[^"])*"/g

console.log(str.match(regex))

I get four matches instead of two, and they don't even include the text inside the escaped quotes.

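MDN mentions that if the g flag is used, all results matching the complete regular expression are returned, but capturing groups are not. If I want to obtain capture groups and the global flag is set, I need to use RegExp.exec(). I tried that, and the result is the same: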

let str = 'And then, "this is some sample text with quotes and \"escaped quotes\" inside". Not that we need more, but... "here is \"another\" one". Just in case.'
let regex = /"(?:\\"|[^"])*"/g
let temp
let matches = []

// exec() advances regex.lastIndex on each call and returns null when no match is left
while ((temp = regex.exec(str)) !== null)
  matches.push(temp[0])

console.log(matches)
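
For reference, the difference MDN describes can be seen on a simpler input. The sketch below uses a made-up string and a pattern with one capture group (both are illustrative assumptions, not the regex from this question), and also shows String.prototype.matchAll(), which avoids the manual exec() loop:

```javascript
// Hypothetical sample text and pattern, chosen only to illustrate the API behavior.
const text = 'first "alpha", then "beta"';
const pattern = /"([^"]*)"/g; // one capture group: the text between the quotes

// With the g flag, match() returns only the full matches, no groups.
console.log(text.match(pattern));

// exec() in a loop exposes the capture groups of each match.
const viaExec = [];
let m;
while ((m = pattern.exec(text)) !== null) {
  viaExec.push(m[1]);
}
console.log(viaExec);

// matchAll() yields the same match objects without managing lastIndex by hand.
const viaMatchAll = Array.from(text.matchAll(pattern), (match) => match[1]);
console.log(viaMatchAll);
```

Neither approach changes which substrings are matched; they only differ in how much detail is returned per match.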

How can I get an array with these two matching elements?

Answer 1

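The regex does not work as expected because a single backslash is the escape character in a JavaScript string literal: \" in the source produces a plain " in the string, so the text the regex sees contains no backslashes at all. You need to escape the backslashes in the text: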

// \" in a normal string literal is just ", so this string contains no backslashes
let str = 'And then, "this is some sample text with quotes and \"escaped quotes\" inside". Not that we need more, but... "here is \"another\" one". Just in case.';
let regex = /"(?:\\"|[^"])*"/g

console.log(str);
console.log(str.match(regex)) // four matches

// \\" stores a real backslash in front of each inner quote
str = 'And then, "this is some sample text with quotes and \\"escaped quotes\\" inside". Not that we need more, but... "here is \\"another\\" one". Just in case.';

console.log(str);
console.log(str.match(regex)) // two matches
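
The same effect can be checked on a shorter string. This is only a minimal sketch; the variable names and the sample text are made up for illustration:

```javascript
// \" in a normal literal stores only the quote character.
const single = 'x "a \"b\" c" y';
// \\" stores a backslash followed by the quote.
const doubled = 'x "a \\"b\\" c" y';

console.log(single.includes('\\'));  // false: no backslash survived
console.log(doubled.includes('\\')); // true

const regex = /"(?:\\"|[^"])*"/g;

// Without backslashes, every quote opens or closes a match.
const fromSingle = single.match(regex);
// With real backslashes, the escaped quotes stay inside one match.
const fromDoubled = doubled.match(regex);

console.log(fromSingle.length);  // 2
console.log(fromDoubled.length); // 1
```

Logging the string itself, as the snippet above does with console.log(str), is usually the quickest way to notice that the backslashes never made it into the value.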

Answer 2

Another option is a more efficient regex that avoids the | operator:

const str = String.raw`And then, "this is some sample text with quotes and \"escaped quotes\" inside". Not that we need more, but... "here is \"another\" one". Just in case.`
const regex = /"[^"\\]*(?:\\[\s\S][^"\\]*)*"/g
console.log(str.match(regex))

With String.raw, backslashes are kept literally, so there is no need to escape the quotes twice.
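
As a small check of that claim, the sketch below compares a String.raw template with the equivalent normal literals (the sample text is an assumption made up for this example):

```javascript
// String.raw keeps backslashes exactly as typed in the template literal.
const raw = String.raw`say \"hi\" now`;
// The same text in a normal string literal needs each backslash doubled.
const escaped = 'say \\"hi\\" now';
// Without doubling, the backslashes are dropped when the literal is parsed.
const unescaped = 'say \"hi\" now';

console.log(raw === escaped);              // true
console.log(raw === unescaped);            // false
console.log(raw.length, unescaped.length); // 14 12
```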

See regex proof. Btw, 28 steps vs. 267 steps.
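
To compare the two patterns side by side, here is a rough sketch. The second input, with an escaped backslash right before a closing quote, is a hypothetical edge case that does not appear in the question:

```javascript
const alternation = /"(?:\\"|[^"])*"/g;
const unrolled = /"[^"\\]*(?:\\[\s\S][^"\\]*)*"/g;

// Shortened variant of the question's input: both patterns find two quoted strings.
const sample = String.raw`And then, "this is \"quoted\" text". Also "here is \"another\" one".`;
const sampleAlt = sample.match(alternation);
const sampleUnrolled = sample.match(unrolled);
console.log(sampleAlt.length, sampleUnrolled.length); // 2 2

// Hypothetical edge case: the first string ends with an escaped backslash (\\).
const tricky = String.raw`"path\\" and "next"`;
// The alternation treats the second backslash plus quote as an escaped quote
// and runs on into the opening quote of the next string.
const trickyAlt = tricky.match(alternation);
// The unrolled pattern consumes backslash pairs first, so the quote closes the string.
const trickyUnrolled = tricky.match(unrolled);
console.log(trickyAlt.length, trickyUnrolled.length); // 1 2
```

On the question's input both patterns return the same matches; they only diverge when a backslash is itself escaped directly before a closing quote.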

Explanation

--------------------------------------------------------------------------------
  "                        '"'
--------------------------------------------------------------------------------
  [^"\\]*                  any character except: '"', '\' (0 or more
                           times (matching the most amount possible))
--------------------------------------------------------------------------------
  (?:                      group, but do not capture (0 or more times
                           (matching the most amount possible)):
--------------------------------------------------------------------------------
    \\                       '\'
--------------------------------------------------------------------------------
    [\s\S]                   any character of: whitespace (\n, \r,
                             \t, \f, and " "), non-whitespace (all
                             but \n, \r, \t, \f, and " ")
--------------------------------------------------------------------------------
    [^"\\]*                  any character except: '"', '\' (0 or
                             more times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
  )*                       end of grouping
--------------------------------------------------------------------------------
  "                        '"'
