Regex: match every string inside double quotes, including escaped quotation marks
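There are already plenty of similar questions, but none of them apply to my case. I have a string that contains multiple substrings inside double quotes, and these substrings can contain escaped double quotes.
For example, for the string 'And then, "this is some sample text with quotes and \"escaped quotes\" inside". Not that we need more, but... "here is \"another\" one". Just in case.', the expected result is an array with two elements:
"this is some sample text with quotes and \"escaped quotes\" inside"
"here is \"another\" one"
The regex /"(?:\\"|[^"])*"/g works as expected on regex101; however, when I use String#match(), the result is different. See the snippet below: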
let str = 'And then, "this is some sample text with quotes and \"escaped quotes\" inside". Not that we need more, but... "here is \"another\" one". Just in case.'
let regex = /"(?:\\"|[^"])*"/g
console.log(str.match(regex))
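I get four matches instead of two, and they do not even include the text inside the escaped quotes.
MDN mentions that if the g flag is used, all results matching the complete regular expression are returned, but capturing groups are not. If I want the capturing groups and the global flag is set, I need to use RegExp.exec(). I tried that, and the result is the same: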
let str = 'And then, "this is some sample text with quotes and \"escaped quotes\" inside". Not that we need more, but... "here is \"another\" one". Just in case.'
let regex = /"(?:\\"|[^"])*"/g
let temp
let matches = []
// exec() returns null once there are no more matches
while ((temp = regex.exec(str)) !== null) {
  matches.push(temp[0])
}
console.log(matches)
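How can I get an array containing those two matches?
The regex does not work as expected because a single backslash is the escape character in a JavaScript string literal: \" produces just a double quote, so the string contains no backslashes at all. You need to escape the backslashes in the text: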
// Original literal: each \" collapses to a plain double quote
let str = 'And then, "this is some sample text with quotes and \"escaped quotes\" inside". Not that we need more, but... "here is \"another\" one". Just in case.';
let regex = /"(?:\\"|[^"])*"/g
console.log(str);
console.log(str.match(regex)) // four matches
// Doubled backslashes keep a real backslash in front of each inner quote
str = 'And then, "this is some sample text with quotes and \\"escaped quotes\\" inside". Not that we need more, but... "here is \\"another\\" one". Just in case.';
console.log(str);
console.log(str.match(regex)) // two matches
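To confirm this diagnosis, it can help to inspect the string itself before blaming the regex. The following is a minimal sketch with a shortened sample string (the names plain and escaped are illustrative, not from the original post); it shows that \" inside a quoted literal yields a bare quote, so the pattern never sees a backslash:

```javascript
// In a normal string literal, \" is just an escape sequence for a double quote.
const plain = '"a \"b\" c"';
// Doubling the backslash keeps a real backslash character in the string.
const escaped = '"a \\"b\\" c"';

console.log(plain.includes('\\'));         // does the string contain a backslash?
console.log(escaped.includes('\\'));
console.log(plain.length, escaped.length); // the escaped version is two characters longer

const regex = /"(?:\\"|[^"])*"/g;
const plainMatches = plain.match(regex);     // split at every quote
const escapedMatches = escaped.match(regex); // the whole quoted string
console.log(plainMatches);
console.log(escapedMatches);
```

If includes('\\') reports false for the input, the fix belongs in how the string is written or read, not in the pattern.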
Another option is a more optimized regex without the | operator:
const str = String.raw`And then, "this is some sample text with quotes and \"escaped quotes\" inside". Not that we need more, but... "here is \"another\" one". Just in case.`
const regex = /"[^"\\]*(?:\\[\s\S][^"\\]*)*"/g
console.log(str.match(regex))
With String.raw, there is no need to escape the quotes twice: the backslashes in the template literal are kept as written.
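As a quick check of that claim, a template literal tagged with String.raw should produce exactly the same string as a regular literal with doubled backslashes. A small sketch, using a shortened sample and the illustrative names raw and doubled:

```javascript
// String.raw leaves backslashes untouched, so \" stays as two characters.
const raw = String.raw`"here is \"another\" one"`;
// The equivalent regular literal needs each backslash doubled.
const doubled = '"here is \\"another\\" one"';

const sameString = raw === doubled;
console.log(sameString);

// Either form gives the pattern the backslashes it expects.
const rawMatches = raw.match(/"[^"\\]*(?:\\[\s\S][^"\\]*)*"/g);
console.log(rawMatches);
```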
See the regex proof. Btw, 28 steps vs. 267 steps.
Explanation
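Beyond the step count, the two patterns can also differ in what they match when a backslash is itself escaped right before a closing quote. The sketch below compares them on a made-up sample string that is not from the question; alternation and unrolled are illustrative names.

```javascript
// Hypothetical sample: the first quoted string ends with an escaped backslash.
// String.raw keeps both backslashes, so the text is: "C:\\" and "x"
const text = String.raw`"C:\\" and "x"`;

const alternation = /"(?:\\"|[^"])*"/g;            // pattern from the question
const unrolled = /"[^"\\]*(?:\\[\s\S][^"\\]*)*"/g; // pattern without |

// The alternation reads the second backslash plus the closing quote as an
// escaped quote, so the match runs on into the next quoted string.
const alternationMatches = text.match(alternation);
console.log(alternationMatches);

// The unrolled pattern always consumes a backslash together with the character
// after it, so the two backslashes pair up and the quote closes the string.
const unrolledMatches = text.match(unrolled);
console.log(unrolledMatches);
```

Whether this matters depends on the input: if backslashes only ever appear in front of quotes, both patterns should return the same matches.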
--------------------------------------------------------------------------------
  "                        '"'
--------------------------------------------------------------------------------
  [^"\\]*                  any character except: '"', '\' (0 or more
                           times (matching the most amount possible))
--------------------------------------------------------------------------------
  (?:                      group, but do not capture (0 or more times
                           (matching the most amount possible)):
--------------------------------------------------------------------------------
    \\                       '\'
--------------------------------------------------------------------------------
    [\s\S]                   any character of: whitespace (\n, \r,
                             \t, \f, and " "), non-whitespace (all
                             but \n, \r, \t, \f, and " ")
--------------------------------------------------------------------------------
    [^"\\]*                  any character except: '"', '\' (0 or
                             more times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
  )*                       end of grouping
--------------------------------------------------------------------------------
  "                        '"'