Java Pattern.compile ignoring escaped double quotes (\")
I'm having trouble finding a pattern that ignores escaped quotes.
I want this:
"10\" 2 Topping Pizza, Pasta, or Sandwich for each. Valid until 2pm. Carryout only.","blah blah"
to be matched as:
1> "10\" 2 Topping Pizza, Pasta, or Sandwich for each. Valid until 2pm. Carryout only."
2> "blah blah"
I have been trying this:
Pattern pattern = Pattern.compile("\"[^\"]*\"");
Matcher matcher = pattern.matcher(filteredCoupons);
but I get:
1> "10\"
2> ","
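The regex you are looking for is
"[^"\\]*(?:\\.[^"\\]*)*"
See demo
In Java, where every regex backslash has to be doubled inside the string literal:
String pattern = "\"[^\"\\\\]*(?:\\\\.[^\"\\\\]*)*\"";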
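As a minimal sketch of how this pattern could be applied, the class below compiles the regex "[^"\\]*(?:\\.[^"\\]*)*" and collects every quoted field. The class name UnrolledQuoteMatcher, the method quotedFields, and the shortened sample input are assumptions made for illustration; they are not part of the original answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named here only for illustration.
class UnrolledQuoteMatcher {
    // Regex: "[^"\\]*(?:\\.[^"\\]*)*"
    // Each regex backslash is written twice in the Java string literal.
    private static final Pattern QUOTED =
            Pattern.compile("\"[^\"\\\\]*(?:\\\\.[^\"\\\\]*)*\"");

    // Returns every double-quoted field, treating \" as part of the field.
    static List<String> quotedFields(String input) {
        List<String> fields = new ArrayList<>();
        Matcher matcher = QUOTED.matcher(input);
        while (matcher.find()) {
            fields.add(matcher.group());
        }
        return fields;
    }

    public static void main(String[] args) {
        // Text: "10\" 2 Topping Pizza","blah blah"
        String input = "\"10\\\" 2 Topping Pizza\",\"blah blah\"";
        for (String field : quotedFields(input)) {
            System.out.println(field);
        }
    }
}
```

The pattern consumes runs of characters that are neither a quote nor a backslash, and handles each backslash together with the character that follows it, so an escaped quote never terminates the field.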
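Your regex seems to need to accept either non-quote characters or quotes preceded by \. In that case try
Pattern pattern = Pattern.compile("\"(\\\\.|[^\"])*\"");
The part written as \\\\.|[^\"] in the Java string, which is the regex \\.|[^"], will try to find
- \\. : any escaped character (a backslash followed by any character),
- or [^"] : any non-quote character.
I placed \\. before [^"] to prevent the \ from being matched by [^"].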
In other words, for text like foo\"bar" and the regex \\.|[^"] you get these matches:
foo\"bar"
^^^-matched by [^"]
foo\"bar"
   ^^-matched by \\.
foo\"bar"
     ^^^-matched by [^"]
foo\"bar"
        ^-can't be matched by anything, since there is no \ before it
          and it is not a non-quote character
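To illustrate why the order of the two alternatives matters, here is a small sketch that runs both orderings against the text "foo\"bar". The class name AlternationOrderDemo and the helper firstMatch are hypothetical names introduced for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class comparing the two possible orders of alternatives.
class AlternationOrderDemo {
    // Returns the first match of the regex in the input, or null if none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "\"foo\\\"bar\"";  // text: "foo\"bar"

        // Escaped character tried first: the backslash and the quote are
        // consumed together, so the match runs to the real closing quote.
        System.out.println(firstMatch("\"(\\\\.|[^\"])*\"", input));

        // Non-quote tried first: [^"] consumes the backslash on its own,
        // so the escaped quote is taken as the closing quote.
        System.out.println(firstMatch("\"([^\"]|\\\\.)*\"", input));
    }
}
```

With the escaped-character alternative first, the whole text "foo\"bar" is matched; with the order reversed, the match stops early at "foo\".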
Demo:
String filteredCoupons = "\"10\\\" 2 Topping Pizza, Pasta, or Sandwich for each. Valid until 2pm. Carryout only.\",\"blah blah\"";
Pattern pattern = Pattern.compile("\"(\\\\.|[^\"])*\"");
Matcher matcher = pattern.matcher(filteredCoupons);
while (matcher.find()) {
    System.out.println(matcher.group());
}
Output:
"10\" 2 Topping Pizza, Pasta, or Sandwich for each. Valid until 2pm. Carryout only."
"blah blah"
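A negative lookbehind can also be used:
(?s)".*?"(?<!\\.)
As a Java string:
"(?s)\".*?\"(?<!\\\\.)"
See test at regex101; test at regexplanet (click on "Java")
- After encountering ", it looks behind to check that no backslash precedes it, using the dot to skip one character (the quote itself).
- Similar to ".*?(?<!\\)", but looking behind after encountering " performs better.
- The (?s) flag makes the dot match newlines as well.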
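The lookbehind variant can be tried out with a sketch like the following. The class name LookbehindQuoteMatcher, the method quotedFields, and the sample input (shortened, with a line break added to exercise the (?s) flag) are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named here only for illustration.
class LookbehindQuoteMatcher {
    // Regex: (?s)".*?"(?<!\\.)
    // After the closing quote, the lookbehind rejects the match if the two
    // preceding characters are a backslash followed by any character.
    private static final Pattern QUOTED =
            Pattern.compile("(?s)\".*?\"(?<!\\\\.)");

    static List<String> quotedFields(String input) {
        List<String> fields = new ArrayList<>();
        Matcher matcher = QUOTED.matcher(input);
        while (matcher.find()) {
            fields.add(matcher.group());
        }
        return fields;
    }

    public static void main(String[] args) {
        // Second field contains a real line break, which (?s) lets the dot cross.
        String input = "\"10\\\" 2 Topping Pizza\",\"line one\nline two\"";
        quotedFields(input).forEach(System.out::println);
    }
}
```

One caveat, offered as an observation rather than something stated in the answer: a field that ends in an escaped backslash, such as "a\\", still ends with \" and so is not treated as closed by this pattern.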
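Out of interest, I benchmarked the different versions with the sample string at regexhero.net (thanks @stribizhev for this link!). I was unsure whether the step counter of regex101 is accurate here.
Benchmarked with non-capturing groups only. Interestingly, "(?:\\.|[^"])*" performs almost twice as well as the capturing group "(\\.|[^"])*".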
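To repeat this comparison in Java, a rough timing sketch along the following lines could be used. The class name GroupBenchmarkSketch, the helper methods, and the iteration counts are all assumptions. The figures quoted above come from a different tool, so they may not carry over to java.util.regex, and a naive System.nanoTime loop like this gives only a coarse indication.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, deliberately simple timing sketch (not a rigorous benchmark).
class GroupBenchmarkSketch {
    static final Pattern CAPTURING = Pattern.compile("\"(\\\\.|[^\"])*\"");
    static final Pattern NON_CAPTURING = Pattern.compile("\"(?:\\\\.|[^\"])*\"");

    // Counts how many quoted fields the pattern finds in the input.
    static int countMatches(Pattern pattern, String input) {
        Matcher m = pattern.matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    // Elapsed wall-clock nanoseconds for repeatedly scanning the input.
    static long timeNanos(Pattern pattern, String input, int iterations) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            countMatches(pattern, input);
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        String input = "\"10\\\" 2 Topping Pizza, Pasta, or Sandwich for each. "
                + "Valid until 2pm. Carryout only.\",\"blah blah\"";
        int iterations = 200_000;  // arbitrary choice

        // Warm-up runs so the JIT has a chance to compile the hot loop.
        timeNanos(CAPTURING, input, iterations);
        timeNanos(NON_CAPTURING, input, iterations);

        System.out.println("capturing:     " + timeNanos(CAPTURING, input, iterations) + " ns");
        System.out.println("non-capturing: " + timeNanos(NON_CAPTURING, input, iterations) + " ns");
    }
}
```

Both patterns return the same matches; only the bookkeeping for the captured group differs. For measurements that are meant to be trusted, a dedicated harness would be a better choice than this loop.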