Java regex to filter lines with comments not working as expected
I put together this simplified version of the code to demonstrate the problem:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        String content = "1 [thing i want]\n" +
                "2 [thing i dont want]\n" +
                "3 [thing i dont want] [thing i want]\n" +
                "4 // [thing i want]\n" +
                "5 [thing i want] // [thing i want]\n";
        // Backslashes are doubled so that the regex engine sees \[ and \]
        String BASE_REGEX = "(?!//)\\[%s\\]";
        Pattern myRegex = Pattern.compile(String.format(BASE_REGEX, "thing i want"));
        Matcher m = myRegex.matcher(content);
        System.out.println("match? " + m);
        String newContent = m.replaceAll("best thing ever");
        System.out.println("regex " + myRegex);
        System.out.println("content:\n" + content);
        System.out.println("new content:\n" + newContent);
    }
}
I expected my output to be:
new content:
1 best thing ever
2 [thing i dont want]
3 [thing i dont want] best thing ever
4 // [thing i want]
5 best thing ever // [thing i want]
But I see:
new content:
1 best thing ever
2 [thing i dont want]
3 [thing i dont want] best thing ever
4 // best thing ever
5 best thing ever // best thing ever
How can I fix the regex?
The unmodified string:
content:
1 [thing i want]
2 [thing i dont want]
3 [thing i dont want] [thing i want]
4 // [thing i want]
5 [thing i want] // [thing i want]
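There is no really simple way to test whether something is inside an inline comment. The Java regex engine can look behind, but only over a limited "distance" (in other words, it allows variable-length lookbehind as long as the length is bounded), and I'm not sure that building a pattern with this feature would be very efficient.
What you can do instead is check everything from the start of each line: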
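For reference, the `(?!//)` in the question is a lookahead tested at the position of the `[` itself, where the next two characters are never `//`, so it never rejects anything. The bounded-lookbehind idea mentioned above could be sketched roughly as follows. The class name, the method name and the 200-character limit are all assumptions made for illustration; a `//` further than that from the target on the same line would be missed.

```java
import java.util.regex.Pattern;

class LookbehindSketch {
    // Assumption: any "//" sits at most 200 characters before the target on its line.
    // The lookbehind rejects a target preceded by "//" plus up to 200 non-newline chars.
    static final Pattern TARGET =
            Pattern.compile("(?<!//[^\\n]{0,200})\\[thing i want\\]");

    static String replaceOutsideComments(String content) {
        return TARGET.matcher(content).replaceAll("best thing ever");
    }

    public static void main(String[] args) {
        String content = "1 [thing i want]\n"
                + "4 // [thing i want]\n"
                + "5 [thing i want] // [thing i want]\n";
        System.out.print(replaceOutsideComments(content));
    }
}
```

The engine has to re-scan backwards at every candidate `[`, which is the efficiency concern raised above.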
(?m)((?:\G|^)[^\[/\n]*+(?:\[(?!thing i want\])[^\[/\n]*|/(?!/)[^\[/\n]*)*+)\[thing i want\]
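(Double each backslash to write the pattern as a Java string literal.)
With this replacement:
$1best thing ever
Explanation: the idea is to capture everything from the start of the line, or from the previous target on the same line, up to the next target. This way you can describe precisely what is and isn't allowed before the target (anything that is not the target itself or two consecutive slashes). Since that leading text is part of the match, the replacement puts it back with $1.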
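As a minimal sketch of how this might look in Java, the pattern is written below with every backslash doubled. The replacement used here is `$1best thing ever`, because group 1 holds the text that precedes the target and has to be put back. The class and method names are made up for the example.

```java
import java.util.regex.Pattern;

class CommentAwareReplace {
    // Same pattern as above, split over three lines for readability.
    static final Pattern TARGET = Pattern.compile(
            "(?m)((?:\\G|^)[^\\[/\\n]*+"
            + "(?:\\[(?!thing i want\\])[^\\[/\\n]*|/(?!/)[^\\[/\\n]*)*+)"
            + "\\[thing i want\\]");

    static String replaceOutsideComments(String content) {
        // $1 restores the captured text that comes before the target
        return TARGET.matcher(content).replaceAll("$1best thing ever");
    }

    public static void main(String[] args) {
        String content = "1 [thing i want]\n"
                + "2 [thing i dont want]\n"
                + "3 [thing i dont want] [thing i want]\n"
                + "4 // [thing i want]\n"
                + "5 [thing i want] // [thing i want]\n";
        System.out.print(replaceOutsideComments(content));
    }
}
```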
(?m) # switch the multi-line mode on: the ^ means "start of the line"
( # open the capture group
(?: # non-capturing group: two possible starts
\G # contiguous to a previous match (on the same line)
| # OR
^ # at the start of the line
)
[^\[/\n]*+ # all that is not: an opening bracket, a slash or a newline
# * stands for "0 or more times" and the + after forbids
# to backtrack in this part if the pattern fails later
# "*+" is called a "possessive quantifier"
(?:
\[ # literal [
(?!thing i want\]) # not followed by "thing i want]"
[^\[/\n]*
| # OR
/ # literal /
(?!/) # not followed by another /
[^\[/\n]*
)*+ # zero or more times
) # close the capture group
\[thing i want\] # the target
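Since the question builds its pattern with `String.format`, the same structure could be parameterized along these lines. This is only a sketch: `BracketTagReplacer` and `replace` are hypothetical names. `Pattern.quote` keeps regex metacharacters in the target literal, and `Matcher.quoteReplacement` does the same for `$` and `\` in the replacement text.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BracketTagReplacer {
    // %1$s is filled with the quoted target text, once in the lookahead
    // and once in the final "[target]" part.
    private static final String BASE_REGEX =
            "(?m)((?:\\G|^)[^\\[/\\n]*+"
            + "(?:\\[(?!%1$s\\])[^\\[/\\n]*|/(?!/)[^\\[/\\n]*)*+)"
            + "\\[%1$s\\]";

    static String replace(String content, String target, String replacement) {
        Pattern p = Pattern.compile(String.format(BASE_REGEX, Pattern.quote(target)));
        // "$1" restores the text before the target; the rest is inserted literally
        return p.matcher(content).replaceAll("$1" + Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        String content = "3 [thing i dont want] [thing i want]\n"
                + "4 // [thing i want]\n";
        System.out.print(replace(content, "thing i want", "best thing ever"));
    }
}
```

Compiling the pattern on every call keeps the sketch short; caching the compiled `Pattern` per target would be a reasonable change if the method is called often.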