Negative lookahead in regex which matches everything except a specific character

Question

I'm having trouble defining a regular expression with a negative lookahead in Java.

Given the following string:

Today [#[#item#] was|the items were#] shipped so [#it is|they are#] gone.

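I am trying to transform this string into one of the following forms, depending on a value (yes, this is a way of distinguishing singular from plural forms):

Today [#item#] was shipped so it is gone. or Today the items were shipped so they are gone.

I'm trying to use a regular expression in Java to match this pattern and perform the transformation: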

public String convert(String text, boolean isSingular) {
    // Backslashes must be doubled inside a Java string literal.
    Pattern spPattern = Pattern.compile("\\[#.*?\\|.*?#\\]");
    Matcher matcher = spPattern.matcher(text);
    while (matcher.find()) {
        // Skip the opening "[#" and the closing "#]".
        int start = matcher.start() + 2;
        int end = matcher.end() - 2;
        int indexOfPipe = text.indexOf("|", start);
        String replacement = isSingular ? text.substring(start, indexOfPipe) : text.substring(indexOfPipe + 1, end);
        text = matcher.replaceFirst(replacement);
        // Re-scan from the beginning, because text has changed.
        matcher = spPattern.matcher(text);
    }
    return text;
}

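For the singular form: after the first iteration of the while loop, text is Today [#item#] was shipped so [#it is|they are#] gone., which is fine. In the second iteration, however, the Matcher matches [#item#] was shipped so [#it is|they are#] when it should match only [#it is|they are#]. I'm pretty sure I need some kind of negative lookahead.

I have tried the following pattern, but it doesn't seem to do anything:

(\[#.*?\|.*?#\])(?!\[#[^\|]*?#\]) ("try to match everything between [# and #] except those cases which do not contain a | between those tags")

What am I missing?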

Answer 1

Overview

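The problem you're running into is that your first replacement produces Today [#item#] was shipped so [#it is|they are#] gone., and on the next pass your regex matches [#item#] was shipped so [#it is|they are#]: the lazy .*? starts at the first [# and keeps expanding until it reaches the first |. Your code then replaces that whole span incorrectly.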
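
The over-match can be reproduced in isolation. The following is a minimal sketch, not part of the original answer; the class name OvermatchDemo and the helper firstMatch are made up for illustration. It runs the question's pattern against the text as it stands after the first replacement:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OvermatchDemo {
    // Returns the first substring matched by the question's original pattern,
    // or null when nothing matches. (Hypothetical helper for this demo.)
    static String firstMatch(String text) {
        Matcher m = Pattern.compile("\\[#.*?\\|.*?#\\]").matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // The text after the first (singular) replacement.
        String afterFirstPass = "Today [#item#] was shipped so [#it is|they are#] gone.";
        // Prints: [#item#] was shipped so [#it is|they are#]
        System.out.println(firstMatch(afterFirstPass));
    }
}
```

The match begins at the [# of [#item#] because a regex always reports the leftmost possible match, and nothing in the pattern forbids #] from appearing before the |.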

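The true way of fixing this would be to write a parser, but a regex can be used if the function applies it recursively (which it more or less does, thanks to the while loop). This answer therefore also works for things like [#[#a|b#]|b#], but beware that any deeper nesting will fail (if it is nested on the singular side).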
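
As a hedged illustration of the parser route mentioned above, here is a minimal recursive-descent sketch. It is not part of the original answer: the class name TemplateParser and its methods are made up for this example, and it assumes well-formed input in which every choice block [#...|...#] has exactly one top-level |. A block without a | (such as [#item#]) is treated as a plain placeholder and copied through unchanged.

```java
class TemplateParser {
    private final String src;
    private final boolean singular;
    private int pos = 0;

    private TemplateParser(String src, boolean singular) {
        this.src = src;
        this.singular = singular;
    }

    // Entry point: resolves every choice block to its singular or plural branch.
    static String convert(String text, boolean isSingular) {
        return new TemplateParser(text, isSingular).parseText(false);
    }

    // Reads text until the end of input or, inside a block, until '|' or "#]".
    private String parseText(boolean insideBlock) {
        StringBuilder out = new StringBuilder();
        while (pos < src.length()) {
            if (src.startsWith("[#", pos)) {
                out.append(parseBlock());
            } else if (insideBlock && (src.charAt(pos) == '|' || src.startsWith("#]", pos))) {
                break; // the enclosing block handles this delimiter
            } else {
                out.append(src.charAt(pos++));
            }
        }
        return out.toString();
    }

    // Parses one "[# ... #]" block that starts at pos.
    private String parseBlock() {
        pos += 2; // skip "[#"
        String first = parseText(true);
        if (pos < src.length() && src.charAt(pos) == '|') {
            pos++; // skip '|'
            String second = parseText(true);
            skipClose();
            return singular ? first : second;
        }
        skipClose();
        return "[#" + first + "#]"; // no '|': keep the placeholder as-is
    }

    // Skips the closing "#]" if present (tolerates a missing close).
    private void skipClose() {
        if (src.startsWith("#]", pos)) {
            pos += 2;
        }
    }

    public static void main(String[] args) {
        String s = "Today [#[#item#] was|the items were#] shipped so [#it is|they are#] gone.";
        System.out.println(convert(s, true));  // Today [#item#] was shipped so it is gone.
        System.out.println(convert(s, false)); // Today the items were shipped so they are gone.
    }
}
```

Because each nested block is resolved by a recursive call, nesting depth on either side is not limited in this sketch.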

Code

\[#((?:\[#.*?#]|(?!#])[^|])*?)\|((?:\[#.*?#]|(?!#])[^|])*?)#]

Usage

import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Ideone
{
    public static void main (String[] args)
    {
        String s = "Today [#[#item#] was|the items were#] shipped so [#it is|they are#] gone.";
        System.out.println(convert(s, true));
        System.out.println(convert(s, false));
    }

    public static String convert(String text, boolean isSingular) {
        // Group 1 is the singular branch, group 2 the plural branch.
        // Backslashes are doubled because this is a Java string literal.
        Pattern spPattern = Pattern.compile("\\[#((?:\\[#.*?#]|(?!#])[^|])*?)\\|((?:\\[#.*?#]|(?!#])[^|])*?)#]");
        Matcher matcher = spPattern.matcher(text);
        while (matcher.find()) {
            String replacement = isSingular ? matcher.group(1) : matcher.group(2);
            // quoteReplacement keeps any '$' or '\' in the branch text literal.
            text = matcher.replaceFirst(Matcher.quoteReplacement(replacement));
            // Re-scan from the start so nested blocks exposed by the replacement are handled.
            matcher = spPattern.matcher(text);
        }
        return text;
    }
}

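Explanation

\[# matches [# literally
((?:\[#.*?#]|(?!#])[^|])*?) captures the following into capture group 1
- (?:\[#.*?#]|(?!#])[^|])*? matches either of the following alternatives any number of times, but as few as possible
- \[#.*?#] matches the following
  - \[# matches [# literally
  - .*? matches any character (except line terminators) any number of times, but as few as possible
  - #] matches #] literally
- (?!#])[^|] matches the following
  - (?!#]) is a negative lookahead ensuring that what follows is not #]
  - [^|] matches any character except |
\| matches | literally
((?:\[#.*?#]|(?!#])[^|])*?) captures the following into capture group 2
- see the explanation under capture group 1 (the pattern is identical)
#] matches #] literally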
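
To see what the two capture groups hold, the pattern can be run once against the sample string. This is a small sketch for illustration only; the class name GroupDemo and the helper branches are hypothetical and not from the original answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupDemo {
    // Same pattern as in the answer, written as a Java string literal.
    static final Pattern SP = Pattern.compile(
        "\\[#((?:\\[#.*?#]|(?!#])[^|])*?)\\|((?:\\[#.*?#]|(?!#])[^|])*?)#]");

    // Returns {group 1, group 2} of the first match, or null when nothing matches.
    static String[] branches(String text) {
        Matcher m = SP.matcher(text);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String s = "Today [#[#item#] was|the items were#] shipped so [#it is|they are#] gone.";
        String[] b = branches(s);
        System.out.println("group 1: " + b[0]); // group 1: [#item#] was
        System.out.println("group 2: " + b[1]); // group 2: the items were
    }
}
```

The inner [#item#] ends up inside group 1 because the first alternative, \[#.*?#], consumes it as a unit before the search for | continues.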

Answer 2

This is pseudocode that shows one way to do it.
Obviously, I don't know Java.

regex_main = "(?s)(.*?)((?=\[\#)(?:(?=.*?\[\#(?!.*?\3)(.*\#\](?!.*\4).*))(?=.*?\#\](?!.*?\4)(.*)).)+?.*?(?=\3)(?:(?!\[\#).)*)(?=\4)|(.+)"

regex_brack_contents = "(?s)^\[\#(.*)\#\]$"

sTemplate = "Today [#[#item#] was|the items were#] shipped so [#it is|they are#] gone."
sOut[5] = ""
nPermutations = 0
Matcher _M = regex_main.matcher( sTemplate );

while ( _M.find() ) {
    if ( _M.group(1) ) {
        for (i = 0; i < 5; i++ ) 
            sOut[i] += _M.group(1)
        Matcher _m = regex_brack_contents.matcher( _M.group(2) )
        if ( _m.find() ) {
            aray = _m.group(1).split("|");
            for ( i = 0; i < sizeof(aray) && i < 5; i++ )
                sOut[i] += aray[i]
            if ( i > nPermutations )
                nPermutations = i
        }
    }
    else { 
        for (i = 0; i < 5; i++ ) 
           sOut[i] += _M.group(5)
    }
    // Keep using the same matcher: sTemplate never changes, so re-creating it here would restart the scan and never terminate.
}
for (i = 0; i < nPermutations; i++ ) 
    print( sOut[i] + "\r\n" )
