Get text inside brackets along with splitting delimiters in regex java?

I have a multiline string that is separated by a set of different delimiters:

A Z DelimiterB B X DelimiterA (C DelimiterA D) DelimiterB (E DelimiterA F) DelimiterB G DelimiterA H

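I need to split this string on the delimiters, but if some words are inside brackets, the whole bracketed group should be extracted as a single token, even if it contains a delimiter. I need the pieces extracted as follows: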

A Z
DelimiterB
B X
DelimiterA
(C DelimiterA D) (extract with brackets)
DelimiterB
(E DelimiterA F)
DelimiterB
G
DelimiterA
H

Currently I am using this expression to split on the delimiters:

(((?<=DelimiterA)|(?=DelimiterA))|((?<=DelimiterB)|(?=DelimiterB)))

I tried the following, but it did not work. How can I make it work?

((?=\()|(?<=\))|(((?<=DelimiterA)|(?=DelimiterA))|((?<=DelimiterB)|(?=DelimiterB))))

Java code:

String txt = "A DelimiterB B DelimiterA (C DelimiterA D) DelimiterB (E DelimiterA F) DelimiterB G DelimiterA H";
// Backslashes must be doubled inside a Java string literal, otherwise "\(" does not compile
String[] texts = txt.split("((?=\\()|(?<=\\))|(((?<=DelimiterA)|(?=DelimiterA))|((?<=DelimiterB)|(?=DelimiterB))))");

for (String word : texts) {
    System.out.println(word);
}

IMO, matching is easier than splitting

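Since the delimiters themselves are also needed in the output, I suggest matching the patterns we want instead of splitting. Based on the example given, there are three patterns to capture.

  1. (C DelimiterA D) - brackets containing a word, a delimiter, and a word.
    This is "\(\w+ (DelimiterA|DelimiterB) \w+\)"
  2. DelimiterB - a whole delimiter.
    "(DelimiterA|DelimiterB)".
  3. B, B X - one or more words that are not a delimiter.
    How do we check that a word is not a delimiter?
    We can check that the " " between two words is neither preceded nor followed by a delimiter (see the regex "not" operator, i.e. negative lookbehind and lookahead), which gives "\w+((?<!(DelimiterA|DelimiterB))\s(?!(DelimiterA|DelimiterB))\w+)*".

Combining the three alternatives with Scanner.findAll (available since Java 9):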
import java.util.Scanner;

public class SplitWithCustomDelimiter {
    public static void main(String[] args) {
        String txt = "A Z DelimiterB B X DelimiterA (C DelimiterA D) DelimiterB (E DelimiterA F) DelimiterB G DelimiterA H";
        // Scanner can wrap other sources too (files, streams); findAll requires Java 9+
        Scanner scanner = new Scanner(txt);
        scanner.findAll(
                // backslashes are doubled because the regex is written as a Java string literal
                "\\(\\w+ (DelimiterA|DelimiterB) \\w+\\)" +  // 1. bracketed group
                "|(DelimiterA|DelimiterB)" +                 // 2. a delimiter on its own
                "|\\w+((?<!(DelimiterA|DelimiterB))\\s(?!(DelimiterA|DelimiterB))\\w+)*"  // 3. non-delimiter words
                )
                .map(matchResult -> matchResult.group()).forEach(System.out::println);
    }
}
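
For reference, the same matching idea can also be sketched with Pattern and Matcher, which does not need Scanner.findAll and so should also run on Java 8. This is only a variation under a few assumptions, not part of the answer above: the class name BracketAwareTokenizer and the method tokenize are made up for illustration, the bracket alternative is loosened to "\([^()]*\)" so that any non-nested bracket content is kept as one token, and word boundaries (\b) are added so that a longer word that merely starts with a delimiter name is not treated as a delimiter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; the names are not from the original answer.
final class BracketAwareTokenizer {
    // Assumption: the delimiters are the two literal words from the question.
    private static final String DELIM = "\\b(?:DelimiterA|DelimiterB)\\b";

    private static final Pattern TOKEN = Pattern.compile(
            "\\([^()]*\\)"                               // 1. a non-nested bracketed group, kept whole
            + "|" + DELIM                                // 2. a delimiter on its own
            + "|\\w+(?:\\s+(?!" + DELIM + ")\\w+)*");    // 3. words up to (not including) the next delimiter

    // Collects every match in order of appearance.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(text);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String txt = "A Z DelimiterB B X DelimiterA (C DelimiterA D) DelimiterB (E DelimiterA F) DelimiterB G DelimiterA H";
        tokenize(txt).forEach(System.out::println);
    }
}
```

The order of the alternatives matters: the bracket alternative is tried first, so a delimiter inside brackets is consumed as part of the group before the delimiter alternative gets a chance to match it. Nested brackets are not handled by this sketch.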