Get text inside brackets along with splitting delimiters in regex java?
I have a multi-line string that is separated by a set of different delimiters,
A Z DelimiterB B X DelimiterA (C DelimiterA D) DelimiterB (E DelimiterA F) DelimiterB G DelimiterA H
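I need to split this string by the delimiters, but if some words are inside brackets, the bracketed part should be extracted as a single token, even if it contains a delimiter. I need them extracted as follows,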
A Z
DelimiterB
B X
DelimiterA
(C DelimiterA D) (extract with brackets)
DelimiterB
(E DelimiterA F)
DelimiterB
G
DelimiterA
H
Currently I am using this expression to split by the delimiters,
(((?<=DelimiterA)|(?=DelimiterA))|((?<=DelimiterB)|(?=DelimiterB)))
I tried the following, but it did not work. So how can I make it work?
((?=\()|(?<=\))|(((?<=DelimiterA)|(?=DelimiterA))|((?<=DelimiterB)|(?=DelimiterB))))
Java code,
String txt = "A DelimiterB B DelimiterA (C DelimiterA D) DelimiterB (E DelimiterA F) DelimiterB G DelimiterA H";
// Backslashes must be doubled inside a Java string literal
String[] texts = txt.split("((?=\\()|(?<=\\))|(((?<=DelimiterA)|(?=DelimiterA))|((?<=DelimiterB)|(?=DelimiterB))))");
for (String word : texts) {
    System.out.println(word);
}
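IMO, matching is easier than splitting
Since the delimiters themselves are also needed in the output, I suggest matching the patterns we want instead of splitting. Based on the example given, there are three patterns to capture.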
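(C DelimiterA D)
- Brackets containing a word, a delimiter, and a word.
That is "\(\w+ (DelimiterA|DelimiterB) \w+\)".
DelimiterB
- A delimiter on its own.
That is "(DelimiterA|DelimiterB)".
B, B X
- One or more words that are not delimiters.
How do we check that a word is not a delimiter?
We can require that the " " between two words is neither preceded nor followed by a delimiter (see "Regex not operator"), which gives "\w+((?<!(DelimiterA|DelimiterB))\s(?!(DelimiterA|DelimiterB))\w+)*".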
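To see what the two lookarounds in the last pattern do, here is a minimal sketch that runs that pattern by itself with `java.util.regex`. The class name `WordRunDemo` and the method `wordRuns` are made up for illustration. On its own the pattern still returns a delimiter as a separate one-word match, because the negative lookbehind stops the run from continuing past it; in the combined expression the delimiter alternative simply gets there first.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordRunDemo {
    // Words joined by a single whitespace character, where that whitespace
    // is neither preceded nor followed by one of the delimiters.
    static final Pattern WORD_RUN = Pattern.compile(
            "\\w+((?<!(DelimiterA|DelimiterB))\\s(?!(DelimiterA|DelimiterB))\\w+)*");

    // Collects every match of WORD_RUN in the input, in order.
    static List<String> wordRuns(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = WORD_RUN.matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        // Prints [A Z, DelimiterB, B X, DelimiterA, G]
        System.out.println(wordRuns("A Z DelimiterB B X DelimiterA G"));
    }
}
```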
import java.util.Scanner;

public class SplitWithCustomDelimiter {
    public static void main(String[] args) {
        String txt = "A Z DelimiterB B X DelimiterA (C DelimiterA D) DelimiterB (E DelimiterA F) DelimiterB G DelimiterA H";
        // Scanner can read from other sources as well (files, streams, ...)
        Scanner scanner = new Scanner(txt);
        // Scanner.findAll requires Java 9 or later; backslashes are doubled for the string literal
        scanner.findAll(
                "\\(\\w+ (DelimiterA|DelimiterB) \\w+\\)" +   // bracketed group
                "|(DelimiterA|DelimiterB)" +                  // a delimiter on its own
                "|\\w+((?<!(DelimiterA|DelimiterB))\\s(?!(DelimiterA|DelimiterB))\\w+)*" // run of non-delimiter words
            )
            .map(matchResult -> matchResult.group())
            .forEach(System.out::println);
    }
}
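If `Scanner.findAll` is not available (it was added in Java 9), the same matching idea can be sketched with `Pattern` and `Matcher`. This is only an illustration under a few assumptions: the class `DelimiterTokenizer` and its methods `buildPattern` and `tokenize` are hypothetical names, the delimiters are assumed to be plain words like the ones in the question, and a bracketed group is assumed to have exactly the "word delimiter word" shape described above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class DelimiterTokenizer {
    // Builds the three-way alternation from the answer for a given delimiter list.
    static Pattern buildPattern(List<String> delimiters) {
        // Quote each delimiter so it is treated literally, then join as (?:a|b).
        String delim = delimiters.stream()
                .map(Pattern::quote)
                .collect(Collectors.joining("|", "(?:", ")"));
        return Pattern.compile(
                "\\(\\w+ " + delim + " \\w+\\)"                               // bracketed group
                + "|" + delim                                                 // delimiter on its own
                + "|\\w+(?:(?<!" + delim + ")\\s(?!" + delim + ")\\w+)*");    // run of non-delimiter words
    }

    // Returns the matched tokens in the order they appear in the input.
    static List<String> tokenize(String input, List<String> delimiters) {
        Matcher m = buildPattern(delimiters).matcher(input);
        List<String> tokens = new ArrayList<>();
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String txt = "A Z DelimiterB B X DelimiterA (C DelimiterA D) DelimiterB (E DelimiterA F) DelimiterB G DelimiterA H";
        tokenize(txt, Arrays.asList("DelimiterA", "DelimiterB")).forEach(System.out::println);
    }
}
```

Passing the delimiters as a list keeps the three alternatives in sync, since the same delimiter group is reused in each of them.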