Java Regex Tokenizing

Question

Regex newbie here, haha.

Suppose I have a string:

String toMatch = "TargetCompID=NFSC_AMD_Q\n" +
            "\n## Bin's verifix details";

which appears in a .cfg file as:

TargetCompID=NFSC_AMD_Q

## Bin's verifix details

I want to tokenize it into an array:

{"TargetCompID", "NFSC_AMD_Q", "## Bin's verifix details"}

My current code, which prints nothing:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

static void regexTest(String regex, String toMatch) {
    Pattern patternTest = Pattern.compile(regex);
    Matcher matcherTest = patternTest.matcher(toMatch);
    while (matcherTest.find()) {
        for (int i = 1; i <= matcherTest.groupCount(); i++) {
            System.out.println(matcherTest.group(i));
        }
    }
}

public static void main(String[] args) throws Exception {
    String regex = "^[^=]+.*$" + "|" + "^#+.*$";
    String toMatch = "TargetCompID=NFSC_AMD_Q\n" +
            "\n" +
            "## Bin's verifix details";

    String testRegex = ".*";
    String testToMatch = "   ###  Bin";
    regexTest(regex, toMatch);
    System.out.println("----------------------------");

    // regexTest(testRegex, testToMatch);
}

Edit

while (matcherTest.find()) {
    for (int i = 1; i < matcherTest.groupCount(); i++) {
        System.out.println(matcherTest.group(i));
    }
}

prints:

TargetCompID
NFSC_AMD_Q

but not

## Bin's verifix details

Why?

Also, this code:

while (matcherTest.find()) {
    System.out.println(matcherTest.group());
}

only prints

TargetCompID=NFSC_AMD_Q

## Bin's verifix details

Are TargetCompID and NFSC_AMD_Q not split apart because we aren't calling group(i)? And why is the newline printed?

Answer 1

You can use this regex in Java:

(?m)^([^=]+)=(.+)\R+^(#.*)


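Regex breakdown:

(?m): enables MULTILINE mode, so ^ matches at the start of every line
^([^=]+)=: matches everything up to the = and captures it in group 1, then matches the = itself
(.+): matches the rest of the line and captures it in group 2
\R+: matches 1 or more line breaks
^(#.*): matches the line starting with # and captures it in group 3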
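To tie the breakdown together, here is a minimal sketch of how the pattern could be used to collect the three tokens into a list. The class name CfgTokenizer and the method tokenize are hypothetical names chosen for illustration; only the regex and the sample string come from the posts above. Note that the backslash in \R has to be doubled inside a Java string literal.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original question or answer.
class CfgTokenizer {
    // Regex from the answer; (?m) lets ^ match at the start of each line.
    private static final Pattern CFG =
            Pattern.compile("(?m)^([^=]+)=(.+)\\R+^(#.*)");

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = CFG.matcher(input);
        while (m.find()) {
            // Capturing groups are numbered 1..groupCount() inclusive,
            // so the loop bound must be <=, not <.
            for (int i = 1; i <= m.groupCount(); i++) {
                tokens.add(m.group(i));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String toMatch = "TargetCompID=NFSC_AMD_Q\n" +
                "\n" +
                "## Bin's verifix details";
        // prints [TargetCompID, NFSC_AMD_Q, ## Bin's verifix details]
        System.out.println(tokenize(toMatch));
    }
}
```

This also suggests a likely explanation for the two symptoms in the question, assuming the edited code ran with a three-group pattern like this one: a loop bound of i < groupCount() stops before group 3, so the comment line is never printed, and group() with no argument returns the entire match, line breaks included, rather than the individual captures.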
