Java regular expression to extract parts of a string

Question

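Consider a long string with the following format (the parentheses are not part of the actual text; they are added here only to show the group boundaries):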

(text, excluding the '=' character)(space)(ab = c d)(space)(e = f)(space)(g = h i):(space)(other text)

How can a single Java regular expression split the above into the following 3 groups?

text, excluding the equals character
ab = c d e = f g = h i
 other text

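The first group is arbitrary text (without any '=' characters). The second group is a (possibly long) series of key-value pairs that contain no ':' characters, where at least the values may contain spaces. The third group is another piece of arbitrary text. The second and third groups are separated by a ':' character.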

The following regular expression "almost" works:

([^=]+)([^:]+):(.*)

But the groups it produces are:

text, excluding the equals character ab
= c d e = f g = h i
other text
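To see exactly where the split goes wrong, the minimal sketch below runs the question's regex and wraps each group in brackets so leading and trailing spaces are visible. The class name AlmostRegexDemo and the helper split are made up for illustration. Because [^=]+ is greedy and nothing forces it to backtrack, group 1 consumes everything up to the first '=', including "ab" and the space after it.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class: applies the "almost" regex from the question
// and prints each group in brackets so surrounding spaces are visible.
public class AlmostRegexDemo {

    // Returns groups 1..3 of the question's regex, or null if there is no match.
    static String[] split(String input) {
        Matcher m = Pattern.compile("([^=]+)([^:]+):(.*)").matcher(input);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        String[] groups = split("text, excluding the equals character ab = c d e = f g = h i: other text");
        for (String g : groups) {
            System.out.println("[" + g + "]");
        }
    }
}
```

This prints [text, excluding the equals character ab ], [= c d e = f g = h i] and [ other text] on three lines, which matches the groups listed above once the brackets are removed.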

Is there any way to "back-reference" the last part of the first group (i.e., the "ab" string) so that it is included in the second group instead of the first?

Answer 1

The following should split the string using a regular expression:

import java.util.regex.Matcher;
import java.util.regex.Pattern;


public class RegexMatching {

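    public static void main(String[] args) {
        // Group 1: text before the first key. Because " ([^=]+ = " must follow it,
        // [^=]+ backtracks so that the last word before the first " = " goes to group 2.
        // Group 2: the "key = value" pairs up to the ':'.
        // Group 3: everything after ": ".
        Pattern p = Pattern.compile("([^=]+) ([^=]+ = [^:]+): (.+)");
        Matcher m = p.matcher("text, excluding the equals character ab = c d e = f g = h i: other text");

        if (m.find()) {
            //System.out.println(m.group(0));
            System.out.println(m.group(1));
            System.out.println(m.group(2));
            System.out.println(m.group(3));
        }
    }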

}

Note that the group with index 0 (commented out) returns the entire match, which here is the whole string.
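Strictly, group(0) is the text of the whole match, not of the whole input; with find() a match could in principle cover only part of the string. The sketch below (class and method names are illustrative only) uses Matcher.matches(), which succeeds only if the pattern covers the entire input, and also shows that groupCount() counts only the three capturing groups, not group 0.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class for group 0 and groupCount().
public class GroupZeroDemo {

    // The answer's pattern, compiled once.
    static final Pattern PAIRS = Pattern.compile("([^=]+) ([^=]+ = [^:]+): (.+)");

    // Returns a matcher positioned on a whole-input match, or null if the
    // pattern does not cover the entire string.
    static Matcher matchWhole(String input) {
        Matcher m = PAIRS.matcher(input);
        return m.matches() ? m : null;
    }

    public static void main(String[] args) {
        String input = "text, excluding the equals character ab = c d e = f g = h i: other text";
        Matcher m = matchWhole(input);
        if (m != null) {
            // groupCount() excludes group 0, so this prints 3
            System.out.println(m.groupCount());
            // with matches(), group 0 is the full input, so this prints true
            System.out.println(m.group(0).equals(input));
        }
    }
}
```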

Regarding:

Is there any way to "back-reference" the last part of the first group (i.e., the "ab" string) so that it is included in the second group instead of the first group?

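With the regular expression above, we force the first word of the key-value pairs into the second capture group. (This is not a "back-reference" in the regular-expression sense, since that term usually means referring back to one of the capture groups.)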
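For contrast, a real back-reference in Java regex syntax is written \1 ("\\1" inside a Java string literal) and must match the same text that group 1 already captured earlier in the same match. The minimal sketch below uses one to find a doubled word; the class and method names are hypothetical. It also shows why a back-reference cannot move "ab" from group 1 into group 2: it re-matches captured text, it does not reassign it to another group.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class showing what a regex back-reference actually does.
public class BackReferenceDemo {

    // Group 1 captures a word; \1 then requires the very same text again.
    static final Pattern DOUBLED_WORD = Pattern.compile("\\b(\\w+) \\1\\b");

    // Returns the first doubled word found, or null if there is none.
    static String firstDoubledWord(String input) {
        Matcher m = DOUBLED_WORD.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(firstDoubledWord("this is is a typo")); // prints: is
        System.out.println(firstDoubledWord("ab = c d e = f"));    // prints: null
    }
}
```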

EDIT: Updated the regex to reflect the edit to the question. EDIT2: Answered the back-reference question.
