Capturing Arbitrary Multiple Groups with Regex

I'm trying to use a regular expression in Java to capture data from the following string:

SettingName = "Value1",0x2,3,"Value4 contains spaces", "Value5 has a space before the string that is ignored"

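There may be an arbitrary amount of whitespace before the string, which can be ignored. The line may also contain more or fewer values than listed here; this is just an example.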

My goal is to capture these groups:

  • Group 1: SettingName
  • Group 2: "Value1" (including the quotes)
  • Group 3: 0x2
  • Group 4: 3
  • Group 5: "Value4 contains spaces"
  • Group 6: "Value5 has a space before the string that is ignored"

The regex I'm trying to use is:

    \s*([\w\/.-]+)\s*=(?:\s*(\"?[^\",]*\"?)(?:,|\s*$))+
    \s* -> Consume an arbitrary amount of whitespace
       ( -> Start a capturing group (group 1)
        [\w\/.-] -> Match one character of the SettingName, which may contain letters, digits, _, /, ., and -
                + -> Repeat the previous token one or more times (so group 1 is not blank)
                 ) -> End the capturing group
                  \s* -> Consume an arbitrary amount of whitespace
                     = -> Consume the equals sign
                      (?: -> Start a non-capturing group
                         \s* -> Consume an arbitrary amount of whitespace
                            ( -> Start a capturing group (group 2)
                             \"? -> Consume the opening quote, if it exists
                                [^\",] -> Match any character that is not a quote or a comma
                                      * -> Repeat the previous token zero or more times
                                       \"? -> Consume the closing quote, if it exists
                                          ) -> End the capturing group
                                           (?: -> Start an inner non-capturing group
                                              ,|\s*$ -> Match either a comma, or optional whitespace followed by the end of the input
                                                    ) -> End the inner non-capturing group
                                                     ) -> End the outer non-capturing group
                                                      + -> Match the outer non-capturing group one or more times
    

I'm using this code:

    // Backslashes are doubled: "\w" and "\/" are not legal escapes in a Java string literal
    private static final String regex = "\\s*([\\w\\/.-]+)\\s*=(?:\\s*(\"?[^\",]*\"?)(?:,|\\s*$))+";
    private static final Pattern settingPat = Pattern.compile(regex);
    // ...
    public String text;
    public Matcher m;
    // ...
    public void someMethod(String lineContents)
    {
        m = settingPat.matcher(lineContents);
        if (!m.matches())
        {
            // ... (do other stuff)
        }
        else
        {
            name = m.group(1);     // should be "SettingName"
            value[0] = m.group(2); // should be "\"Value1\""
            value[1] = m.group(3); // should be "0x2"
            // ...
        }
    }
    

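With this code the string matches, but it seems I only capture the last group. Do Java and/or regular expressions support repeating an arbitrary capturing group with the + modifier?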

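You only have 2 capturing groups, so you cannot get more than 2 groups in the result: when a capturing group is repeated with a quantifier such as +, the matcher keeps only the text captured by its last repetition. You will have to run a loop to match all the repetitions.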
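To see the last-repetition behaviour in isolation, here is a minimal sketch. It uses a deliberately simpler pattern, `(\w+)=(?:(\w+),?)+`, chosen for illustration (it is not the question's regex), inside a hypothetical class `RepeatedGroupDemo`. `groupCount()` is fixed by the pattern, and group 2 reports only the value from the final repetition:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RepeatedGroupDemo {
    // Group 2 sits inside a repeated non-capturing group: (?:(\w+),?)+
    static final Pattern REPEATED = Pattern.compile("(\\w+)=(?:(\\w+),?)+");

    // Returns {groupCount, group(1), group(2)} after a full match, or null if the line does not match
    static String[] inspect(String line) {
        Matcher m = REPEATED.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new String[] { String.valueOf(m.groupCount()), m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        String[] r = inspect("name=a,b,c");
        System.out.println("groupCount = " + r[0]); // 2
        System.out.println("group(1)   = " + r[1]); // name
        System.out.println("group(2)   = " + r[2]); // c (the earlier repetitions "a" and "b" are overwritten)
    }
}
```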

You can use this regex in a while loop to get all the matches:

    (?:([\w/.-]+)\h*=|(?!^)\G,)\h*((\"?)[^\",]*\3)
    

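\G asserts position at the end of the previous match, or at the start of the string for the first match. Because we use (?!^), we force \G to match only at the position where the previous match ended.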
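Here is a small sketch of the `(?!^)\G` idea on its own, assuming a made-up input format `ids:1,2,3` and a hypothetical class `ContiguousGDemo`. The first match has to come from the `ids:` alternative; every later match must start exactly where the previous one ended, so a value separated by other text is not picked up, and `(?!^)` stops `\G` from matching at the very start of the input:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ContiguousGDemo {
    // Hypothetical "ids:" prefix; "(?!^)\G," only continues where the previous match ended
    static final Pattern IDS = Pattern.compile("(?:ids:|(?!^)\\G,)(\\d+)");

    // Collects group 1 from every successive find()
    static List<String> collect(String input) {
        List<String> out = new ArrayList<>();
        Matcher m = IDS.matcher(input);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(collect("ids:1,2,3 x,9")); // [1, 2, 3]: ",9" is not adjacent to the previous match
        System.out.println(collect(",5"));            // []: (?!^) keeps \G from matching at the start of input
    }
}
```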

Code:

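    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class Main {
        public static void main(String[] args) {
            // Backslashes are doubled so the regex engine receives \w, \h, \G and \3
            final String regex = "(?:([\\w/.-]+)\\h*=|(?!^)\\G,)\\h*((\"?)[^\",]*\\3)";
            final String string = "SettingName = \"Value1\",0x2,3,\"Value4 contains spaces\", \"Value5 has a space before the string that is ignored\"";

            final Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
            final Matcher matcher = pattern.matcher(string);

            // Each find() returns one value; group 1 (the name) is set only on the first match
            while (matcher.find()) {
                if (matcher.group(1) != null)
                    System.out.println(matcher.group(1));
                System.out.println("\t=> " + matcher.group(2));
            }
        }
    }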
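If you want the results in the same shape as the question's `name` and `value[]` fields, the loop can be wrapped in a helper. This is a sketch only: the class `SettingLine`, its `parse` method and the one-setting-per-line assumption are hypothetical, not part of the answer. It reuses the answer's pattern and collects group 2 from every match into a list, so any number of values is supported:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SettingLine {
    // The answer's pattern: group 1 = setting name (first match only), group 2 = one value
    static final Pattern VALUE = Pattern.compile("(?:([\\w/.-]+)\\h*=|(?!^)\\G,)\\h*((\"?)[^\",]*\\3)");

    final String name;
    final List<String> values;

    SettingLine(String name, List<String> values) {
        this.name = name;
        this.values = values;
    }

    // Hypothetical helper: assumes one setting per line; returns null if no "name =" is found
    static SettingLine parse(String line) {
        Matcher m = VALUE.matcher(line);
        String name = null;
        List<String> values = new ArrayList<>();
        while (m.find()) {
            if (m.group(1) != null) {
                name = m.group(1);
            }
            values.add(m.group(2)); // quotes, if present, are kept as part of the value
        }
        return name == null ? null : new SettingLine(name, values);
    }

    public static void main(String[] args) {
        SettingLine s = parse("  SettingName = \"Value1\",0x2,3,\"Value4 contains spaces\", \"Value5 has a space before the string that is ignored\"");
        System.out.println(s.name);          // SettingName
        System.out.println(s.values.size()); // 5
        System.out.println(s.values.get(0)); // "Value1" (quotes are kept)
    }
}
```

Because `find()` scans forward until the `name =` alternative matches, leading whitespace before the setting name is skipped, as the question requires.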