Capturing Arbitrary Multiple Groups with Regex
I am trying to use a regular expression in Java to capture data from the following string:
SettingName = "Value1",0x2,3,"Value4 contains spaces", "Value5 has a space before the string that is ignored"
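The string may be preceded by any amount of whitespace, which can be ignored. It may also contain more or fewer values than listed here; this is only an example.
My intent is to capture these groups:
Group 1: SettingName
Group 2: "Value1" (including the quotes)
Group 3: 0x2
Group 4: 3
Group 5: "Value4 contains spaces"
Group 6: "Value5 has a space before the string that is ignored"
The regex I am attempting to use is: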
\s*([\w\/.-]+)\s*=(?:\s*(\"?[^\",]*\"?)(?:,|\s*$))+
\s* -> Consume an arbitrary number of whitespace
( -> Start a capturing group (group 1)
[\w\/.-] -> Get a character of the SettingName, which may contain alphanumerics, _, /, ., and -
+ -> Get the previous token one or more times (so group 1 is not blank)
) -> End the capturing group
\s* -> Consume an arbitrary amount of whitespace
= -> Consume the equals sign
(?: -> Start an uncaptured group
\s* -> Consume an arbitrary amount of whitespace
( -> Start a captured group
\"? -> Consume a quote, if it exists
[^\",]* -> Consume any number of nonquote, noncomma characters
\"? -> Consume the end quote, if it exists
) -> End the captured group
(?: -> start a uncaptured group
,|\s*$ -> match either a comma, or optional trailing whitespace followed by the end of the input
) -> end the uncaptured group
) -> end the outer uncaptured group
+ -> match the outer uncaptured group 1 or more times
I am using this code:
// Backslashes are doubled so the regex survives as a Java string literal.
private static final String regex = "\\s*([\\w\\/.-]+)\\s*=(?:\\s*(\"?[^\",]*\"?)(?:,|\\s*$))+";
private static final Pattern settingPat = Pattern.compile(regex);
...
public String text;
public Matcher m;
...
public void someMethod(String lineContents)
{
m = settingPat.matcher(text);
if(!m.matches())
... (do other stuff)
else
{
name = m.group(1); // should be "SettingName"
value[0] = m.group(2); // should be "\"Value1\""
value[1] = m.group(3); // should be "0x2"
...
}
}
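With this code, the string matches, but it seems I only capture the last group. Does Java and/or regex support repeating an arbitrary capture group with the + modifier?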
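Your regex has only 2 capturing groups, so you cannot get more than 2 groups in the result; a repeated capturing group keeps only its last capture. You will have to run a loop to match all the repetitions.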
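To make that point concrete, here is a minimal sketch using a simplified, hypothetical pattern (not the one from the question): the class name `RepeatedGroupDemo`, the pattern, and the sample input are all assumptions chosen for illustration. It shows that the number of groups is fixed by the pattern, and that a group inside a repeated construct reports only its final capture.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RepeatedGroupDemo {
    // Simplified, hypothetical pattern: a name, '=', then comma-separated words.
    // Group 2 sits inside a repeated non-capturing group.
    static final Pattern P = Pattern.compile("(\\w+)=(?:(\\w+),?)+");

    // The group count depends only on the pattern, never on the input.
    static int groupCount() {
        return P.matcher("").groupCount();
    }

    // Returns what group 2 holds after a full match, or null if the line does not match.
    static String lastCapture(String line) {
        Matcher m = P.matcher(line);
        return m.matches() ? m.group(2) : null;
    }

    public static void main(String[] args) {
        System.out.println(groupCount());              // 2
        System.out.println(lastCapture("name=a,b,c")); // c (the earlier captures a and b are overwritten)
    }
}
```

Because each repetition overwrites the previous capture, collecting every value requires calling `find()` repeatedly rather than calling `matches()` once.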
You can use this regex in a while loop to get all the matches:
(?:([\w/.-]+)\h*=|(?!^)\G,)\h*((\"?)[^\",]*\3)
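\G asserts the position at the end of the previous match, or at the start of the string for the first match. Because we use (?!^), we force \G to match only at the end of the previous match.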
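The effect of \G can be seen in isolation with a small sketch. The class name `ContiguousMatchDemo`, the helper `digits`, and the toy patterns are assumptions made up for this illustration; they are not part of the answer's regex. With \G, each match must start exactly where the previous one ended, so the chain stops at the first token that does not fit.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ContiguousMatchDemo {
    // Collects group 1 of every successive find() for the given regex.
    static List<String> digits(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        String input = "1,2,x,3";
        // Without \G the matcher skips over "x," and resumes at "3".
        System.out.println(digits("(\\d),?", input));    // [1, 2, 3]
        // With \G the chain breaks at "x", so "3" is never reached.
        System.out.println(digits("\\G(\\d),?", input)); // [1, 2]
    }
}
```

In the answer's regex the same mechanism ties every ",value" match to the match before it, so a value is only accepted when it directly continues the setting line.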
Code:
// Backslashes are doubled so the regex survives as a Java string literal.
final String regex = "(?:([\\w/.-]+)\\h*=|(?!^)\\G,)\\h*((\"?)[^\",]*\\3)";
final String string = "SettingName = \"Value1\",0x2,3,\"Value4 contains spaces\", \"Value5 has a space before the string that is ignored\"";
final Pattern pattern = Pattern.compile(regex, Pattern.MULTILINE);
final Matcher matcher = pattern.matcher(string);
while (matcher.find()) {
if (matcher.group(1) != null)
System.out.println(matcher.group(1));
System.out.println("\t=> " + matcher.group(2));
}