Regular expression to match integer literal
I'm looking at parsing a list of integers (from a property string). However, I would like to go beyond positive and negative decimal values and parse any string that represents a Java integer literal (JLS 17) as can be found in source code. Similarly, I would like to be lenient with regard to any prefixes, separators and suffixes around the integers themselves. In other words, I want to find them using repeated calls to Matcher.find().
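Is there a regular expression that matches all possible Java integer literals? It doesn't need to check the upper and lower bounds.
Even though I explicitly linked to the JLS, I'll show some example numbers: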
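-1: matches 1, but the minus sign is a unary operator (I'll adjust for that if necessary)
0x00_00_00_0F: the value fifteen, matched as hexadecimal digits, with underscores separating the digit groups
0b0000_1111: the value fifteen, matched in binary
017: the value fifteen, matched in octal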
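Well... in the simplest terms, base 2, 8 and 10 numbers could all use the same pattern, since their values consist only of digit characters. However, you probably want a separate expression for each type. The problem is that you didn't make your intent clear. I'm going on the assumption that you want the expression to validate which base a particular value is in.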
String base10Regex = "[0-9]+";
String base2Regex = "[0-1]+";
String base8Regex = "[0-7]+";
String base16Regex = "[0-9A-Fa-f]+";
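For octal and decimal values, you will need to prepend the expression with an optional sign character: "^[+-]?". For hexadecimal values, if you expect them to start with "0x", I suggest prepending the expression with that literal prefix.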
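As a rough illustration of the suggestion above, the per-base expressions can be combined with an optional sign and a prefix to decide which base a string is written in. This is only a sketch: the class name BaseDetector, the method radixOf, the "0b" prefix for binary and the rule that decimals have no leading zero are assumptions added here, and underscores and the L suffix are not handled.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of the answer above.
final class BaseDetector {
    private static final String SIGN = "[+-]?";
    private static final Pattern BASE2 = Pattern.compile(SIGN + "0[bB][01]+");
    private static final Pattern BASE8 = Pattern.compile(SIGN + "0[0-7]+");
    private static final Pattern BASE10 = Pattern.compile(SIGN + "(?:0|[1-9][0-9]*)");
    private static final Pattern BASE16 = Pattern.compile(SIGN + "0[xX][0-9A-Fa-f]+");

    // Returns 2, 8, 10 or 16, or 0 if the whole string matches none of the patterns.
    static int radixOf(String value) {
        if (BASE16.matcher(value).matches()) return 16;
        if (BASE2.matcher(value).matches()) return 2;
        if (BASE8.matcher(value).matches()) return 8;
        if (BASE10.matcher(value).matches()) return 10;
        return 0;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"-017", "0x0F", "0b1111", "+15", "08"}) {
            System.out.println(s + " -> radix " + radixOf(s));
        }
    }
}
```

Because matches() has to consume the whole string, no ^ or $ anchors are needed in these patterns.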
Something like this:
Decimal:
(?:0|[1-9](?:_*[0-9])*)[lL]?
Hexadecimal:
0[xX][a-fA-F0-9](?:_*[a-fA-F0-9])*[lL]?
Octal:
0[0-7](?:_*[0-7])*[lL]?
Binary:
0[bB][01](?:_*[01])*[lL]?
All together (in free-spacing mode):
(?:
    0
    (?:
        [xX] [a-fA-F0-9] (?: _* [a-fA-F0-9] )*
      |
        [0-7] (?: _* [0-7] )*
      |
        [bB] [01] (?: _* [01] )*
    )?
  |
    [1-9] (?: _* [0-9] )*
)
[lL]?
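To show how the combined pattern could be used with repeated calls to Matcher.find(), here is a minimal sketch. It compiles the free-spacing pattern with Pattern.COMMENTS. The class name LiteralFinder, the method findLiterals and the two lookarounds are assumptions added for this example; the lookarounds keep a match from starting or ending in the middle of an identifier or a longer run of digits.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the answer above.
final class LiteralFinder {
    private static final Pattern LITERAL = Pattern.compile(
            "(?<![0-9A-Za-z_])      # assumption: not preceded by a word character\n"
          + "(?:\n"
          + "    0\n"
          + "    (?:\n"
          + "        [xX] [a-fA-F0-9] (?: _* [a-fA-F0-9] )*\n"
          + "      | [0-7] (?: _* [0-7] )*\n"
          + "      | [bB] [01] (?: _* [01] )*\n"
          + "    )?\n"
          + "  | [1-9] (?: _* [0-9] )*\n"
          + ")\n"
          + "[lL]?\n"
          + "(?![0-9A-Za-z_])       # assumption: not followed by a word character\n",
            Pattern.COMMENTS);

    // Collects every literal found in the text, in order of appearance.
    static List<String> findLiterals(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = LITERAL.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(findLiterals("a=-1, b=0x00_00_00_0F; c=0b0000_1111 d=017"));
    }
}
```

The sign is deliberately left out of the match, in line with it being a unary operator. With the lookarounds in place, a token such as 0x1G is skipped entirely instead of yielding a partial match.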
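After Casimir's answer I decided to take it a bit further and implemented some code to actually parse the integers as well; it is included below. It does handle the minus and plus signs, even though they are officially not part of the integer literal as described in the JLS; they are unary operators.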
package nl.owlstead.ifprops;

import java.math.BigInteger;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public final class JavaIntegerParser {
    // Each pattern has three groups: radix prefix, digits (possibly with underscores), optional long suffix.
    private static final Pattern BINARY = Pattern.compile("(0b)([01](?:_*[01])*)(L?)", Pattern.CASE_INSENSITIVE);
    private static final Pattern OCTAL = Pattern.compile("(0)([0-7](?:_*[0-7])*)(L?)", Pattern.CASE_INSENSITIVE);
    private static final Pattern DECIMAL = Pattern.compile("()(0|(?:[1-9](?:_*[0-9])*))(L?)", Pattern.CASE_INSENSITIVE);
    private static final Pattern HEXADECIMAL = Pattern.compile("(0x)([0-9a-f](?:_*[0-9a-f])*)(L?)", Pattern.CASE_INSENSITIVE);

    // NOTE: HEXADECIMAL and OCTAL should be before DECIMAL if this is used to find the pattern,
    // otherwise DECIMAL matches only the leading 0
    private static final Pattern SIGNED_INTEGER_LITERAL = Pattern.compile(
            "(?:([+-])\\s*)?(" +
            BINARY + "|" + HEXADECIMAL + "|" + OCTAL + "|" + DECIMAL +
            ")", Pattern.CASE_INSENSITIVE);

    public static int parseJavaInteger(String javaInteger) throws NumberFormatException {
        BigInteger value = parseIntegerAsBigInt(javaInteger);
        try {
            return value.intValueExact();
        } catch (@SuppressWarnings("unused") ArithmeticException e) {
            throw new NumberFormatException("Number is not between Integer.MIN_VALUE and Integer.MAX_VALUE");
        }
    }

    public static long parseJavaLong(String javaLong) throws NumberFormatException {
        BigInteger value = parseIntegerAsBigInt(javaLong);
        try {
            return value.longValueExact();
        } catch (@SuppressWarnings("unused") ArithmeticException e) {
            throw new NumberFormatException("Number is not between Long.MIN_VALUE and Long.MAX_VALUE");
        }
    }

    private static BigInteger parseIntegerAsBigInt(String javaLiteral) {
        Matcher intMatcher = SIGNED_INTEGER_LITERAL.matcher(javaLiteral);
        if (!intMatcher.matches()) {
            throw new NumberFormatException(javaLiteral + " is not recognized as a Java integer literal");
        }

        String signGroup = intMatcher.group(1);            // null if no sign is present
        String prefixAndValueGroup = intMatcher.group(2);  // the literal without the sign

        String radixGroup = "";
        String valueGroup = "";
        // String longGroup = "";

        // Match again against the individual patterns to find out which radix applies
        List<Pattern> patterns = List.of(BINARY, OCTAL, DECIMAL, HEXADECIMAL);
        for (Pattern pattern : patterns) {
            Matcher specificMatcher = pattern.matcher(prefixAndValueGroup);
            if (specificMatcher.matches()) {
                radixGroup = specificMatcher.group(1);
                valueGroup = specificMatcher.group(2);
                // longGroup = specificMatcher.group(3);
                break;
            }
        }

        if (valueGroup.isEmpty()) {
            throw new IllegalStateException("Number matches but doesn't contain a value (parser error)");
        }

        BigInteger sign = "-".equals(signGroup) ? BigInteger.ONE.negate() : BigInteger.ONE;

        int radix;
        switch (radixGroup.toLowerCase()) {
        case "0b":
            radix = 2;
            break;
        case "0":
            radix = 8;
            break;
        case "":
            radix = 10;
            break;
        case "0x":
            radix = 16;
            break;
        default:
            throw new IllegalStateException("Unknown radix prefix: " + radixGroup);
        }

        // Underscores are only visual separators, so strip them before conversion
        return new BigInteger(valueGroup.replace("_", ""), radix).multiply(sign);
    }
}
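I've also tried to use the code to find multiple integers in one string, but that didn't go well. The problem is that some inputs such as 0__0 were accepted as two literals with value zero; not exactly what you want. So please use the regular expression only to check whether a string is actually an integer, and separate the integers yourself, e.g. using String.split(SEPARATOR_REGEX).
Funny enough, my Eclipse IDE did accept 0__0 as a literal, which looked like a violation of the JLS to me. It appears not to be one: the OctalNumeral production (JLS §3.10.1) allows underscores directly after the leading 0, so 0__0 is an octal zero and the OCTAL pattern above is stricter than the specification.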
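The split-first approach could look roughly like the sketch below. The class name LiteralListSplitter, the method splitLiterals and the concrete SEPARATOR_REGEX (commas, semicolons and whitespace) are assumptions for illustration; the literal pattern is a compact single-line form of the patterns used above, with an optional sign in front.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper illustrating "split first, then validate each token".
final class LiteralListSplitter {
    // Assumed separator: one or more commas, semicolons or whitespace characters.
    private static final String SEPARATOR_REGEX = "[,;\\s]+";

    private static final Pattern LITERAL = Pattern.compile(
            "[+-]?(?:"
          + "0[xX][0-9a-fA-F](?:_*[0-9a-fA-F])*"
          + "|0[bB][01](?:_*[01])*"
          + "|0[0-7](?:_*[0-7])*"
          + "|0"
          + "|[1-9](?:_*[0-9])*"
          + ")[lL]?");

    // Splits the input and checks every token as a whole, so nothing is matched partially.
    static List<String> splitLiterals(String input) {
        List<String> result = new ArrayList<>();
        for (String token : input.trim().split(SEPARATOR_REGEX)) {
            if (token.isEmpty()) {
                continue; // empty input or leading separator
            }
            if (!LITERAL.matcher(token).matches()) {
                throw new NumberFormatException(token + " is not recognized as a Java integer literal");
            }
            result.add(token);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(splitLiterals("-1, 0x0F; 0b0000_1111 017"));
    }
}
```

Each accepted token can then be handed to a parser such as parseJavaInteger. Because every token has to match in full, an input like 0__0 is rejected as a whole by these patterns rather than being read as two zeros.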