Regular expression to match integer literal

I'm looking to parse a list of integers (from a properties string). However, I would like to go beyond positive and negative decimal values and parse any string that represents a Java integer literal (JLS 17) as can be found in source code. Similarly, I would like to be lenient with regard to any prefixes, separators and suffixes around the integers themselves. In other words, I want to find them using repeated calls to Matcher.find().

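Is there a regular expression that matches all possible Java integer literals? It does not need to check the upper and lower bounds.

Even though I explicitly link to the JLS, I'll show some valid and invalid numbers: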

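Well... in the simplest terms, numbers in base 2, 8 and 10 can use the same pattern, since their values consist only of digit characters. However, you probably want one expression per type. The problem is that you did not make your intent clear. I am assuming that you want the expression to validate the base of a particular value.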

String base10Regex = "[0-9]+";
String base2Regex = "[0-1]+";
String base8Regex = "[0-7]+";
String base16Regex = "^[0-9A-F]+$";

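For octal and decimal values, you will need to prepend an optional sign character to the expression: "^[+-]?". For hexadecimal values, if you expect the values to start with "0x", I would suggest prepending those literal characters to the expression.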
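As a rough sketch of the advice above, the optional sign and the "0x" prefix could be added as follows. The class and method names are made up for illustration, and accepting lowercase hex digits and an uppercase "0X" prefix is an assumption on my part. The patterns are applied with matches(), so no ^ or $ anchors are needed.

```java
import java.util.regex.Pattern;

// Hypothetical helper; all names here are illustrative only.
final class SignedBasePatterns {
    // Optional sign followed by one or more decimal digits.
    static final Pattern SIGNED_DECIMAL = Pattern.compile("[+-]?[0-9]+");
    // Optional sign followed by one or more octal digits.
    static final Pattern SIGNED_OCTAL = Pattern.compile("[+-]?[0-7]+");
    // Literal "0x" or "0X" prefix followed by hex digits in either case (assumption).
    static final Pattern PREFIXED_HEX = Pattern.compile("0[xX][0-9A-Fa-f]+");

    static boolean isSignedDecimal(String s) {
        return SIGNED_DECIMAL.matcher(s).matches();
    }

    static boolean isSignedOctal(String s) {
        return SIGNED_OCTAL.matcher(s).matches();
    }

    static boolean isPrefixedHex(String s) {
        return PREFIXED_HEX.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isSignedDecimal("-42"));  // true
        System.out.println(isSignedOctal("18"));     // false, 8 is not an octal digit
        System.out.println(isPrefixedHex("0x1F"));   // true
    }
}
```

Note that a character class such as [+-] needs no alternation bar; writing [+|-] would also accept a literal '|'.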

Something like this:

Decimal:
(?:0|[1-9](?:_*[0-9])*)[lL]?

Hexadecimal:
0[xX][a-fA-F0-9](?:_*[a-fA-F0-9])*[lL]?

Octal:
0[0-7](?:_*[0-7])*[lL]?

Binary:
0[bB][01](?:_*[01])*[lL]?

All together (in free-spacing mode):

(?:
    0
    (?:
        [xX] [a-fA-F0-9] (?: _* [a-fA-F0-9] )*
      |
        [0-7] (?: _* [0-7] )*
      |
        [bB] [01] (?: _* [01] )*
    )?
  |
    [1-9] (?: _* [0-9] )*
)
[lL]?
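In Java, free-spacing mode corresponds to the Pattern.COMMENTS flag. The following is a minimal sketch of how the combined pattern might be compiled and used with repeated Matcher.find() calls. The class name, method name and sample input are invented for illustration, and the sketch assumes that the uppercase "0X" prefix should be accepted as well.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the name and the sample input are illustrative only.
final class LiteralFinder {
    // Pattern.COMMENTS ignores whitespace in the pattern and treats
    // everything from '#' to the end of the line as a comment.
    static final Pattern INTEGER_LITERAL = Pattern.compile(
            "(?:\n"
          + "    0\n"
          + "    (?:\n"
          + "        [xX] [a-fA-F0-9] (?: _* [a-fA-F0-9] )*   # hexadecimal\n"
          + "      |\n"
          + "        [0-7] (?: _* [0-7] )*                    # octal\n"
          + "      |\n"
          + "        [bB] [01] (?: _* [01] )*                 # binary\n"
          + "    )?\n"
          + "  |\n"
          + "    [1-9] (?: _* [0-9] )*                        # decimal\n"
          + ")\n"
          + "[lL]?                                            # optional long suffix\n",
            Pattern.COMMENTS);

    // Collects every substring that the pattern finds, scanning left to right.
    static List<String> findAll(String input) {
        List<String> found = new ArrayList<>();
        Matcher matcher = INTEGER_LITERAL.matcher(input);
        while (matcher.find()) {
            found.add(matcher.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // prints [0x1F, 0b1010, 1_000L]
        System.out.println(findAll("a=0x1F, b=0b1010; c=1_000L"));
    }
}
```

Because find() scans for matches anywhere in the input, it will also pick pieces out of text that is not a single literal as a whole; with this pattern, findAll("0__0") returns two separate "0" matches.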


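Following Casimir's answer, I decided to take it a step further and implemented some code to actually parse the integers, included below. It does accept the minus and plus signs, even though they are formally not part of the integer literal as described in the JLS; they are unary operators.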

package nl.owlstead.ifprops;

import java.math.BigInteger;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public final class JavaIntegerParser {
    private static final Pattern BINARY = Pattern.compile("(0b)([01](?:_*[01])*)(L?)", Pattern.CASE_INSENSITIVE);
    private static final Pattern OCTAL = Pattern.compile("(0)([0-7](?:_*[0-7])*)(L?)", Pattern.CASE_INSENSITIVE);
    private static final Pattern DECIMAL = Pattern.compile("()(0|(?:[1-9](?:_*[0-9])*))(L?)", Pattern.CASE_INSENSITIVE);
    private static final Pattern HEXADECIMAL = Pattern.compile("(0x)([0-9a-f](?:_*[0-9a-f])*)(L?)", Pattern.CASE_INSENSITIVE);
   
    // NOTE: OCTAL should be before DECIMAL if this is used to find the pattern
    private static final Pattern SIGNED_INTEGER_LITERAL = Pattern.compile(
            // "\\s" is the regex whitespace class; a single backslash would be a Java string escape
            "(?:([+-])\\s*)?(" + 
            BINARY + "|" + OCTAL + "|" + DECIMAL + "|" + HEXADECIMAL + 
            ")", Pattern.CASE_INSENSITIVE);
        
    public static int parseJavaInteger(String javaInteger) throws NumberFormatException {
        BigInteger value = parseIntegerAsBigInt(javaInteger);
        try {
            return value.intValueExact();
        } catch (@SuppressWarnings("unused") ArithmeticException e) {
            throw new NumberFormatException("Number is not between Integer.MIN_VALUE and Integer.MAX_VALUE");
        }
    }
    
    public static long parseJavaLong(String javaLong) throws NumberFormatException {
        BigInteger value = parseIntegerAsBigInt(javaLong);
        try {
            return value.longValueExact();
        } catch (@SuppressWarnings("unused") ArithmeticException e) {
            throw new NumberFormatException("Number is not between Long.MIN_VALUE and Long.MAX_VALUE");
        }
    }

    private static BigInteger parseIntegerAsBigInt(String javaLiteral) {
        Matcher intMatcher = SIGNED_INTEGER_LITERAL.matcher(javaLiteral);
        if (!intMatcher.matches()) {
            throw new NumberFormatException(javaLiteral + " is not recognized as a Java integer literal");
        }
        
        String signGroup = intMatcher.group(1);
        String prefixAndValueGroup = intMatcher.group(2);
        String radixGroup = "";
        String valueGroup = "";
        // String longGroup = "";
        List<Pattern> patterns = List.of(BINARY, OCTAL, DECIMAL, HEXADECIMAL);
        for (Pattern pattern : patterns) {
            Matcher specificMatcher = pattern.matcher(prefixAndValueGroup);
            if (specificMatcher.matches()) {
                radixGroup = specificMatcher.group(1);
                valueGroup = specificMatcher.group(2);
                // longGroup = specificMatcher.group(3);
                break;
            }
        }
        
        // valueGroup starts out as "" and is only replaced when one of the patterns matches
        if (valueGroup.isEmpty()) {
            throw new RuntimeException("Number matches but doesn't contain a value (parser error)");
        }

        BigInteger sign = signGroup != null && signGroup.matches("-") ? BigInteger.ONE.negate() : BigInteger.ONE; 
        
        int radix;
        switch (radixGroup.toLowerCase()) {
        case "0b":
            radix = 2;
            break;
        case "0":
            radix = 8;
            break;
        case "":
            radix = 10;
            break;
        case "0x":
            radix = 16;
            break;
        default:
            throw new RuntimeException();
        }
 
        BigInteger value = new BigInteger(valueGroup.replaceAll("_", ""), radix).multiply(sign);
        return value;
    }
}

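I also tried using the code to find multiple integers in a single string, but that did not work well. The problem is that some invalid literals, such as 0__0, are accepted as two literals with value zero; not exactly what you want. So please use the regular expression only to check whether a string actually is an integer, and separate the integers beforehand, e.g. using String.split(SEPARATOR_REGEX).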
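The split-then-validate approach could look roughly like the sketch below. The value of SEPARATOR_REGEX (commas or semicolons with optional surrounding whitespace) is purely an assumption, as are the class and method names; the literal pattern is a single-line variant of the combined expression with an optional sign in front.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical demo class; names and the separator choice are assumptions.
final class SplitThenValidate {
    // Assumption: values are separated by ',' or ';' with optional whitespace around them.
    static final String SEPARATOR_REGEX = "\\s*[,;]\\s*";

    // Optional sign, then a hex, octal, binary or decimal literal, then an optional long suffix.
    static final Pattern LITERAL = Pattern.compile(
            "[+-]?(?:0(?:[xX][0-9a-fA-F](?:_*[0-9a-fA-F])*"
          + "|[0-7](?:_*[0-7])*"
          + "|[bB][01](?:_*[01])*)?"
          + "|[1-9](?:_*[0-9])*)[lL]?");

    // Splits the input first and then requires every token to match as a whole.
    static List<String> splitLiterals(String input) {
        List<String> literals = new ArrayList<>();
        for (String token : input.trim().split(SEPARATOR_REGEX)) {
            if (!LITERAL.matcher(token).matches()) {
                throw new NumberFormatException(token + " is not recognized as an integer literal");
            }
            literals.add(token);
        }
        return literals;
    }

    public static void main(String[] args) {
        // prints [0x1F, -42, 0b1010]
        System.out.println(splitLiterals("0x1F, -42; 0b1010"));
    }
}
```

Since each token has to match in full, an input such as "1, 0__0" is rejected with a NumberFormatException by this pattern instead of being silently read as several zeros.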

Funnily enough, my Eclipse IDE does accept 0__0 as a literal, even though it formally does not comply with the JLS. Not a big deal, but strange nonetheless.