Regex to differentiate thousand-separated numbers from non-thousand-separated numbers

Question

I need to extract price information from given lines of text. So far I have been using the regex (\d{1,3}(,\d{3})*(\.\d+)?) in Java, and it works for lines like price will be 90,500 USD

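However, I now also have lines where another number appears before the price (e.g. for order number 12345 the price will be 100,500 USD). In such cases my price extraction fails: for the example above it returns 123.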
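A minimal reproduction of this failure, assuming the pattern is written as a Java string literal with doubled backslashes (the class and method names are hypothetical). find() returns the leftmost match, and nothing in the pattern requires the number to be a whole word, so the match stops after the first three digits of the order number:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OriginalPatternDemo {
    // The question's pattern, with each backslash doubled for a Java string literal
    private static final Pattern PRICE = Pattern.compile("(\\d{1,3}(,\\d{3})*(\\.\\d+)?)");

    // Returns the leftmost match in the line, or "" if there is none
    static String firstMatch(String line) {
        Matcher m = PRICE.matcher(line);
        return m.find() ? m.group() : "";
    }

    public static void main(String[] args) {
        // Works when the price is the only number on the line
        System.out.println(firstMatch("price will be 90,500 USD"));    // 90,500
        // Fails when another number comes first: "123" is taken from "12345"
        System.out.println(firstMatch("for order number 12345 the price will be 100,500 USD")); // 123
    }
}
```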

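Is there a regex, or another approach, that extracts only the price regardless of whether other numbers are present? (The price is always thousand-separated, with or without a decimal part.)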

Below is the complete code I am currently using for this:

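import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PriceExtractor {
    // Returns the first number found on the earliest line that contains digits,
    // preferring (within that line) a number with a decimal part; "" if none
    private String getPrice(String fileText) {
        String[] lines = fileText.split(System.lineSeparator());

        for (String line : lines) {
            // First try: a number whose decimal part is mandatory
            Pattern p = Pattern.compile("(\\d{1,3}(,\\d{3})*(\\.\\d+))");
            Matcher m = p.matcher(line);
            if (m.find()) {
                return m.group(0);
            }

            // Second try: the same number with the decimal part optional
            p = Pattern.compile("(\\d{1,3}(,\\d{3})*(\\.\\d+)?)");
            m = p.matcher(line);
            if (m.find()) {
                return m.group(0);
            }
        }
        return "";
    }
}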

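I want the matching to work at the word level (e.g. the 123 in 12345 should not match). My only word separator is the space, so 123-456 counts as a single word. Of 123456, 123-456, 123,456, 123,456.56 and A123456, only 123,456 and 123,456.56 should match. The problem is that my current code extracts 123 from 123456, 123-456 and A123456.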
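Taken literally, this requirement also suggests an approach that needs no regex boundaries at all: split the line on spaces only, then accept a word only if the entire word is a thousand-separated number. The sketch below assumes, as the question states, that every price contains at least one thousands separator, so its pattern uses + rather than *; the class and method names are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class TokenPriceFinder {
    // Hypothetical full-token pattern: 1-3 digits, at least one ",ddd" group
    // (assumes prices are always thousand-separated), optional decimal part
    private static final Pattern PRICE = Pattern.compile("\\d{1,3}(?:,\\d{3})+(?:\\.\\d+)?");

    // Splits on spaces only (the question's sole separator) and keeps
    // the tokens that are, in their entirety, a thousand-separated number
    static List<String> prices(String line) {
        List<String> result = new ArrayList<>();
        for (String token : line.split(" +")) {
            if (PRICE.matcher(token).matches()) {
                result.add(token);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(prices("123456 123-456 123,456 123,456.56 A123456")); // [123,456, 123,456.56]
        System.out.println(prices("for order number 12345 the price will be 100,500 USD")); // [100,500]
    }
}
```

Because matches() must consume the whole token, 123456, 123-456 and A123456 are rejected outright. A price followed by punctuation (for example 100,500. at the end of a sentence) would also be rejected, since only spaces are treated as separators.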

Answer 1

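Your regex matches digits in any context, even inside longer numbers or words, and in your first pattern the fractional part is mandatory.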

I suggest:

Match numbers only when they are not enclosed in word characters.
Use an optional non-capturing group around the fractional-part pattern.

Use:

Pattern p = Pattern.compile("\\b\\d{1,3}(?:,\\d{3})*(?:\\.\\d+)?\\b");

See the regex demo.
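A small harness for the suggested pattern, written as a Java string literal (in Java source each backslash must be doubled; otherwise "\b" is a backspace character and "\d" does not compile). Run on the question's sample words, it shows that \b removes the matches inside 123456 and A123456, but because - is a non-word character, 123-456 still yields 123 and 456. The class and method names are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordBoundaryDemo {
    // The answer's pattern, with backslashes doubled for a Java string literal
    private static final Pattern PRICE =
            Pattern.compile("\\b\\d{1,3}(?:,\\d{3})*(?:\\.\\d+)?\\b");

    // Collects every non-overlapping match in the line, left to right
    static List<String> allMatches(String line) {
        List<String> result = new ArrayList<>();
        Matcher m = PRICE.matcher(line);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        // Sample words from the question; only 123,456 and 123,456.56 are wanted
        System.out.println(allMatches("123456 123-456 123,456 123,456.56 A123456"));
        // prints [123, 456, 123,456, 123,456.56]
        System.out.println(allMatches("for order number 12345 the price will be 100,500 USD"));
        // prints [100,500]
    }
}
```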

The \b pattern is a word boundary, and the (?:...)? in (?:\.\d+)? is a non-capturing group repeated once or zero times, i.e. it is optional.
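If - must not act as a separator, one possible variant (a swapped-in technique, not part of the answer above) replaces each \b with whitespace lookarounds: (?<!\S) requires the match to be preceded by whitespace or the start of the input, and (?!\S) requires it to be followed by whitespace or the end of the input. Note that \S also treats tabs and line breaks as separators, not only the space character. The class name is hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SpaceBoundaryDemo {
    // (?<!\S) and (?!\S): the number must be a whole whitespace-delimited word,
    // so '-' and letters next to the digits prevent a match
    private static final Pattern PRICE =
            Pattern.compile("(?<!\\S)\\d{1,3}(?:,\\d{3})*(?:\\.\\d+)?(?!\\S)");

    // Collects every non-overlapping match in the line, left to right
    static List<String> allMatches(String line) {
        List<String> result = new ArrayList<>();
        Matcher m = PRICE.matcher(line);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(allMatches("123456 123-456 123,456 123,456.56 A123456"));
        // prints [123,456, 123,456.56]
        System.out.println(allMatches("for order number 12345 the price will be 100,500 USD"));
        // prints [100,500]
    }
}
```

If, as the question says, prices always contain a thousands separator, changing (?:,\d{3})* to (?:,\d{3})+ would additionally stop a standalone number such as 123 from matching.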

Tags: java, regex, text, text-processing