Extracting a number with dot and comma using regex

Question

I have read many pages that try to explain how to use regex in Python, but I still don't get it at all. Even the regex wiki and the re documentation couldn't help me. I'm still a bit confused :P

I have the following string:

string = "|C195|1|Base de Cálculo ST: 2.608,24 - Valor da ST: 163,66|"

I am trying to extract only 2.608,24 and 163,66 using:

st_values = re.findall("\d+[,.]\d+", string)

However, the output of my print st_values is:

['2.608','163,66']

Instead, I want it to be:

['2.608,24','163,66']

I do not want:

['195', '1', '2.608,24','163,66']

So, how can I extract them using the alphabet soup of regex parameters?

Answer 1

Try this (note that this regex also treats strings like 1,23 as matches):

>>> re.findall(r"\d+(?:\.\d+)?,\d+", string)
['2.608,24', '163,66']

Regex demo and Explanation
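
To make the difference concrete, here is a minimal sketch that runs the pattern from the question next to the pattern from this answer on the same input. The variable names `naive` and `fixed` are only illustrative. The original pattern allows exactly one separator between two digit runs, so it stops after "2.608"; the revised pattern allows an optional ".digits" group and then requires a ",digits" tail.

```python
import re

string = "|C195|1|Base de Cálculo ST: 2.608,24 - Valor da ST: 163,66|"

# Pattern from the question: digits, ONE separator (comma or dot), digits.
# On "2.608,24" the match ends after "2.608" because only one separator fits.
naive = re.findall(r"\d+[,.]\d+", string)

# Pattern from this answer: digits, an optional ".digits" group,
# then a mandatory ",digits" tail. "195" and "1" have no comma, so they are skipped.
fixed = re.findall(r"\d+(?:\.\d+)?,\d+", string)

print(naive)  # ['2.608', '163,66']
print(fixed)  # ['2.608,24', '163,66']
```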

Answer 2

I suggest:

\b\d{1,3}(?:\.\d{3})*,\d+\b

Here is a demo.

Here is an IDEONE code demo:

import re
p = re.compile(r'\b\d{1,3}(?:\.\d{3})*,\d+\b')
test_str = "|C195|1|Base de Cálculo ST: 2.608,24 - Valor da ST: 2.608.234,24 12.608.234,24\n  163,66|\nd2.608.234,24\n2.60d8.23d4,24"
print(re.findall(p, test_str))
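
As a rough illustration of how the suggested pattern is put together, the sketch below rewrites it with `re.VERBOSE` so each part can carry a comment. The name `pattern` and the extra input "1234,56" are assumptions added for illustration. The behavior should be the same as the one-line form.

```python
import re

# Same pattern as above, spread over several lines with re.VERBOSE
# so that each component can be annotated.
pattern = re.compile(r"""
    \b             # start at a word boundary
    \d{1,3}        # leading group of one to three digits
    (?:\.\d{3})*   # zero or more thousands groups: a dot plus exactly three digits
    ,\d+           # decimal comma followed by at least one digit
    \b             # end at a word boundary
""", re.VERBOSE)

string = "|C195|1|Base de Cálculo ST: 2.608,24 - Valor da ST: 163,66|"
print(pattern.findall(string))     # ['2.608,24', '163,66']

# A four-digit run without a thousands separator is rejected by this pattern.
print(pattern.findall("1234,56"))  # []
```

The design choice here is strictness: unlike a looser pattern such as `\d+(?:\.\d+)?,\d+`, this one only accepts dots that separate groups of exactly three digits.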

Answer 3

If you want to extract the numbers from the penultimate column/field, you can do this:

 In: re.findall(r"[0-9,.]+",string.split('|')[-2])      
Out: ['2.608,24', '163,66']

Otherwise, if you use only a regex and other columns contain similar numbers, you cannot filter them out.
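
A minimal sketch of that point, assuming a hypothetical record in which an earlier column also holds a comma-decimal number ("1,50"). The helper name `extract_amounts`, its `field_index` parameter, and the choice of pattern are illustrative, not part of the original answer.

```python
import re

def extract_amounts(record, field_index=-2):
    # Isolate one pipe-delimited field, then search only inside it.
    field = record.split("|")[field_index]
    return re.findall(r"\d+(?:\.\d+)?,\d+", field)

# Hypothetical record: the third column contains a similar-looking number.
record = "|C195|1,50|Base de Cálculo ST: 2.608,24 - Valor da ST: 163,66|"

# Searching only the penultimate field ignores the other columns.
print(extract_amounts(record))                    # ['2.608,24', '163,66']

# Searching the whole record also picks up the number from the other column.
print(re.findall(r"\d+(?:\.\d+)?,\d+", record))   # ['1,50', '2.608,24', '163,66']
```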
