Python regular expression to match a specific currency format

Question

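I am trying to write a regular expression in Python 3.4 that takes input from a text file of potential prices and matches the valid formats.

The requirement is that a price be in $X.YY or $X format, where X must be greater than 0.

Invalid formats include $0.YY, $.YY, $X.Y, and $X.YYY.

This is what I have so far: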

import re
from sys import argv

FILE = 1

file = open(argv[FILE], 'r')
string = file.read()
file.close()

price = re.compile(r"""         # beginning of string
                       (\$      # dollar sign
                       [1-9]    # first digit must be non-zero
                       \d * )   # followed by 0 or more digits
                       (\.       # optional cent portion
                       \d {2}  # only 2 digits allowed for cents
                         )?     # end of string""", re.X)

valid_prices = price.findall(string)
print(valid_prices)

This is the file I am currently using for testing:

test.txt

 .23  .23  13443.23 22342 394 0.232 2.2 .03

Current output:

$[('', '.23'), ('', ''), ('', '.23'), ('', ''), ('13443', '.23'), ('22342', ''), ('0', '.23'), ('2', '')]

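It currently matches $230.232 and $232.2, when these should be rejected.

I split the dollar portion and the cent portion into separate groups so that I can process them further later. That is why my output is a list of tuples.

One catch is that I don't know which delimiter, if any, will be used in the input file.

I am new to regular expressions and would really appreciate some help. Thanks!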

Answer 1

Add a zero-width positive lookahead (?=\s|$) to ensure that the match is followed only by whitespace or the end of the string:

>>> s = '.23  .23  13443.23 22342 394 0.232 2.2 .03'

>>> re.findall(r'\$[1-9]\d*(?:\.\d{2})?(?=\s|$)', s)
['.23', '', '.23', '', '13443.23', '22342']
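As a minimal sketch of how the lookahead behaves, the snippet below runs the same pattern over a made-up sample string. The values in `sample` are assumptions chosen for illustration, not the asker's actual file.

```python
import re

# Hypothetical sample input; these values are for illustration only.
sample = "$1.23 $1 $0.23 $.23 $230.232 $232.2 $13443.23"

# \$ matches a literal dollar sign.
# (?=\s|$) requires whitespace or the end of the string right after the match.
pattern = re.compile(r'\$[1-9]\d*(?:\.\d{2})?(?=\s|$)')

matches = pattern.findall(sample)
print(matches)  # ['$1.23', '$1', '$13443.23']
```

Because the pattern has no capturing groups, `findall` returns whole matches rather than tuples. This variant assumes the prices are separated by whitespace.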

Answer 2

If it really is unclear which delimiter will be used, the only thing that makes sense to me is to treat "not a digit and not a dot" as the delimiter:

\$[1-9]\d*(\.\d\d)?(?![\d.])

https://regex101.com/r/jH2dN5/1
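A rough sketch of this idea in Python, keeping the dollar and cent parts in separate groups as the question asks. The `sample` string and its mix of delimiters are assumptions for illustration, and the first capturing group around the dollar part is added here so both parts can be read back.

```python
import re

# Hypothetical input with mixed delimiters (comma, semicolon, pipe, space).
sample = "$1.23,$1;$0.23|$.23 $230.232,$232.2;$42"

# Group 1 is the dollar part, group 2 the optional cent part.
# (?![\d.]) rejects a match that is followed by another digit or a dot.
pattern = re.compile(r'(\$[1-9]\d*)(\.\d\d)?(?![\d.])')

# group(2) is None when the cent part is absent, so fall back to ''.
pairs = [(m.group(1), m.group(2) or '') for m in pattern.finditer(sample)]
print(pairs)  # [('$1', '.23'), ('$1', ''), ('$42', '')]
```

One trade-off worth noting: since a dot counts as "not a delimiter" here, a price that ends a sentence, such as `$5.`, would also be rejected.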

Answer 3

Try this:

\$(?!0\d)\d+(?:\.\d{2})?(?=\s|$)

Regex demo

Matches:

.23  .23  13443.23 22342 [=11=].99 .00
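To see what this pattern accepts and rejects, here is a small sketch run against a made-up string. The values in `sample` are assumptions for illustration only.

```python
import re

# Hypothetical sample input; these values are for illustration only.
sample = "$1.23 $0.99 $0 $007 $.50 $2.2 $10"

# (?!0\d) forbids a leading zero that is followed by another digit,
# but still lets a lone 0 through as the dollar part.
pattern = re.compile(r'\$(?!0\d)\d+(?:\.\d{2})?(?=\s|$)')

matches = pattern.findall(sample)
print(matches)  # ['$1.23', '$0.99', '$0', '$10']
```

Note that this pattern only rules out leading zeros, so `$0.99` and `$0` pass. If the question's rule that X must be greater than 0 is strict, one of the `[1-9]\d*` variants is probably a closer fit.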
