Extracting numbers with decimal points from text extracted from PDF files

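I only need to extract the numbers with decimal points from the string below. I used the re module, but ran into trouble with the commas: a number may have no comma or more than one. Another problem is that a decimal number can be immediately followed by a word (e.g. 1,513,971.63Savings). Because I extracted the string from a PDF file, I can't change its format.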

Sample string:

Date: 01-Mar-2022BETKA Br (0225)LIABILITIESCUSTOMER DEPOSITS 19,858,700.86Current Deposit12102010010165 350,745,799.38Saving Deposits12102010050170 174,381.98SB Bidhaba Bhata12102010060171 1,125,990.66SB Bayaska Bhata12102010070172 131,647.15SB Pratibandhy

Expected output:

19,858,700.86
350,745,799.38
174,381.98
1,125,990.66
131,647.15

Can anyone help?

Use the pattern (\d+(?:[,.]\d+)+) to get the expected result, including 174,381.98:

import re

string = """Date: 01-Mar-2022BETKA Br (0225)LIABILITIESCUSTOMER DEPOSITS 19,858,700.86Current Deposit12102010010165 350,745,799.38Saving Deposits12102010050170 174,381.98SB Bidhaba Bhata12102010060171 1,125,990.66SB Bayaska Bhata12102010070172 131,647.15SB Pratibandhy"""

# Match a run of digits followed by one or more groups of "," or "." plus more digits
print(*re.findall(r"(\d+(?:[,.]\d+)+)", string), sep="\n")
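The pattern above accepts any digits joined by commas or dots, so it would also return comma-grouped integers such as 1,000 that have no decimal point. If only amounts with a decimal point are wanted, a stricter pattern can require either well-formed thousands groups or a plain run of digits, followed by a mandatory decimal point. This is a minimal sketch, assuming amounts use "," as the thousands separator and "." as the decimal point; `AMOUNT_RE`, `extract_amounts` and `to_decimal` are illustrative names, not part of the original answer.

```python
import re
from decimal import Decimal
from typing import List

# Sample text copied from the question (truncated PDF extraction).
text = (
    "Date: 01-Mar-2022BETKA Br (0225)LIABILITIESCUSTOMER DEPOSITS "
    "19,858,700.86Current Deposit12102010010165 350,745,799.38Saving "
    "Deposits12102010050170 174,381.98SB Bidhaba Bhata12102010060171 "
    "1,125,990.66SB Bayaska Bhata12102010070172 131,647.15SB Pratibandhy"
)

# Assumption: amounts either use comma thousands separators ("1,125,990.66")
# or have no separators at all ("513.63"); a decimal point is always required.
AMOUNT_RE = re.compile(
    r"""
    (?<![\d,.])              # don't start in the middle of a longer number
    (?:\d{1,3}(?:,\d{3})+    # 1-3 digits followed by one or more ,ddd groups
       |\d+)                 # or a plain run of digits with no commas
    \.\d+                    # mandatory decimal point and fractional digits
    """,
    re.VERBOSE,
)


def extract_amounts(s: str) -> List[str]:
    """Return every decimal amount in s, in order of appearance."""
    # The pattern has no capturing groups, so findall returns whole matches;
    # a trailing word such as "Savings" simply ends the match.
    return AMOUNT_RE.findall(s)


def to_decimal(amount: str) -> Decimal:
    """Strip thousands separators and parse exactly (no float rounding)."""
    return Decimal(amount.replace(",", ""))


if __name__ == "__main__":
    for amount in extract_amounts(text):
        print(amount, to_decimal(amount))
```

Converting with Decimal rather than float keeps values like 19858700.86 exact, which matters if the amounts are later summed or compared. Account numbers such as 12102010010165 are skipped because they have no decimal point.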