Extracting numbers with decimal points from text extracted from PDF files
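I only need to extract the numbers with decimal points from the following string. I used the re module, but I ran into trouble with commas (a number may contain no comma, or more than one). Another problem is that the decimal is followed directly by a word (e.g. 1,513,971.63Savings). Because I extracted the string from a PDF file, I cannot change the format.
Sample string: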
Date: 01-Mar-2022BETKA Br (0225)LIABILITIESCUSTOMER DEPOSITS 19,858,700.86Current Deposit12102010010165 350,745,799.38Saving Deposits12102010050170 174,381.98SB Bidhaba Bhata12102010060171 1,125,990.66SB Bayaska Bhata12102010070172 131,647.15SB Pratibandhy
Expected output:
19,858,700.86
350,745,799.38
174,381.98
1,125,990.66
131,647.15
Can anyone help?
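I guess you missed 174,381.98. If so, use the pattern (\d+(?:[,.]\d+)+) to get the expected result. It matches a run of digits followed by one or more groups made of a comma or a period and more digits, so the match stops as soon as a letter follows the last digit.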
import re
string = """Date: 01-Mar-2022BETKA Br (0225)LIABILITIESCUSTOMER DEPOSITS 19,858,700.86Current Deposit12102010010165 350,745,799.38Saving Deposits12102010050170 174,381.98SB Bidhaba Bhata12102010060171 1,125,990.66SB Bayaska Bhata12102010070172 131,647.15SB Pratibandhy"""
print(*re.findall(r"(\d+(?:[,.]\d+)+)", string), sep="\n")
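Note that the pattern above also accepts things like 1,234 (no decimal part) or 1.2.3. If only real decimal amounts are wanted, a stricter pattern may be worth trying. The following is a minimal sketch, not part of the original answer: the names AMOUNT_RE, extract_amounts and to_decimal are hypothetical, and it assumes commas are used only as thousands separators and a period only as the decimal point.

```python
import re
from decimal import Decimal

# Assumption: "," is a thousands separator and "." is the decimal point.
# (?<![\d,.])  -> do not start in the middle of a longer number
# \d{1,3}(?:,\d{3})+ -> grouped integer part such as 19,858,700
# |\d+         -> or an ungrouped integer part such as 1234
# \.\d+        -> a mandatory decimal part
AMOUNT_RE = re.compile(r"(?<![\d,.])(?:\d{1,3}(?:,\d{3})+|\d+)\.\d+")


def extract_amounts(text):
    """Return every decimal amount found in text, as strings."""
    return AMOUNT_RE.findall(text)


def to_decimal(amount):
    """Convert a matched amount such as '174,381.98' to a Decimal."""
    return Decimal(amount.replace(",", ""))


sample = (
    "Date: 01-Mar-2022BETKA Br (0225)LIABILITIESCUSTOMER DEPOSITS "
    "19,858,700.86Current Deposit12102010010165 "
    "350,745,799.38Saving Deposits12102010050170 "
    "174,381.98SB Bidhaba Bhata12102010060171 "
    "1,125,990.66SB Bayaska Bhata12102010070172 "
    "131,647.15SB Pratibandhy"
)

amounts = extract_amounts(sample)
print(*amounts, sep="\n")
# 19,858,700.86
# 350,745,799.38
# 174,381.98
# 1,125,990.66
# 131,647.15

# Numeric values, if arithmetic is needed later.
values = [to_decimal(a) for a in amounts]
```

The account numbers (e.g. 12102010010165) and the date are skipped because they have no decimal part. Decimal is used instead of float so that currency values are not subject to binary rounding.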