How to capture 2 different patterns, one from the beginning and the other from the end of a string, using a regular expression?

Question

I need to capture two different patterns: one from the beginning of the string and the other from the end of the string.

I am using Python 3.

Example 1:

string: 'TRADE ACCOUNT BALANCE FROM 2 TRADE LINES CALL. .... $ 23,700'
expected_output: TRADE ACCOUNT BALANCE 23,700
my_regex_pattern: r'(TRADE ACCOUNT BALANCE).+([\d,]+)'
output(group 1): TRADE ACCOUNT BALANCE
output(group 2): 0

Example 2:

string: 'AVERAGE BALANCE IN THE PAST 5 QUARTERS ......... $ 26,460'
expected_output: AVERAGE BALANCE 26,460
my_regex_pattern: r'(AVERAGE BALANCE).+([\d,]+)'
output(group 1): AVERAGE BALANCE
output(group 2): 0

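The substring at the end is always a number. The substring at the beginning is always one or more words. I don't understand why the pattern captures only the last character.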

Answer 1

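The .+ in your pattern is greedy: it first matches everything up to the end of the string, then backtracks one character at a time until the rest of the pattern, [\d,]+, can match. A single trailing 0 is enough to satisfy that, so Group 2 ends up holding only 0.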
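
To see the backtracking in action, here is a minimal sketch that runs the pattern from the question against the first sample string (the variable names are only illustrative):

```python
import re

s = 'TRADE ACCOUNT BALANCE FROM 2 TRADE LINES CALL. .... $ 23,700'

# Greedy .+ consumes the rest of the string, then gives back characters
# one at a time until [\d,]+ can match. Giving back the final "0" is enough.
greedy = re.search(r'(TRADE ACCOUNT BALANCE).+([\d,]+)', s)

print(greedy.group(1))  # TRADE ACCOUNT BALANCE
print(greedy.group(2))  # 0
```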

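What you need in this case is something to "anchor" the start of the second group.

In the strings you provided, the number is preceded by a dollar sign. So, you can use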

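(TRADE ACCOUNT BALANCE).*\$\s*(\d[\d,]*)

Details

(TRADE ACCOUNT BALANCE) - Group 1: a literal substring
.* - any 0+ characters other than line break characters, as many as possible
\$ - a literal $ character (it must be escaped, because an unescaped $ matches the end of the string)
\s* - 0+ whitespace characters
(\d[\d,]*) - Group 2: a digit, then 0+ digits or commas.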
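
Here is a hedged sketch of that pattern applied in Python to both sample strings from the question. The alternation of the two labels and the helper name extract_balance are my own assumptions for illustration; adapt the labels to your data.

```python
import re

# Assumption: the two leading labels from the question are combined with |
# so that one pattern covers both samples.
PATTERN = re.compile(r'(TRADE ACCOUNT BALANCE|AVERAGE BALANCE).*\$\s*(\d[\d,]*)')

def extract_balance(line):
    """Return 'LABEL AMOUNT' for a matching line, or None if there is no match."""
    m = PATTERN.search(line)
    return f'{m.group(1)} {m.group(2)}' if m else None

samples = [
    'TRADE ACCOUNT BALANCE FROM 2 TRADE LINES CALL. .... $ 23,700',
    'AVERAGE BALANCE IN THE PAST 5 QUARTERS ......... $ 26,460',
]

for s in samples:
    print(extract_balance(s))
# TRADE ACCOUNT BALANCE 23,700
# AVERAGE BALANCE 26,460
```

Because .* is still greedy, it backtracks to the last $ in the line, so the whole number after it lands in Group 2.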

regex

regex-group

python-3.7