Using regex to extract characters either side of a match

Question

I have a string:

test=' 40  virtual asset service providers law, 2020e section 1  c law 14 of 2020   page 5  cayman islands'

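I want to match every number that occurs, and then print not just the number but also the three characters on either side of it.

So far, using re, I have matched the numbers:

import re
print(re.findall(r'\d+', test))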
['40', '2020', '1', '14', '2020', '5']

I would like it to return:

[' 40  v', 'w, 2020e s', 'aw 14 of', 'of 2020   ', 'ge 5  c']

Answer 1

Use . to match any character, and {0,3} to capture up to 3 characters on each side:

print(re.findall(r'.{0,3}\d+.{0,3}', test))
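One caveat: re.findall returns non-overlapping matches, so characters consumed as trailing context of one match cannot reappear as leading context of the next. On the question's string, the ' of' after 14 is consumed, so the next match begins at ' 2020' rather than 'of 2020'. If overlapping context is wanted, one possible sketch is to match only the digits with re.finditer and slice the window out of the string. The helper name with_context and its width parameter are made up for illustration.

```python
import re

test = ' 40  virtual asset service providers law, 2020e section 1  c law 14 of 2020   page 5  cayman islands'

def with_context(text, width=3):
    """Return each run of digits plus up to `width` characters on either side.

    Hypothetical helper for illustration; not part of the `re` module.
    """
    results = []
    for m in re.finditer(r'\d+', text):
        start = max(0, m.start() - width)  # clamp at the start of the string
        # Slicing past the end of a string is safe in Python.
        results.append(text[start:m.end() + width])
    return results

consumed = re.findall(r'.{0,3}\d+.{0,3}', test)  # context is consumed, no overlap
windows = with_context(test)                     # windows may overlap

print(consumed)
# [' 40  v', 'w, 2020e s', 'on 1  c', 'aw 14 of', ' 2020   ', 'ge 5  c']
print(windows)
# [' 40  v', 'w, 2020e s', 'on 1  c', 'aw 14 of', 'of 2020   ', 'ge 5  c']
```

Both versions also return 'on 1  c' for the standalone 1, which the expected output in the question leaves out.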

Answer 2

Here you go:

re.findall('[^0-9]{0,3}[0-9]+[^0-9]{0,3}', test)

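[Edit]
Breaking down the pattern:
'[^0-9]{0,3}' matches up to 3 non-digit characters
'[0-9]+' matches one or more digits

The final pattern '[^0-9]{0,3}[0-9]+[^0-9]{0,3}' matches one or more digits surrounded by up to 3 non-digit characters on each side.

To reduce confusion, I favor using '[^0-9]{0,3}' in the pattern rather than '.{0,3}' (as suggested in the other answers), because it states explicitly that non-digits are what should be matched. '.' can be confusing because it matches any character, digits included.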
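To make the difference concrete, here is a small sketch on a made-up input, 'a1b2c', where the digits sit fewer than three characters apart. The two patterns give the same result on the question's string, but they diverge here because '.' is allowed to swallow a digit as context.

```python
import re

sample = 'a1b2c'  # made-up input, not from the question

dot_matches = re.findall(r'.{0,3}\d+.{0,3}', sample)
nondigit_matches = re.findall(r'[^0-9]{0,3}[0-9]+[^0-9]{0,3}', sample)

print(dot_matches)       # ['a1b2c']    -> the leading '.{0,3}' consumed 'a1b'
print(nondigit_matches)  # ['a1b', '2c'] -> each number is matched separately
```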

Answer 3

re.findall(r".{0,3}\d+.{0,3}", test)

{0,3} is a "greedy" quantifier that matches up to 3 characters.
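As a quick illustration of what "greedy" means here: the quantifier first grabs as many characters as it is allowed (up to three) and gives some back only if the rest of the pattern would otherwise fail. Appending ? makes it lazy instead. The sample strings below are made up.

```python
import re

sample = 'abcdef 42'  # made-up input

greedy = re.match(r'.{0,3}', sample).group()  # takes as many as allowed
lazy = re.match(r'.{0,3}?', sample).group()   # takes as few as possible

# Greedy with backtracking: '.{0,3}' first grabs ' 40', then gives back
# the '0' so that '\d+' still has a digit to match.
m = re.match(r'(.{0,3})(\d+)', ' 40 ')

print(repr(greedy))  # 'abc'
print(repr(lazy))    # ''
print(m.groups())    # (' 4', '0')
```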
