Use Python regex substitution with groups to get the replaced characters

Question

I'm looking for a way to get the grouped characters that re.sub() has replaced in a string. For example, in this:

#!/usr/bin/env python3

import re
sentence="This is whatever. Foo"

# remove punctuation mark
new_sentence = re.sub(r'([\.,:;])', '', sentence)

removed_punctuation_mark = ??????????????

print(removed_punctuation_mark)

...how do I get the removed period? re.subn() only tells me that one character was removed, not which one.

Or, to put it another way, how do I do in Python what this Perl script does:

#!/usr/bin/perl

$sentence = "This is whatever. Foo";

# remove punctuation mark
$sentence =~ s/([\.,:;])//;

# first group of () in regex above
$removed_punctuation_mark = $1;

print "$removed_punctuation_mark\n";

Of course I could use re.search and group() first and then re.sub, but then I would have to repeat the regex, which is not very elegant.

Answer 1

As @jasonharper suggested in his comment:

import re

replacements = []


def replacement(x):
    replacements.append(x.group(1))
    return ''


sentence = 'This is whatever. Foo'
new_sentence = re.sub(r'([\.,:;])', replacement, sentence)

print(new_sentence, replacements)

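This is probably what you're looking for. x is a match object, so it contains all the groups and other information about the match, and you can pull anything you need from it. The example takes the first group, because that is where the punctuation mark is captured in your regex.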
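
If you'd rather not keep a module-level list around, the same idea can be wrapped in a small helper. This is only a sketch: sub_and_capture is a name I made up for illustration, not something in the re module. It assumes the pattern has at least one group, and it passes count through to re.sub so that count=1 mirrors the Perl s/// without /g, which only replaces the first match.

```python
import re


def sub_and_capture(pattern, text, count=0):
    """Hypothetical helper: remove matches and return (new_text, removed_groups)."""
    removed = []

    def replacement(match):
        # Record the first group, then replace the whole match with nothing.
        removed.append(match.group(1))
        return ''

    # count=0 means "replace all matches", same as the re.sub default.
    new_text = re.sub(pattern, replacement, text, count=count)
    return new_text, removed


new_sentence, removed = sub_and_capture(
    r'([.,:;])', 'This is whatever. Foo: bar', count=1
)
print(new_sentence, removed)
```

With count=1 only the first punctuation mark is removed and recorded; leave count at its default to collect every removed mark in order.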

python

regex

regex-group