Regular expressions in Python: replacing couple names

I want to find expressions like "John and Jane Doe" and replace them with "John Doe and Jane Doe".

Sample string:

regextest = 'Heather Robinson, Jane and John Smith, Kiwan and Nichols Brady John, Jimmy Nichols, Melanie Carbone, and Nancy Brown'

I can find the expression and replace it with a fixed string, but I can't replace it with a modified version of the original text.

re.sub(r'[a-zA-Z]+\s*and\s*[a-zA-Z]+.[^,]*',"kittens" ,regextest)

Output: 'Heather Robinson, kittens, kittens, Jimmy Nichols, Melanie Carbone, and Nancy Brown'

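I think we can pass a function that makes the change instead of a string ("kittens"), but I haven't been able to write that function. My attempt below raises an error.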

def re_couple_name_and(m):
    return f'*{m.group(0).split()[0]+m.group(0).split()[-1:]+ m.group(0).split()[1:]}'
    
re.sub(r'[a-zA-Z]+\s*and\s*[a-zA-Z]+.[^,]*',re_couple_name_and ,regextest)
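Note on the attempt above: `m.group(0).split()[0]` is a `str`, while `m.group(0).split()[-1:]` and `m.group(0).split()[1:]` are lists, so the `+` inside the f-string raises a `TypeError`. A minimal sketch of a callback that stays with strings is shown here. The helper name `fix_couple` is hypothetical, and the sketch assumes every match has the shape "First and Partner Family...", with whitespace around "and".

```python
import re

regextest = ('Heather Robinson, Jane and John Smith, '
             'Kiwan and Nichols Brady John, Jimmy Nichols, '
             'Melanie Carbone, and Nancy Brown')

def fix_couple(m):
    # Hypothetical helper: rebuild the whole match, m.group(0).
    # Assumes the words are [first, 'and', partner, family...].
    words = m.group(0).split()
    first, partner = words[0], words[2]
    family = ' '.join(words[3:])  # everything after the partner's first name
    return f'{first} {family} and {partner} {family}'

result = re.sub(r'[a-zA-Z]+\s*and\s*[a-zA-Z]+.[^,]*', fix_couple, regextest)
print(result)
```

Joining the slice with `' '.join(...)` is what turns the list back into a string before concatenation.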

IIUC, here is one way using capture groups:

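def re_couple_name_and(m):
    # group(3) is e.g. "John Smith"; everything after its first space is the family name
    family_name = m.group(3).split(" ", 1)[1]
    # Rebuild as "<first> <family>" + " and " + "<partner> <family>"
    return "%s %s" % (m.group(1), family_name) + m.group(2) + m.group(3)

re.sub(r'([a-zA-Z]+)(\s*and\s*)([a-zA-Z]+.[^,]*)', re_couple_name_and, regextest)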

Output:

'Heather Robinson, Jane Smith and John Smith, Kiwan Brady John and Nichols Brady John, Jimmy Nichols, Melanie Carbone, and Nancy Brown'
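One caveat with `\s*and\s*`: because the whitespace is optional, the pattern can also match "and" inside a single word (for example "Alexander"), and a couple with no family name would make `split(" ", 1)[1]` raise an `IndexError`. A possible stricter variant, as a sketch only: the pattern and the name `expand_couple` are my own assumptions, not part of the answer above. It requires whitespace around "and" and captures the family name in its own group.

```python
import re

# Assumed stricter pattern: \b anchors at a word start, \s+ requires real
# whitespace around "and", and the family name gets its own group.
PATTERN = re.compile(r'\b([a-zA-Z]+)(\s+and\s+)([a-zA-Z]+)\s+([^,]+)')

def expand_couple(m):
    # Hypothetical helper: unpack the four groups and repeat the family name.
    first, joiner, partner, family = m.groups()
    return f'{first} {family}{joiner}{partner} {family}'

sample = 'Alexander Hamilton, Jane and John Smith'
expanded = PATTERN.sub(expand_couple, sample)
print(expanded)
# Alexander Hamilton, Jane Smith and John Smith
```

With this pattern a pair without a family name simply does not match and is left unchanged.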

You can use the regular expression below to capture the parts that need to be rearranged, then build the new string with re.sub().

(\w+)( +and +)(\w+)( +[^,]*)


Example

import re

text = "Heather Robinson, Jane and John Smith, Kiwan and Nichols Brady John, Jimmy Nichols, Melanie Carbone, and Nancy Brown"
# \1 = first name, \2 = " and ", \3 = partner's first name, \4 = " <family name>"
print(re.sub(r"(\w+)( +and +)(\w+)( +[^,]*)", r"\1\4\2\3\4", text))

Output

Heather Robinson, Jane Smith and John Smith, Kiwan Brady John and Nichols Brady John, Jimmy Nichols, Melanie Carbone, and Nancy Brown
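If numbered backreferences are hard to read, the same substitution can be written with named groups and `\g<name>` references in the replacement template. This is only a sketch of that variant; the group names (`first`, `joiner`, `partner`, `family`) are my own choice and do not come from the answer.

```python
import re

text = ("Heather Robinson, Jane and John Smith, Kiwan and Nichols Brady John, "
        "Jimmy Nichols, Melanie Carbone, and Nancy Brown")

# Same structure as the four-group pattern, with illustrative group names.
pattern = (r"(?P<first>\w+)(?P<joiner> +and +)"
           r"(?P<partner>\w+)(?P<family> +[^,]*)")

# The family group keeps its leading space, so no extra separators are needed.
template = r"\g<first>\g<family>\g<joiner>\g<partner>\g<family>"

named_result = re.sub(pattern, template, text)
print(named_result)
```

The template reads left to right as first name, family name, " and ", partner's first name, family name.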