Regex to bring a word inside a quoted substring

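I'm working on a function that detects whether LP or LLP appears directly after a ", with or without a space in between, at any position in a string. If it does, I would like to move the LP or LLP substring inside the quoted substring, as shown below.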

# input
'blabla "RANDOM COMPANY ONE "LLP blabla'
'blabla "RANDOM COMPANY TWO " LLP blabla'
'blabla "RANDOM COMPANY THREE " LP blabla'
'blabla "RANDOM COMPANY FOUR "LP blabla'

# output
'blabla "RANDOM COMPANY ONE LLP" blabla'
'blabla "RANDOM COMPANY TWO LLP" blabla'
'blabla "RANDOM COMPANY THREE LP" blabla'
'blabla "RANDOM COMPANY FOUR LP" blabla'

So far I have this function, which almost does what I want:

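import re

def fix_entity_broken_by_quotes(text):

    match = r'"\s*(LL?P)'   # a quote, optional whitespace, then LP or LLP (group 1)
    replace = r'" \1 "'     # re-inserts a quote before and after the captured suffix

    # collapse any runs of whitespace left over after the substitution
    return ' '.join(re.sub(match, replace, text).split())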

# run

>>> fix_entity_broken_by_quotes('blabla "RANDOM COMPANY ONE" LLP blabla')
Out[1]: 'blabla "RANDOM COMPANY ONE" LLP " blabla'

I don't want the " after ONE in the resulting string.

As always, any hints or explanations about what I'm missing are very welcome.

Thanks!

You can try using re.sub with a capture group:

import re

inp = ['blabla "RANDOM COMPANY ONE "LLP blabla', 'blabla "RANDOM COMPANY TWO " LLP blabla', 'blabla "RANDOM COMPANY THREE " LP blabla', 'blabla "RANDOM COMPANY FOUR "LP blabla']
# \1 re-inserts the captured LP/LLP, followed by the closing quote
output = [re.sub(r'"[ ]?(LP|LLP)', r'\1"', x) for x in inp]
print(output)

This prints:

['blabla "RANDOM COMPANY ONE LLP" blabla',
 'blabla "RANDOM COMPANY TWO LLP" blabla',
 'blabla "RANDOM COMPANY THREE LP" blabla',
 'blabla "RANDOM COMPANY FOUR LP" blabla']
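As a rough sketch of a slightly more defensive variant of the same idea: the pattern below also swallows any whitespace before the quote and requires a word boundary after LP/LLP, so a longer word such as LPG is left alone. The helper name, the compiled PATTERN constant, and the \b are illustrative assumptions, not part of the answer above.

```python
import re

# Hypothetical pattern: optional whitespace, a quote, optional whitespace,
# then LP or LLP as a whole word (captured as group 1).
PATTERN = re.compile(r'\s*"\s*(LL?P)\b')

def move_suffix_inside_quotes(text):
    # Emit one space, the captured suffix, then the closing quote.
    return PATTERN.sub(r' \1"', text)

for s in ['blabla "RANDOM COMPANY ONE "LLP blabla',
          'blabla "RANDOM COMPANY TWO " LLP blabla',
          'blabla "RANDOM GAS CO " LPG blabla']:
    print(move_suffix_inside_quotes(s))
```

Because the whitespace on both sides of the quote is part of the match, no separate whitespace clean-up step is needed in this version.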

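You asked for hints or explanations about what you are missing: your replace has a leading ". The match consumes the closing quote, and the replacement then writes one back before LP/LLP and another after it, which is why a stray " remains after ONE.

match = r'"\s*(LL?P)'
replace = r'" \1 "'

Changing replace to r' \1"' should help.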
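For completeness, here is a minimal sketch of the question's function with the replacement written as r' \1"' (a space, the captured LP/LLP, then the closing quote), assuming the rest of the function stays as posted:

```python
import re

def fix_entity_broken_by_quotes(text):
    match = r'"\s*(LL?P)'   # closing quote, optional whitespace, LP or LLP
    replace = r' \1"'       # suffix first, then the closing quote

    # The substitution can leave a double space; split/join collapses it.
    return ' '.join(re.sub(match, replace, text).split())

inputs = ['blabla "RANDOM COMPANY ONE "LLP blabla',
          'blabla "RANDOM COMPANY TWO " LLP blabla',
          'blabla "RANDOM COMPANY THREE " LP blabla',
          'blabla "RANDOM COMPANY FOUR "LP blabla']

for s in inputs:
    print(fix_entity_broken_by_quotes(s))
```

Note that the split/join step also normalizes whitespace everywhere else in the string, which may or may not be what you want for other inputs.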