Regex to bring a word inside a quoted substring
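I'm working on a function that detects whether LP or LLP appears, optionally preceded by a space, right after a " at any position in a string. When it does, I want to move the LP or LLP substring inside the quoted substring, as shown below.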
# input
'blabla "RANDOM COMPANY ONE "LLP blabla'
'blabla "RANDOM COMPANY TWO " LLP blabla'
'blabla "RANDOM COMPANY THREE " LP blabla'
'blabla "RANDOM COMPANY FOUR "LP blabla'
# output
'blabla "RANDOM COMPANY ONE LLP" blabla'
'blabla "RANDOM COMPANY TWO LLP" blabla'
'blabla "RANDOM COMPANY THREE LP" blabla'
'blabla "RANDOM COMPANY FOUR LP" blabla'
So far I have this function, which does almost what I want:
import re

def fix_entity_broken_by_quotes(text):
    match = r'"\s*(LL?P)'
    replace = r'" \1 "'
    return ' '.join(re.sub(match, replace, text).split())
# run
>>> fix_entity_broken_by_quotes('blabla "RANDOM COMPANY ONE" LLP blabla')
Out[1]: 'blabla "RANDOM COMPANY ONE" LLP " blabla'
I don't want the " after ONE in the resulting string.
As always, any hints or explanations about what I'm missing are very welcome.
Thanks!
You can try using re.sub with a capture group:
import re

inp = ['blabla "RANDOM COMPANY ONE "LLP blabla', 'blabla "RANDOM COMPANY TWO " LLP blabla', 'blabla "RANDOM COMPANY THREE " LP blabla', 'blabla "RANDOM COMPANY FOUR "LP blabla']
# \1 re-emits the captured LP/LLP in front of the quote
output = [re.sub(r'"[ ]?(LP|LLP)', r'\1"', x) for x in inp]
print(output)
This prints:
['blabla "RANDOM COMPANY ONE LLP" blabla',
'blabla "RANDOM COMPANY TWO LLP" blabla',
'blabla "RANDOM COMPANY THREE LP" blabla',
'blabla "RANDOM COMPANY FOUR LP" blabla']
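"Any hints or explanations about what I'm missing are very welcome."
Your replace starts with a leading ", which is why the quote after ONE is kept: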
match = r'"\s*(LL?P)'
replace = r'" \1 "'
Changing replace to r' \1"' should help.
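For reference, here is a minimal sketch of the function with that change applied, run against the inputs from the question. The constant name PATTERN and the samples list are only illustrative; the pattern itself is the one from the question, and the whitespace normalization via split/join is kept because the replacement can leave a doubled space.

```python
import re

# Illustrative name; the pattern is the one from the question.
PATTERN = re.compile(r'"\s*(LL?P)')

def fix_entity_broken_by_quotes(text):
    # Drop the quote in front of LP/LLP and put it back after the captured suffix.
    moved = PATTERN.sub(r' \1"', text)
    # Collapse any doubled spaces left behind by the replacement.
    return ' '.join(moved.split())

samples = [
    'blabla "RANDOM COMPANY ONE "LLP blabla',
    'blabla "RANDOM COMPANY TWO " LLP blabla',
    'blabla "RANDOM COMPANY THREE " LP blabla',
    'blabla "RANDOM COMPANY FOUR "LP blabla',
    'blabla "RANDOM COMPANY ONE" LLP blabla',
]
for s in samples:
    print(fix_entity_broken_by_quotes(s))
```

One caveat: as written, the pattern also matches an opening quote directly followed by LP or LLP (for example "LPG ...), so adding \b after the group may be worth considering if such names can occur in your data.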