How to remove the substring "- " from a string in Python while keeping the " - " substring?

Example:

string = " a lot of text ... protective equip- ment ... a lot of text - with similar broken words like simple appli- cations ..."

I need to get the same text, but with "equip- ment" becoming "equipment" and "appli- cations" becoming "applications". Thanks.

If you want to remove the '- ' between two words, you can use the following regular expression:

>>> import re
>>> string = " a lot of text ... protective equip- ment ... a lot of text - with similar broken words like simple appli- cations ..."
>>> re.sub(r"(\w+)- (\w+)", r"\1\2", string)
' a lot of text ... protective equipment ... a lot of text - with similar broken words like simple applications ...'
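
How this approach works: each `(\w+)` group captures the word fragment on one side of "- ", and a replacement of `\1\2` writes the two fragments back without the hyphen and the space. Because a word character must sit directly before the hyphen, " - " (with a space before the hyphen) never matches. A minimal sketch wrapping this in a function; the name `join_broken_words` is only illustrative and not part of any library:

```python
import re

# Hypothetical helper name, chosen for this sketch only.
BROKEN_WORD = re.compile(r"(\w+)- (\w+)")

def join_broken_words(text):
    # \1 and \2 are the fragments before and after "- ";
    # writing them back to back drops the hyphen and the space.
    return BROKEN_WORD.sub(r"\1\2", text)

print(join_broken_words("protective equip- ment - simple appli- cations"))
# prints: protective equipment - simple applications
```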

A regular expression that requires a hyphen followed by a space, but rejects the match if the hyphen is preceded by a space, will do the job:

import re
string = "a lot of text ... protective equip- ment ... a lot of text - with similar broken words like simple appli- cations ..."
print(re.sub(r"(?<! )- ", "", string))

Output:

a lot of text ... protective equipment ... a lot of text - with similar broken words like simple applications ...
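
The grouped pattern and the lookbehind pattern agree on the example text, but they are not equivalent in general. The sketch below compares them on a few made-up inputs (the sample strings and function names are assumptions for illustration): the lookbehind only checks that the character before the hyphen is not a space, so it also removes "- " after punctuation or at the very start of the string.

```python
import re

def join_by_groups(text):
    # Requires word characters on both sides of "- ".
    return re.sub(r"(\w+)- (\w+)", r"\1\2", text)

def join_by_lookbehind(text):
    # Only requires that the hyphen is not preceded by a space.
    return re.sub(r"(?<! )- ", "", text)

samples = [
    "equip- ment",       # broken word: both patterns join it
    "text - with",       # spaced dash: both patterns leave it alone
    "(see note)- next",  # non-word character before the hyphen
    "- list item",       # hyphen at the very start of the string
]

for s in samples:
    print(repr(s), "->", repr(join_by_groups(s)), "|", repr(join_by_lookbehind(s)))
```

If the text may contain list markers or punctuation directly before a hyphen, the stricter grouped pattern is probably the safer choice; for ordinary running text both should behave the same.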