Removing words that contain digits using regular expressions is not working as expected
import re
text = """Why is this $[...] when the same product is available for $[...] here?<br />
http://www.amazon.com/VICTOR-FLY-MAGNET-BAIT-REFILL/dp/B00004RBDY<br /><br />
The Victor M380 and M502 traps are unreal, of course -- total fly genocide.
Pretty stinky, but only right nearby. won't, can't iamwordwith4number 234f ther was a word withnumber before me"""
sentense1 = re.sub(r"\S*\d+\S*", "", text) # removes words that contain digits.
sentense1 = re.sub('[^A-Za-z0-9]+', " ", text) # removes punctuations.
print(sentense1)
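I am trying to remove words that contain digits. For example, the text above contains words such as iamwordwith4number and 234f, and I want to remove them.
If I comment out the second regex line, it works. I'm not sure whether there is some dependency between the two calls. Can you give me some advice?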
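Your second re.sub call reads from the original text again, so its result overwrites what the first call produced. It should take sentense1 as its input, like this: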
sentense1 = re.sub('[^A-Za-z0-9]+', " ", sentense1) # removes punctuations.
instead of this:
sentense1 = re.sub('[^A-Za-z0-9]+', " ", text) # removes punctuations.
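To see the two steps chained together, here is a minimal sketch on a shortened version of the sample text. The variable names no_digit_words and cleaned are only illustrative and do not appear in the question, and the final .strip() is an extra assumption to trim any leading or trailing space.

```python
import re

# Shortened sample taken from the last line of the question's text.
text = "won't, can't iamwordwith4number 234f ther was a word withnumber before me"

# Step 1: drop every whitespace-delimited token that contains at least one digit.
no_digit_words = re.sub(r"\S*\d\S*", "", text)

# Step 2: clean up punctuation on the RESULT of step 1, not on the original text.
# Runs of non-alphanumeric characters (including repeated spaces) become one space.
cleaned = re.sub(r"[^A-Za-z0-9]+", " ", no_digit_words).strip()

print(cleaned)  # won t can t ther was a word withnumber before me
```

Note that the punctuation step also splits contractions such as won't into "won t", because the apostrophe is replaced by a space. If that is not wanted, the character class would need to allow the apostrophe.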