Any way to remove symbols from a lemmatize word set using python

I get lemmatized output from the code below, but the output words contain symbols such as ":", ",", "?", "!" and "( )".

output_H3 = [lemmatizer.lemmatize(w.lower(), pos=wordnet.VERB) for w in processed_H3_tag]

Output:

Expected output:

A regular expression can help:

import re

output = [
    "hide()",
    "show()",
    "methods:",
    "jquery",
    "slide",
    "elements:",
    "launchedw3schools",
    "today!",
]


>>> import pprint
>>> expected = [re.sub(r'[:,?!()]', '', e) for e in output]
>>> pprint.pprint(expected)
['hide',
 'show',
 'methods',
 'jquery',
 'slide',
 'elements',
 'launchedw3schools',
 'today']

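This replaces every character in the class [:,?!()] with an empty string, so each listed symbol is removed wherever it appears in a word.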
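One thing the answer above does not handle: a token made only of symbols (for example a bare "()") becomes an empty string. The sketch below, an illustrative variant rather than part of the original answer, precompiles the same character class and drops words that end up empty. The helper name strip_symbols is hypothetical.

```python
import re

# Same character class as above: the symbols : , ? ! ( )
UNWANTED = re.compile(r"[:,?!()]")

def strip_symbols(words):
    """Remove the unwanted symbols from each word, then drop words that become empty."""
    cleaned = (UNWANTED.sub("", w) for w in words)
    return [w for w in cleaned if w]

words = ["hide()", "show()", "methods:", "()", "today!"]
print(strip_symbols(words))  # ['hide', 'show', 'methods', 'today']
```

Precompiling the pattern is optional, since `re` caches compiled patterns, but it keeps the list of unwanted symbols defined in one place.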

You can also use str.translate() together with string.punctuation (!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~):

import string
# Translation table that deletes every character in string.punctuation
trans = str.maketrans('', '', string.punctuation)
output_wo_punc = [s.translate(trans) for s in output]

which returns:

> ['hide', 'show', 'methods', 'jquery', 'slide', 'elements', 'launchedw3schools', 'today']
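Both approaches above delete symbols anywhere in a word, so "don't" would become "dont" with the translate() version. If you only want to trim punctuation from the ends of each word, a different technique, str.strip(string.punctuation), may fit better. This is a sketch and not part of the original answer; the helper name strip_edge_punctuation is hypothetical.

```python
import string

def strip_edge_punctuation(words):
    """Strip punctuation only from the start and end of each word; drop empty results."""
    stripped = (w.strip(string.punctuation) for w in words)
    return [w for w in stripped if w]

words = ["hide()", "methods:", "don't", "e-mail", "today!", "..."]
print(strip_edge_punctuation(words))
# ['hide', 'methods', "don't", 'e-mail', 'today']
```

Internal apostrophes and hyphens survive, while a token made entirely of punctuation ("...") is removed.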