How to use a stemming algorithm on a list of words in Python

Question

I have a list of words:

'AWS', 
'jQuery', 
'jQuery', 
'Sliding', 
'jQuery', 
'jQuery', 
'Manipulating', 
'Us!'

I have removed the common words and now need to apply stemming to make the word list cleaner.

My code for removing the common words:

raw2 = second_headers
CORPUS = Common_word_corpus  # my personal word corpus added here

corpus = [w.lower() for w in CORPUS]  
processed_H2_tag = [w for w in raw2.split(' ') if w.lower() not in corpus] 

print(processed_H2_tag)
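
For reference, the filtering step above can be sketched as a self-contained snippet. The sample header string and the common-word list below are made-up stand-ins for second_headers and Common_word_corpus, which are not shown in the question, and remove_common_words is a hypothetical helper name.

```python
# Hypothetical stand-ins for second_headers and Common_word_corpus.
second_headers = "About AWS and the jQuery Sliding menu"
common_word_corpus = ["About", "and", "the", "menu"]


def remove_common_words(text, common_words):
    """Return the words of `text` that are not in `common_words` (case-insensitive)."""
    blocked = {w.lower() for w in common_words}  # a set gives fast membership tests
    # split() with no argument also collapses repeated whitespace
    return [w for w in text.split() if w.lower() not in blocked]


filtered = remove_common_words(second_headers, common_word_corpus)
print(filtered)  # ['AWS', 'jQuery', 'Sliding']
```

The original casing of the kept words is preserved; only the comparison is done in lowercase.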

Answer 1

How about this?

# download wordnet
import nltk
nltk.download('wordnet')

# import these modules
from nltk.stem import WordNetLemmatizer
from nltk.corpus import wordnet

lemmatizer = WordNetLemmatizer()

# choose some words to be lemmatized
words = ['AWS', 
'jQuery', 
'jQuery', 
'Sliding', 
'jQuery', 
'jQuery', 
'Manipulating', 
'Manipulateing', 
'Manipulate', 
'Us!']
 
for w in words:
    print(w, " : ", lemmatizer.lemmatize(w.lower(), pos=wordnet.VERB))

Output:

AWS  :  aws
jQuery  :  jquery
jQuery  :  jquery
Sliding  :  slide
jQuery  :  jquery
jQuery  :  jquery
Manipulating  :  manipulate
Manipulateing  :  manipulate
Manipulate  :  manipulate
Us!  :  us!
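
The code above lemmatizes rather than stems, and it leaves the punctuation in 'Us!' untouched. If an actual stemming algorithm is wanted, a minimal sketch using NLTK's PorterStemmer might look like the following. Stripping surrounding punctuation before stemming is an added assumption here, and stem_word is a hypothetical helper name. Keep in mind that stems are not always dictionary words.

```python
import string

from nltk.stem import PorterStemmer  # the Porter stemmer needs no corpus download

stemmer = PorterStemmer()

words = ['AWS', 'jQuery', 'Sliding', 'Manipulating', 'Manipulate', 'Us!']


def stem_word(token):
    """Lowercase a token, strip surrounding punctuation, then apply the Porter stemmer."""
    cleaned = token.lower().strip(string.punctuation)
    # Tokens made only of punctuation become empty; return them unchanged.
    return stemmer.stem(cleaned) if cleaned else cleaned


for w in words:
    print(w, " : ", stem_word(w))
```

Different inflections of the same word, such as 'Manipulating' and 'Manipulate', should map to the same stem, which is usually what matters when the goal is to group or deduplicate a word list.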


python

nlp