Creating a set of stopwords in NLTK (Python)

Question

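I know NLTK ships stopword lists for many languages, but what if I want to create my own set of stopwords and use it the same way I use the NLTK stopwords? Is that possible?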

import nltk
from nltk.corpus import stopwords
stops=set(stopwords.words('My own set'))
words=["Don't", 'hesitate','to','ask','questions']
print([word for word in words if word not in stops])

Answer 1

Store your stopwords in a text file, separated by whitespace, for example stop.txt:

stop_words = open('stop.txt','r').read().split()

This returns a list containing the stopwords.
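As a rough sketch of how that list could be used for filtering, the example below writes a small stop.txt to a temporary directory (the file contents and the temporary path are assumptions made only so the snippet runs on its own), reads it back into a set, and filters the word list from the question. Lowercasing on both sides is an added choice, not part of the original answer.

```python
import os
import tempfile

# Hypothetical stopword file, created here only so the sketch is self-contained.
stop_path = os.path.join(tempfile.mkdtemp(), "stop.txt")
with open(stop_path, "w", encoding="utf-8") as f:
    f.write("to the a an don't")

# Read the whitespace-separated stopwords into a set for fast membership tests.
with open(stop_path, encoding="utf-8") as f:
    stops = set(f.read().lower().split())

words = ["Don't", "hesitate", "to", "ask", "questions"]
# Compare in lowercase so "Don't" matches the stored "don't".
filtered = [word for word in words if word.lower() not in stops]
print(filtered)  # ['hesitate', 'ask', 'questions']
```

If the built-in NLTK list is also wanted, the custom set can be combined with it, for example stops | set(stopwords.words('english')), provided the NLTK stopwords corpus has been downloaded.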

Answer 2

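Another, possibly cheaper, approach is to create a FILENAME.py file that contains the list of stopwords. Then import FILENAME.py and use the stopword list from it. This avoids the explicit file I/O of reading and splitting a text file.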
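A minimal sketch of what such a module might look like follows. The module name my_stopwords.py, the STOP_WORDS constant, the remove_stopwords helper, and the words in the set are all hypothetical, chosen only for illustration.

```python
# my_stopwords.py (hypothetical module name)

# A frozenset keeps the collection immutable and gives fast membership tests.
STOP_WORDS = frozenset({"to", "the", "a", "an", "don't"})


def remove_stopwords(words, stops=STOP_WORDS):
    """Return the words whose lowercase form is not in stops."""
    return [word for word in words if word.lower() not in stops]


if __name__ == "__main__":
    print(remove_stopwords(["Don't", "hesitate", "to", "ask", "questions"]))
```

From another script in the same directory, the set would then be available with from my_stopwords import STOP_WORDS, with no text file to open or split.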


python

nlp

nltk

stop-words