Define your own language-specific set of stop words from a file in Python NLTK

Question

Is there a way to customize this

from nltk.corpus import stopwords

stopWords = set(stopwords.words('english'))

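or some other approach, so that I can use a text file containing the stop words for my own language with NLTK in Python?

If my text file is my_stop_words.txt, how do I tell NLTK to use that set of words instead of the set for 'english'?

Thanks a lot!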

Answer 1

Yes, you can read in your own stop-word file, although NLTK's stopwords corpus already covers a number of languages.

Try something like this:

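# Read one stop word per line, stripping the trailing newline from each
with open("stopwords.txt", "r", encoding="utf-8") as f:
    new_stopwords = []
    for line in f:
        word = line.strip()
        if word:  # skip blank lines
            new_stopwords.append(word)

new_stopwords_set = set(new_stopwords)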
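
Once the words are in a set, NLTK itself is not needed for the filtering step; a plain membership test works. Here is a minimal sketch of that idea. The helper names load_stopwords and remove_stopwords are made up for illustration and are not part of NLTK, and it assumes the file holds one stop word per line in UTF-8 and that matching should be case-insensitive.

```python
from pathlib import Path


def load_stopwords(path, encoding="utf-8"):
    """Read one stop word per line; skip blank lines and lowercase the rest."""
    text = Path(path).read_text(encoding=encoding)
    return {line.strip().lower() for line in text.splitlines() if line.strip()}


def remove_stopwords(tokens, stopwords):
    """Keep only the tokens whose lowercase form is not in the stop-word set."""
    return [tok for tok in tokens if tok.lower() not in stopwords]
```

For example, remove_stopwords(text.split(), load_stopwords("my_stop_words.txt")) would filter a whitespace-tokenized string. If NLTK is installed, the tokens could come from nltk.word_tokenize instead, and the custom set could be merged with a built-in list via load_stopwords("my_stop_words.txt") | set(stopwords.words('english')).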

python

nlp

nltk

stop-words