Trouble implementing stopwords in nltk
After importing stopwords from the corpus, I downloaded all the files via nltk.download(), and then:
#reading from a .txt file
list = []
with open("positive.txt", "r") as file:
    for words in file:
        words = words.strip()
        list.append(words)

#tokenizing words
pos_words = []
for i in list:
    pos_words.append(word_tokenize(i))

stop_words = [stopwords.words('english')]
print(stop_words)

final_pos_words = []
for i in pos_words:
    if i not in stop_words:
        final_pos_words.append(i)
print(final_pos_words)
But this doesn't remove anything. And after running this:
final_pos_words = []
for i in pos_words:
    if i in stop_words:
        final_pos_words.append(i)
print(final_pos_words)
the output is [].
The likely problem is that stop_words = [stopwords.words('english')] wraps the stopword list inside another list, and each element of pos_words is itself a list of tokens, so i not in stop_words compares whole token lists against that single nested list and never matches. Maybe change it to:
# import nltk
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords

#Read the file
f = open('positive.txt').read()

#Tokenizing the words
words = word_tokenize(f)

#set of predefined english stop words
stop_words = set(stopwords.words('english'))

#Filter stop words
filtered = [w for w in words if w not in stop_words]
print(filtered)
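If the tokenizer or stopword data isn't already present, downloading just the two required packages should be enough (assuming a standard NLTK setup):

import nltk
nltk.download('punkt')      #data needed by word_tokenize
nltk.download('stopwords')  #the stopword lists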
I've tried it and it runs without errors for me; give it a try and let me know how it goes.
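If you'd rather keep the line-by-line structure of your original code, here is a minimal sketch with the two fixes applied: a flat set instead of a list wrapped in another list, and filtering individual tokens rather than whole token lists. The token.lower() call is there because NLTK's English stopword list is all lowercase:

from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords

#Flat set of stop words (not a nested list)
stop_words = set(stopwords.words('english'))

final_pos_words = []
with open("positive.txt", "r") as file:
    for line in file:
        #Tokenize each line, then filter token by token
        for token in word_tokenize(line.strip()):
            #compare in lowercase, since the stopword list is lowercase
            if token.lower() not in stop_words:
                final_pos_words.append(token)
print(final_pos_words)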