python nltk -- stemming list of sentences/phrases
I have a bunch of sentences in a list and I want to stem them with the nltk library. I can stem one sentence at a time, but I'm having trouble stemming the sentences from the list and putting them back together. Am I missing a step? I'm fairly new to the nltk library. Thanks!
import nltk
from nltk.tokenize import word_tokenize
from nltk.stem import PorterStemmer
ps = PorterStemmer()
# Success: one sentence at a time
data = 'the gamers playing games'
words = word_tokenize(data)
for w in words:
    print(ps.stem(w))
# Fails:
data_list = ['the gamers playing games',
             'higher scores',
             'sports']
words = word_tokenize(data_list)
for w in words:
    print(ps.stem(w))
# Error: TypeError: expected string or bytes-like object
# result should be:
# ['the gamer play game',
#  'higher score',
#  'sport']
You are passing a list to word_tokenize, which it does not accept. The solution is to wrap your logic in another for-loop:
data_list = ['the gamers playing games', 'higher scores', 'sports']
for words in data_list:
    words = word_tokenize(words)
    for w in words:
        print(ps.stem(w))
# Output:
the
gamer
play
game
higher
score
sport
import nltk
from nltk.tokenize import sent_tokenize
from nltk.stem import PorterStemmer
sentence = """At eight o'clock on Thursday morning, Arthur didn't feel very good. So i take him to hospital."""
sentence = sentence.lower()
word_tokens = nltk.word_tokenize(sentence)
sent_tokens = sent_tokenize(sentence)
stemmer = PorterStemmer()
stemmed_word = []
stemmed_sent = []
for token in word_tokens:
    stemmed_word.append(stemmer.stem(token))
for sent_token in sent_tokens:
    stemmed_sent.append(stemmer.stem(sent_token))
print(stemmed_word)
print(stemmed_sent)
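Note that stemmer.stem treats each string it receives as a single word, so applying it to the whole sentences from sent_tokenize leaves the words inside them unstemmed. If the goal is a list of sentences in which every word is stemmed, a minimal sketch (reusing stemmer, sent_tokens and nltk.word_tokenize from above, and joining tokens with plain spaces, which is an assumption and will not restore the original punctuation spacing) would be:
# Sketch: stem every word inside each detected sentence, then join the
# stemmed tokens back into one string per sentence.
stemmed_sentences = []
for sent in sent_tokens:
    tokens = nltk.word_tokenize(sent)
    stemmed_sentences.append(' '.join(stemmer.stem(t) for t in tokens))
print(stemmed_sentences)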
To stem and then recompile everything back into a list data structure, I would go for:
ps = PorterStemmer()
data_list_s = []
for words in data_list:
    words = word_tokenize(words)
    words_s = ''
    for w in words:
        w_s = ps.stem(w)
        words_s += w_s + ' '
    data_list_s.append(words_s)
This puts the stemmed result of each element of data_list into a new list called data_list_s.
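Each joined string then ends with a trailing space from the words_s += w_s + ' ' concatenation. A more compact variant (an equivalent sketch, assuming the same data_list, ps and word_tokenize as above) uses a list comprehension with ' '.join, which avoids that:
# Equivalent sketch: ' '.join the stemmed tokens so no trailing space is left.
data_list_s = [' '.join(ps.stem(w) for w in word_tokenize(s)) for s in data_list]
print(data_list_s)
# ['the gamer play game', 'higher score', 'sport']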