gensim: Training an LDA Model: 'int' object is not subscriptable
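I created a new word list from 'text8' with the stop words removed, in order to train an LDA model. However, I get TypeError: 'int' object is not subscriptable. I suspect the problem is in the corpus, but I can't find a solution.
Here is my code: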
import gensim
import gensim.downloader as api
corpus=api.load('text8')
dictionary=gensim.corpora.Dictionary(corpus) # generate a dictionary from the text corpus
# removing stop words
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
import nltk
nltk.download('stopwords')
nltk.download('punkt')
stop_words = set(stopwords.words('english'))
word_tokens = dictionary
filtered_sentence = []
for w in word_tokens:
    if word_tokens[w] not in stop_words:
        filtered_sentence.append(word_tokens[w])
#print(filtered_sentence)
# generate a new dictionary from "filtered_sentence"
dct=gensim.corpora.Dictionary([filtered_sentence])
corpus2=dct.doc2bow(filtered_sentence)
The following line fails with TypeError: 'int' object is not subscriptable:
model=gensim.models.ldamodel.LdaModel(corpus2, num_topics=5, id2word=dct) #TypeError
model.print_topics(num_words=5)
Full error message:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-64-75e1fe1a727b> in <module>()
----> 1 model=gensim.models.ldamodel.LdaModel(corpus2, num_topics=5, id2word=dct) #TypeError: 'int' object is not subscriptable
2 model.print_topics(num_words=5)
3 frames
/usr/local/lib/python3.7/dist-packages/gensim/models/ldamodel.py in inference(self, chunk, collect_sstats)
651 # to Blei's original LDA-C code, cool!).
652 for d, doc in enumerate(chunk):
--> 653 if len(doc) > 0 and not isinstance(doc[0][0], six.integer_types + (np.integer,)):
654 # make sure the term IDs are ints, otherwise np will get upset
655 ids = [int(idx) for idx, _ in doc]
TypeError: 'int' object is not subscriptable
Any help would be greatly appreciated. Thank you!
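The error most likely comes from passing the result of dct.doc2bow(filtered_sentence) directly as corpus2. doc2bow returns the bag-of-words for a single document, that is, a list of (token_id, count) tuples, whereas LdaModel iterates over its corpus and treats each element as one document. With the flat list, each "document" is a single tuple, so doc[0][0] in the traceback ends up indexing an int. For the code to work, corpus2 must be a list of lists of tuples. So this trick should help: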
corpus2 = [dct.doc2bow(filtered_sentence),]
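To see why the extra list matters, here is a minimal pure-Python sketch that imitates the doc[0][0] check shown in the traceback. It does not call gensim; the helper first_ids and the token ids and counts are made up for illustration, and the real check inside LdaModel does more than this.

```python
# Bag-of-words for ONE document, in the shape doc2bow produces:
# a list of (token_id, count) tuples. These ids and counts are invented.
single_doc_bow = [(0, 2), (1, 1), (2, 3)]


def first_ids(corpus):
    """Hypothetical helper: read doc[0][0] for every non-empty document,
    imitating the access pattern on line 653 of the traceback."""
    return [doc[0][0] for doc in corpus if len(doc) > 0]


# Case 1: the single document is passed as if it were the whole corpus.
# Each "doc" is then a (token_id, count) tuple, doc[0] is an int,
# and doc[0][0] raises the TypeError from the question.
try:
    first_ids(single_doc_bow)
    error = None
except TypeError as exc:
    error = str(exc)

# Case 2: wrapping the document in a list gives a corpus of one document,
# so doc[0] is a tuple and doc[0][0] is its token id.
wrapped_ids = first_ids([single_doc_bow])

print(error)
print(wrapped_ids)
```

Note that wrapping the result this way gives LdaModel a corpus containing exactly one document. If the goal is topics over many documents, each document would probably need its own doc2bow call, but that goes beyond the fix for this error.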