Gensim: ValueError: failed to create intent(cache|hide)|optional array-- must have defined dimensions but got (0,)
I am trying to simulate streaming for some documents and to update LSI as further documents are streamed in. I get this error:
Traceback (most recent call last):
  File "gensimStreamGen_tutorial5.py", line 57, in <module>
    for vector in corpus_memory_friendly: # load one vector into memory at a time
  File "gensimStreamGen_tutorial5.py", line 44, in __iter__
    lsi = models.LsiModel(corpus, num_topics=10) # initialize an LSI transformation
  File "/Users/Desktop/gensim-0.12.0/gensim/models/lsimodel.py", line 331, in __init__
    self.add_documents(corpus)
  File "/Users/Desktop/gensim-0.12.0/gensim/models/lsimodel.py", line 388, in add_documents
    update = Projection(self.num_terms, self.num_topics, job, extra_dims=self.extra_samples, power_iters=self.power_iters)
  File "/Users/Desktop/gensim-0.12.0/gensim/models/lsimodel.py", line 126, in __init__
    extra_dims=self.extra_dims)
  File "/Users/Desktop/gensim-0.12.0/gensim/models/lsimodel.py", line 677, in stochastic_svd
    q, _ = matutils.qr_destroy(y) # orthonormalize the range
  File "/Users/Desktop/gensim-0.12.0/gensim/matutils.py", line 398, in qr_destroy
    qr, tau, work, info = geqrf(a, lwork=-1, overwrite_a=True)
ValueError: failed to create intent(cache|hide)|optional array-- must have defined dimensions but got (0,)
The code that streams the documents and updates the LSI model:
class MyCorpus(object):
    def __iter__(self):
        for document in documents:
            # Stream-in documents and build TF-IDF model to construct new_vec
            yield new_vec
            corpus.append(new_vec)
            tfidf = models.TfidfModel(corpus)
            corpus_tfidf = tfidf[corpus]
            lsi = models.LsiModel(corpus_tfidf, num_topics=2)
            corpus_lsi = lsi[corpus_tfidf]
            lsi.print_topics(2)
            for doc in corpus_lsi:
                print(doc)

corpus_memory_friendly = MyCorpus()
for vector in corpus_memory_friendly:
    print(vector)
The corpus receives a new new_vec on every iteration. The new_vec values yielded across iterations are:
[]
[(0, 1)]
[(1, 1), (2, 1), (3, 1)]
[(3, 2), (4, 1), (5, 1)]
[(2, 1), (6, 1), (7, 1)]
[]
[(8, 1)]
[(8, 1), (9, 1)]
[(9, 1), (10, 1), (11, 1)]
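The error occurs on the first iteration (the first line of the expected new_vec values above). The remaining lines are the expected output of new_vec.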
I think it is because the data in your document is empty.
Try adding:
if document != [] and document != [[]]:
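A minimal sketch of that guard, written with the standard library only so it runs without gensim. The helper names `is_blank` and `stream_non_blank` are hypothetical and do not come from the question or from gensim. It also rests on an assumption: the error is raised because the first vector is `[]`, so the corpus handed to `LsiModel` has no terms yet.

```python
def is_blank(document):
    """Return True for an empty document ([]) or one holding only an empty list ([[]])."""
    return document == [] or document == [[]]


def stream_non_blank(documents):
    """Yield documents one at a time, skipping blank ones (hypothetical helper)."""
    for document in documents:
        if not is_blank(document):
            yield document


# A few of the new_vec values from the question, including the empty ones.
vectors = [[], [(0, 1)], [(1, 1), (2, 1), (3, 1)], [], [(8, 1)]]
kept = list(stream_non_blank(vectors))
print(kept)
```

In the `__iter__` method from the question, the same check would go before `corpus.append(new_vec)` and the model-building calls, so that the TF-IDF and LSI models are never trained on a corpus that contains only empty vectors.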