AttributeError: 'list' object has no attribute 'words' in python gensim module

Question

I get this error when training with doc2vec:

AttributeError: 'list' object has no attribute 'words' in python gensim module

Here is my code:

# Extracting titles from csv to list
with open(query+'_titles.csv', 'rb') as f:
    reader = csv.reader(f)
    titlelist = list(reader)
# build
model = doc2vec.Doc2Vec(size=30, window=1, alpha=0.01, min_count=2, sample=1e-5, workers=100)
model.build_vocab(titlelist)
titlearray = np.asarray(titlelist)
print 'Training Model...'

I am using Python 2.7.11 and gensim 3.2.0, if that helps. There must be something obvious that I am missing.

Answer 1

Doc2Vec needs not just a list of sentences but a list of tagged sentences. From this discussion on DS.SE:

In word2vec there is no need to label the words, because every word has their own semantic meaning in the vocabulary. But in case of doc2vec, there is a need to specify that how many number of words or sentences convey a semantic meaning, so that the algorithm could identify it as a single entity. For this reason, we are specifying labels or tags to sentence or paragraph depending on the level of semantic meaning conveyed.

So Gensim expects input like this:

# tags must be a list; a bare string would be treated as one tag per character
sentences = [doc2vec.TaggedDocument(sentence, ['tag']) for sentence in titlelist]
model.build_vocab(sentences)

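Obviously, you will probably want to give each sentence its own tag to get meaningful vectors. By the way, are you sure you want to read the CSV file in binary mode? (On Python 2 the csv module does expect 'rb'; on Python 3 the file should be opened in text mode with newline=''.)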
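
As a rough sketch of the "different tag per sentence" idea: the data below is a made-up stand-in for what csv.reader would return, and the "title_%d" tag naming is just an arbitrary choice, not something gensim requires. The import falls back to a plain namedtuple with the same two fields (words, tags) so the snippet also runs without gensim installed.

```python
try:
    from gensim.models.doc2vec import TaggedDocument
except ImportError:
    # Fallback so the sketch runs without gensim: a namedtuple with the
    # same two fields (words, tags) that gensim's TaggedDocument exposes.
    from collections import namedtuple
    TaggedDocument = namedtuple("TaggedDocument", ["words", "tags"])

# Hypothetical stand-in for the rows read from the titles CSV;
# each inner list is assumed to already be a list of tokens.
titlelist = [
    ["deep", "learning", "for", "text"],
    ["gensim", "doc2vec", "tutorial"],
]

# One unique tag per title, wrapped in a list as TaggedDocument expects.
tagged_titles = [
    TaggedDocument(words=words, tags=["title_%d" % i])
    for i, words in enumerate(titlelist)
]

# Every item now has the .words attribute that the plain lists lacked.
for doc in tagged_titles:
    print(doc.tags, doc.words)
```

tagged_titles can then be passed to model.build_vocab(...) in place of titlelist. If each CSV row holds the whole title in a single cell, the row would need to be split into tokens first (for example row[0].split()); that depends on the file layout, which the question does not show.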
