Issues in doc2vec tags in Gensim
I am using gensim doc2vec as follows.
from gensim.models import doc2vec
from collections import namedtuple
import re

my_d = {'recipe__001__1': 'recipe 1 details should come here',
        'recipe__001__2': 'Ingredients of recipe 2 need to be added'}

docs = []
analyzedDocument = namedtuple('AnalyzedDocument', 'words tags')
for key, value in my_d.items():
    value = re.sub("[^a-zA-Z]", " ", value)
    words = value.lower().split()
    tags = key
    docs.append(analyzedDocument(words, tags))

model = doc2vec.Doc2Vec(docs, size=300, window=10, dm=1, negative=5, hs=0, min_count=1, workers=4, iter=20)
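However, when I check model.docvecs.offset2doctag, I get ['r', 'e', 'c', 'i', 'p', '_', '0', '1', '2'] as the output. The expected output is 'recipe__001__1' and 'recipe__001__2'.
When I use len(model.docvecs.doctag_syn0), I get 9 as the output, but the value should be 2, because my test dictionary contains only 2 recipes.
Please let me know why this happens.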
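Try changing this line:
tags = key
to
tags = [key]
Doc2Vec expects tags to be a list of tags. A plain string is iterated character by character, so each distinct character ends up as its own tag.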
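To see why the string version gives 9 tags, here is a minimal sketch in plain Python that does not use gensim at all. The helper collect_tags is hypothetical and only mimics the idea of looping over each document's tags and recording the ones not seen before. It is an illustration of the iteration behaviour, not gensim's actual implementation.

```python
from collections import namedtuple

AnalyzedDocument = namedtuple('AnalyzedDocument', 'words tags')


def collect_tags(docs):
    """Collect unique tags in first-seen order.

    Hypothetical helper (not part of gensim): it iterates over doc.tags,
    so a bare string is walked one character at a time.
    """
    seen = []
    for doc in docs:
        for tag in doc.tags:
            if tag not in seen:
                seen.append(tag)
    return seen


my_d = {'recipe__001__1': 'recipe 1 details should come here',
        'recipe__001__2': 'Ingredients of recipe 2 need to be added'}

# tags passed as a plain string (the original code)
bad_docs = [AnalyzedDocument(v.lower().split(), k) for k, v in my_d.items()]
# tags passed as a one-element list (the suggested fix)
good_docs = [AnalyzedDocument(v.lower().split(), [k]) for k, v in my_d.items()]

print(collect_tags(bad_docs))   # one entry per distinct character
print(collect_tags(good_docs))  # one entry per recipe key
```

With the string version the two keys contribute 9 distinct characters between them, which matches the length reported in the question. With the list version there is one tag per recipe.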