FastText most_similar doesn't return the exact match
I know that my vocabulary contains the words "cat" and "cats".
Example 1:
model.wv.most_similar("cat")
This returns [("cats", 0.83...), ("wild", 0.79...), ...]. There is no ("cat", 1.0) at the top of the results.
Example 2:
model.wv.most_similar("cats")
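This returns [("cat", 0.85...), ("wild", 0.77...), ...]. There is no ("cats", 1.0) at the top of the results.
Question: Is there a way to get the exact match at the top of the results, or some other way to check for an exact match? Maybe I'm misunderstanding something. Either way, I need help.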
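A toy sketch of the behavior described in the question, assuming (as gensim's brute-force most_similar does) that words passed in by name are left out of the results, while a raw vector query has no name and so excludes nothing. The vectors and the `toy_most_similar` helper are hypothetical and only illustrate the idea; this is not gensim's implementation.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def toy_most_similar(vectors, query, topn=3):
    """Hypothetical stand-in for most_similar: a str query is looked up and
    then excluded from the results; a raw vector query excludes nothing."""
    if isinstance(query, str):
        target, exclude = vectors[query], {query}
    else:
        target, exclude = query, set()
    scored = [(word, cosine(target, vec)) for word, vec in vectors.items()
              if word not in exclude]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]

# Made-up 3-d vectors, for illustration only.
vectors = {
    "cat": [0.9, 0.1, 0.3],
    "cats": [0.8, 0.2, 0.35],
    "wild": [0.5, 0.6, 0.1],
    "car": [0.1, 0.9, 0.8],
}

print(toy_most_similar(vectors, "cat"))           # "cat" itself is absent
print(toy_most_similar(vectors, vectors["cat"]))  # "cat" comes first with a score of about 1.0
```

Querying by name reproduces the question's result (the word is missing), while querying with the word's own vector puts the word first, since a vector's cosine similarity with itself is 1.0 up to floating-point rounding.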
Use an AnnoyIndexer and query with the word's vector; the word itself then shows up at the top of the most similar results:
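from gensim.models import Word2Vec
from gensim.models.word2vec import Text8Corpus
# Requires the `annoy` package; gensim < 4.0 imported this from gensim.similarities.index
from gensim.similarities.annoy import AnnoyIndexer

# Hypothetical path: point this at a local copy of the text8 corpus.
text8_path = "text8"

# gensim 4.x parameter names; gensim < 4.0 used 'size' and 'iter' instead.
params = {
    'alpha': 0.05,
    'vector_size': 100,
    'window': 5,
    'epochs': 5,
    'min_count': 5,
    'sample': 1e-4,
    'sg': 1,        # skip-gram
    'hs': 0,
    'negative': 5   # negative sampling
}
model = Word2Vec(Text8Corpus(text8_path), **params)
print(model)

# Build an Annoy index with 100 trees over the model's word vectors.
annoy_index = AnnoyIndexer(model, 100)

# Query with the raw vector for "cats" rather than the string "cats".
vector = model.wv["cats"]
approximate_neighbors = model.wv.most_similar([vector], topn=11, indexer=annoy_index)
print("Approximate Neighbors")
for neighbor in approximate_neighbors:
    print(neighbor)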
Output snippet:
Approximate Neighbors
('cats', 1.0)
('wallabies', 0.6341749131679535)
('coyotes', 0.6311245858669281)
('kangaroos', 0.6296325325965881)
('felines', 0.6287126243114471)
('squirrels', 0.6270308494567871)
('dogs', 0.6266725659370422)
('leopards', 0.6130028069019318)
('omnivores', 0.6129975318908691)
('koalas', 0.612080842256546)
('microbats', 0.6070675551891327)
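Why "cats" scores exactly 1.0 can be sketched with Annoy's angular metric, which Annoy describes as the Euclidean distance between normalized vectors, sqrt(2(1 - cos(u, v))). Inverting it as 1 - d²/2 recovers the cosine similarity, and a vector's distance to itself is 0. This is a pure-Python sketch of that arithmetic only; the helper names are hypothetical, and whether a given gensim version converts Annoy distances exactly this way is an assumption.

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def angular_distance(u, v):
    """Euclidean distance between the L2-normalized vectors,
    i.e. sqrt(2 * (1 - cos(u, v)))."""
    nu, nv = normalize(u), normalize(v)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(nu, nv)))

def distance_to_cosine(d):
    """Invert d = sqrt(2 * (1 - cos)) to get the cosine similarity back."""
    return 1 - d * d / 2

# Made-up vectors, for illustration only.
u = [0.8, 0.2, 0.35]
v = [0.9, 0.1, 0.3]
print(distance_to_cosine(angular_distance(u, u)))  # 1.0: distance to itself is 0
print(distance_to_cosine(angular_distance(u, v)))  # matches the cosine of u and v
```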
If you get stuck anywhere, let me know. :)