为什么 2 个文档的相似度为 returns 0.00?
Why 2 docs with a single word returns 0.00 for similarity?
为什么下面的代码给出了 0.00 的相似度,因为两个文档都使用相同的单词 astronaut?
import spacy
nlp = spacy.en.English()
print (nlp('astronaut').similarity(nlp('astronaut')))
# Result: 0.0
原因是单项的词向量是点,点与点之间无法得到cosine distance similarity
查看多维向量与点的向量比较:
>>> a = nlp(u'astronaut eating apple banana cherry')
>>> b = nlp(u'astronaut eating apple banana fruit')
>>> a.similarity(b)
0.96363932891327542
>>> a.similarity(a)
0.99999997666693974
>>> b.similarity(b)
1.000000996690289
>>> a = nlp(u'astronaut')
>>> b = nlp(u'astronaut')
>>> a.similarity(a)
0.0
>>> b = nlp(u'cosmonaut')
>>> a.similarity(a)
0.0
>>> b.similarity(b)
0.0
>>> a.similarity(b)
0.0
>>> c = nlp(u'single')
>>> a.similarity(c)
0.0
为什么下面的代码给出了 0.00 的相似度,因为两个文档都使用相同的单词 astronaut?
import spacy
nlp = spacy.en.English()
print (nlp('astronaut').similarity(nlp('astronaut')))
# Result: 0.0
原因是单项的词向量是点,点与点之间无法得到cosine distance similarity
查看多维向量与点的向量比较:
>>> a = nlp(u'astronaut eating apple banana cherry')
>>> b = nlp(u'astronaut eating apple banana fruit')
>>> a.similarity(b)
0.96363932891327542
>>> a.similarity(a)
0.99999997666693974
>>> b.similarity(b)
1.000000996690289
>>> a = nlp(u'astronaut')
>>> b = nlp(u'astronaut')
>>> a.similarity(a)
0.0
>>> b = nlp(u'cosmonaut')
>>> a.similarity(a)
0.0
>>> b.similarity(b)
0.0
>>> a.similarity(b)
0.0
>>> c = nlp(u'single')
>>> a.similarity(c)
0.0