Is there a function to print out the most similar sentence in spaCy?
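I have a txt file containing the synopses of 10 movies. I also have a separate synopsis of a Hulk movie, stored as a string in a variable. I need to compare the 10 synopses against the Hulk synopsis to find the most similar movie to recommend. My code is as follows: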
import spacy
nlp = spacy.load('en_core_web_lg')
hulk_description = """Will he save their world or destroy it? When the Hulk becomes too dangerous for the
Earth, the Illuminati trick Hulk into a shuttle and launch him into space to a
planet where the Hulk can live in peace. Unfortunately, Hulk land on the
planet Sakaar where he is sold into slavery and trained as a gladiator."""
hulk = nlp(hulk_description)
movies = []
with open('movies.txt', 'r') as f_in:
    for line in map(str.strip, f_in):
        if not line:
            continue
        tmp = line.split()
        movies.append(line)
for token in movies:
    token = nlp(token)
    print(token.similarity(hulk))
So this works, and it prints the following:
0.9299734027118595
0.9045154830561336
0.9248706809139479
0.6760996697288897
0.8521583959686228
0.9340271750528514
0.9251483541429658
0.8806094116148976
0.8709798309015676
0.8489256857995392
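I can see that the synopsis of the 6th movie is the most similar, at 0.9340271750528514. But my question is: is there a function in spaCy that would let me print out only the most similar sentence once the comparisons are done? That is, I basically want to compare all of them and then recommend the most similar movie by displaying its synopsis.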
Try this, which returns a (score, synopsis) pair for the best match:
max((nlp(token).similarity(hulk), token) for token in movies)
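As a hedged sketch, here is the same one-liner wrapped in a helper. The name `most_similar` and its parameters are hypothetical and not part of spaCy. It only assumes that `nlp(text)` returns an object with a `.similarity()` method, as a loaded spaCy pipeline does. Passing `key=` makes `max` compare scores only, so tied scores do not fall back to comparing the strings.

```python
def most_similar(query, candidates, nlp):
    """Return (score, text) for the candidate most similar to `query`.

    `query` is an already processed document (e.g. a spaCy Doc).
    `nlp` is any callable that turns a string into an object with a
    `.similarity()` method (assumption: a loaded spaCy pipeline).
    """
    # Build (score, text) pairs lazily and keep the pair with the top score.
    scored = ((nlp(text).similarity(query), text) for text in candidates)
    return max(scored, key=lambda pair: pair[0])
```

With the variables from the question, `score, synopsis = most_similar(hulk, movies, nlp)` followed by `print(synopsis)` would print only the best match. An empty `movies` list makes `max` raise `ValueError`.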
You can also pass the movies through the nlp.pipe generator and compute each one's similarity to your string:
import numpy as np  # needed for np.argmax below
import spacy
nlp = spacy.load('en_core_web_md')
hulk_description = """Will he save their world or destroy it? When the Hulk becomes too dangerous for the
Earth, the Illuminati trick Hulk into a shuttle and launch him into space to a
planet where the Hulk can live in peace. Unfortunately, Hulk land on the
planet Sakaar where he is sold into slavery and trained as a gladiator."""
hulk = nlp(hulk_description)
movies = ["this is a movie", "this is another movie about Hulk"]
sims = []
for movie in nlp.pipe(movies):
    sims.append(hulk.similarity(movie))
id_max = np.argmax(sims)
print(movies[id_max])
# this is another movie about Hulk
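If you want more than one recommendation, the loop above extends naturally to a ranking. This is a sketch under assumptions: `rank_by_similarity` and `top_n` are hypothetical names, and `nlp` is assumed to be a loaded spaCy pipeline (or anything with a compatible `pipe` method that yields documents in input order).

```python
def rank_by_similarity(query_doc, texts, nlp, top_n=3):
    """Return up to `top_n` (score, text) pairs, best match first.

    `query_doc` is an already processed document (e.g. a spaCy Doc).
    `nlp.pipe(texts)` is assumed to yield one document per input text,
    in the same order as `texts`.
    """
    texts = list(texts)  # allow generators; we iterate over the texts twice
    scored = [
        (query_doc.similarity(doc), text)
        for doc, text in zip(nlp.pipe(texts), texts)
    ]
    # Sort on the score only, highest first; the sort is stable for ties.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_n]
```

For example, `for score, synopsis in rank_by_similarity(hulk, movies, nlp): print(round(score, 3), synopsis)` would list the closest synopses together with their scores.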