How is hits@k calculated and what does it mean in the context of link prediction in knowledge bases

Question

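I have been reading papers on link prediction in knowledge graphs. The authors usually report "Hits@k". How is Hits@k computed, and what does it say about the model and its results?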

Answer 1

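In short, it measures how many positive triples are ranked in the top n positions against a set of synthetic negatives (n here is the k in Hits@k).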
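Written as a formula, this is a sketch under the assumption that $Q$ denotes the set of test triples and $\mathrm{rank}_i$ the position of the $i$-th positive triple among its own synthetic negatives (both symbols are introduced here for illustration only):

```latex
\mathrm{Hits@}k = \frac{1}{|Q|} \sum_{i=1}^{|Q|} \mathbb{1}\left[\mathrm{rank}_i \le k\right]
```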

In the example below, assume the test set contains only two ground-truth positive triples:

Jack   born_in   Italy
Jack   friend_with   Thomas

Let us assume each of these positive triples (marked with * below) is ranked against four synthetic negatives.

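Now assign a score to each positive and its synthetic negatives using a pre-trained embedding model, then sort the triples by score in descending order. In the example below, the first triple ranks second and the other ranks first (each against its own synthetic negatives):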

s      p         o         score   rank
Jack   born_in   Ireland   0.789   1
Jack   born_in   Italy     0.753   2  *
Jack   born_in   Germany   0.695   3
Jack   born_in   China     0.456   4
Jack   born_in   Thomas    0.234   5

s      p             o         score   rank
Jack   friend_with   Thomas    0.901   1  *
Jack   friend_with   China     0.345   2
Jack   friend_with   Italy     0.293   3
Jack   friend_with   Ireland   0.201   4
Jack   friend_with   Germany   0.156   5
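
The ranking step can be sketched in plain Python. This is a minimal illustration, not AmpliGraph's implementation: the helper name `rank_of_positive` is hypothetical, the scores are copied from the two tables above, and ties are broken pessimistically, which is one common convention but not the only one.

```python
def rank_of_positive(positive_score, negative_scores):
    """Return the 1-based rank of the positive among its negatives.

    Hypothetical helper: a negative that scores at least as high as the
    positive is counted ahead of it (pessimistic tie-breaking).
    """
    return 1 + sum(1 for s in negative_scores if s >= positive_score)


# Scores taken from the two tables above.
born_in_rank = rank_of_positive(0.753, [0.789, 0.695, 0.456, 0.234])
friend_with_rank = rank_of_positive(0.901, [0.345, 0.293, 0.201, 0.156])

print(born_in_rank, friend_with_rank)  # 2 1
```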

Then count how many positives appear in the top-1 or top-3 positions, and divide by the number of triples in the test set (2 in this example):

Hits@3 = 2/2 = 1.0
Hits@1 = 1/2 = 0.5
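
The same arithmetic can be written as a short Python sketch. The function name `hits_at_k` is hypothetical, and the ranks `[2, 1]` are the ones from the example; this illustrates the definition and is not the AmpliGraph API.

```python
def hits_at_k(ranks, k):
    """Fraction of test triples whose positive is ranked in the top k."""
    if not ranks:
        raise ValueError("ranks must not be empty")
    return sum(1 for r in ranks if r <= k) / len(ranks)


ranks = [2, 1]  # born_in triple ranked 2nd, friend_with triple ranked 1st

print(hits_at_k(ranks, 3))  # 1.0
print(hits_at_k(ranks, 1))  # 0.5
```

Hits@k ranges from 0 to 1, and higher is better. It only records whether each positive lands within the top k, not where inside the top k it sits.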

AmpliGraph has an API to compute Hits@n (`hits_at_n_score` in `ampligraph.evaluation`); see its documentation for details.

