Poincare embeddings: building transitive closures from WordNet

I want to reproduce Figure 2 from Poincaré Embeddings for Learning Hierarchical Representations, i.e. the Poincaré embedding of the "mammal" subtree of WordNet.

First, I build the transitive closure needed to represent the graph. Following these docs and this SO answer, I do the following to construct the relations:

from nltk.corpus import wordnet as wn

root    = wn.synset('mammal.n.01')
# Collect every lemma name found anywhere below the root synset.
words   = list(set([w for s in root.closure(lambda s: s.hyponyms()) for w in s.lemma_names()]))
rname   = root.name().split('.')[0]
closure = [(word, rname) for word in words]

I then use Gensim's Poincare model to compute the embeddings. Given the example relations in the Gensim docs, e.g.

relations = [('kangaroo', 'marsupial'), ('kangaroo', 'mammal'), ('gib', 'cat')]

I infer that the hypernym needs to be on the right of each pair. Here is the model fitting code:


from gensim.models.poincare import PoincareModel
from gensim.viz.poincare import poincare_2d_visualization

model = PoincareModel(relations, size=2, negative=0)
model.train(epochs=50)

fig = poincare_2d_visualization(model, relations, 'WordNet Poincare embeddings')
fig.show()

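However, the result is clearly wrong, since it looks nothing like the figure in the paper. What am I doing wrong?

I think the main problem here stems from this line: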

closure = [(word, rname) for word in words]

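You are generating a list in which every word is related only to rname, i.e. "mammal". In other words, you only get ("columbian_mammoth", "mammal") and you lose the intermediate steps ("columbian_mammoth", "mammoth"), ("mammoth", "elephant"), ("elephant", "proboscidean"), and so on.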
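To make the difference concrete, here is a minimal sketch on a hand-written toy hierarchy. The `parent` dictionary is hypothetical and only mirrors the mammoth example above; it is not read from WordNet. It contrasts the root-only pairs from the question with the direct hypernym edges the hierarchy actually contains.

```python
# Hypothetical toy hierarchy (child -> direct parent); not taken from WordNet.
parent = {
    "columbian_mammoth": "mammoth",
    "mammoth": "elephant",
    "elephant": "proboscidean",
    "proboscidean": "mammal",
}

root = "mammal"

# What the question builds: every word linked straight to the root.
root_only = sorted((child, root) for child in parent)

# What the hierarchy actually contains: one pair per direct edge.
direct_edges = sorted(parent.items())

print(root_only)
print(direct_edges)
```

With only the root-only pairs, the model sees a star centred on "mammal" and has no information about which words sit under "elephant" or "mammoth".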

I suggest fixing this with a recursive function, append_pairs. I also slightly tweaked the parameters of PoincareModel and poincare_2d_visualization.

from nltk.corpus import wordnet as wn
from gensim.models.poincare import PoincareModel
from gensim.viz.poincare import poincare_2d_visualization


def simple_name(r):
    # 'mammal.n.01' -> 'mammal'
    return r.name().split('.')[0]


def append_pairs(my_root, pairs):
    # Walk the hyponym tree depth-first and record one
    # (hyponym, hypernym) pair per direct edge.
    for w in my_root.hyponyms():
        pairs.append((simple_name(w), simple_name(my_root)))
        append_pairs(w, pairs)
    return pairs


if __name__ == '__main__':
    root = wn.synset('mammal.n.01')

    relations = append_pairs(root, [])

    model = PoincareModel(relations, size=2, negative=10)
    model.train(epochs=20)

    # num_nodes=None plots every node instead of a sample.
    fig = poincare_2d_visualization(model, relations, 'WordNet Poincare embeddings', num_nodes=None)
    fig.show()

The picture is not yet as pretty as the one in the original paper, but at least the clusters are now visible.
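One thing that may account for part of the remaining difference: as far as I recall, the figure in the paper is described as an embedding of the transitive closure of the mammals subtree, whereas append_pairs only emits direct hyponym/hypernym edges (and can emit the same edge twice if a synset is reached through more than one parent). Below is a rough sketch of expanding direct edges into a de-duplicated transitive closure. The helper `transitive_closure` and the toy `edges` list are my own illustration, not part of Gensim or NLTK.

```python
from collections import defaultdict


def transitive_closure(edges):
    """Return every (descendant, ancestor) pair implied by (child, parent) edges."""
    parents = defaultdict(set)
    for child, parent in edges:
        parents[child].add(parent)

    closure = set()
    for node in list(parents):
        stack = list(parents[node])
        seen = set()
        while stack:
            ancestor = stack.pop()
            if ancestor in seen:
                continue  # already handled via another path
            seen.add(ancestor)
            closure.add((node, ancestor))
            # .get avoids creating empty entries in the defaultdict
            stack.extend(parents.get(ancestor, ()))
    return closure


# Hypothetical toy edges, including one duplicate on purpose.
edges = [
    ("mammoth", "elephant"),
    ("elephant", "proboscidean"),
    ("proboscidean", "mammal"),
    ("mammoth", "elephant"),
]
pairs = transitive_closure(edges)
print(sorted(pairs))
```

Something like `list(transitive_closure(relations))` could then be passed to PoincareModel in place of `relations`. I have not checked how much this changes the plot.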