在 StellarGraph returns NaN 中使用 Hinsage/Graphsage 进行链路预测

Linkprediction using Hinsage/Graphsage in StellarGraph returns NaNs

我正在尝试 运行 使用 stellargraph python 包中的 HinSAGE 进行 link 预测。

我有一个人与产品的网络,具有人与人 (KNOWs) 和人与产品 (BOUGHT) 的优势。 人和产品都附加了一个 属性 向量,尽管每种类型的向量不同(人向量是 1024,产品是 200)。 我正在尝试根据网络中的所有信息创建一个 link 从人到产品的预测算法。我使用 HinSAGE 的原因是归纳学习的选项。

我有下面的代码,我认为我做的与示例类似

https://stellargraph.readthedocs.io/en/stable/demos/link-prediction/hinsage-link-prediction.html https://stellargraph.readthedocs.io/en/stable/demos/link-prediction/graphsage-link-prediction.html

但我的输出预测一直是“nan”,有人对我可以尝试的方法有什么建议吗?

import networkx as nx
import pandas as pd
import numpy as np
from tensorflow.keras import Model, optimizers, losses, metrics
import stellargraph as sg
from stellargraph.data import EdgeSplitter
from stellargraph.mapper import HinSAGELinkGenerator
from stellargraph.layer import HinSAGE, link_classification, link_regression
from sklearn.model_selection import train_test_split


graph.info()
#StellarGraph: Undirected multigraph
# Nodes: 54226, Edges: 259120
#
# Node types:
#  products: [45027]
#    Features: float32 vector, length 200
#    Edge types: products-BOUGHT->person
#  person: [9199]
#    Features: float32 vector, length 1024
#    Edge types: person-KNOWS->person, person-BOUGHT->product
#
# Edge types:
#    person-KNOWS->person: [246131]
#        Weights: all 1 (default)
#        Features: none
#    person-BOUGHT->product: [12989]
#        Weights: all 1 (default)
#        Features: none



import networkx as nx
import pandas as pd
import numpy as np
import os
import random
from tensorflow.keras import Model, optimizers, losses, metrics
import stellargraph as sg
from stellargraph.data import EdgeSplitter
from stellargraph.mapper import HinSAGELinkGenerator
from stellargraph.layer import HinSAGE, link_classification
from stellargraph.data import UniformRandomWalk
from stellargraph.data import UnsupervisedSampler
from sklearn.model_selection import train_test_split

from stellargraph.layer import HinSAGE, link_regression



edge_splitter_test = EdgeSplitter(graph)
graph_test, edges_test, labels_test = edge_splitter_test.train_test_split(
    p=0.1, method="global", edge_label="BOUGHT"
)
edge_splitter_train = EdgeSplitter(graph_test, graph)

graph_train, edges_train, labels_train = edge_splitter_train.train_test_split(
    p=0.1, method="global", edge_label="BOUGHT"
)


num_samples = [8, 4]

G = graph

batch_size = 20
epochs = 20


generator = HinSAGELinkGenerator(
    G, batch_size, num_samples, head_node_types=["person", "product"]
)
train_gen = generator.flow(edges_train, labels_train, shuffle=True)
test_gen = generator.flow(edges_test, labels_test)


hinsage_layer_sizes = [32, 32]
assert len(hinsage_layer_sizes) == len(num_samples)

hinsage = HinSAGE(
    layer_sizes=hinsage_layer_sizes, generator=generator, bias=True, dropout=0.0
)


# Expose input and output sockets of hinsage:
x_inp, x_out = hinsage.in_out_tensors()



    
# Final estimator layer
prediction = link_classification(
    output_dim=1, output_act="sigmoid", edge_embedding_method="concat"
)(x_out)

model = Model(inputs=x_inp, outputs=prediction)

model.compile(
    optimizer=optimizers.Adam(),
    loss=losses.binary_crossentropy,
    metrics=["acc"],
)

history = model.fit(train_gen, epochs=epochs, validation_data=test_gen, verbose=2)

所以我找到了问题,可能对其他人有用。如果有任何节点包含缺失数据,则该事物只会产生 NA。如果您通过加入 pandas 数据框来创建图形,则尤其危险,我在一个集成的文件中有一个拼写错误并导致了问题。