Parsing py2neo paths into Pandas

We use py2neo to run a Cypher query that returns paths, and we would like to parse the result into a Pandas DataFrame. The Cypher query is similar to the following:

query = '''MATCH p=allShortestPaths((p1:Type1)-[r*..3]-(p2:Type1))
WHERE p1.ID = 123456
RETURN distinct(p)'''
result = graph.run(query)

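The resulting object is a walkable object, which can be traversed. Note that Nodes and Relationships do not have the same properties.
What is the most pythonic way to iterate over the object? Is it necessary to process the entire path, or, since the object is a dictionary, is it possible to use the Pandas.from_dict method? One issue is that the paths are sometimes of unequal length.
Currently we enumerate the object: if an element sits at an even index we treat it as a Node, otherwise we process it as a Relationship.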

for index, item in enumerate(paths):
    if index % 2 == 0:
        pass  # process as Node
    else:
        pass  # process as Relationship
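
The parity rule relies on a walk always alternating node, relationship, node. A minimal sketch of that idea, using plain dicts as hypothetical stand-ins for the py2neo objects (`split_walk` and `example_walk` are illustrative names, not part of py2neo):

```python
def split_walk(walk_items):
    """Split an alternating node/relationship sequence by index parity."""
    items = list(walk_items)
    nodes = items[0::2]  # even positions hold nodes
    rels = items[1::2]   # odd positions hold relationships
    return nodes, rels

# Hypothetical stand-in for the elements of one walked path.
example_walk = [
    {"ID": 1}, {"type": "KNOWS"}, {"ID": 2}, {"type": "LIKES"}, {"ID": 3},
]
nodes, rels = split_walk(example_walk)
```

Slicing avoids the explicit index test, but it still assumes the sequence starts with a node and strictly alternates.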

We could use isinstance instead, i.e.

if isinstance(item, py2neo.types.Node):
    pass  # process as Node

But this still requires processing each element separately.
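
For illustration, the type-based dispatch could be sketched as below. `FakeNode` and `FakeRelationship` are hypothetical stand-ins so the snippet runs without a database; with py2neo the checks would be against its own Node and Relationship classes, and `flatten_walk` is an invented helper name.

```python
from collections import OrderedDict

class FakeNode(dict):
    """Hypothetical stand-in for py2neo's Node (not the real class)."""

class FakeRelationship(dict):
    """Hypothetical stand-in for py2neo's Relationship (not the real class)."""

def flatten_walk(walk_items):
    """Turn one alternating node/relationship sequence into a flat row."""
    row = OrderedDict()
    node_count = rel_count = 0
    for item in walk_items:
        if isinstance(item, FakeNode):
            row["Node{}".format(node_count)] = dict(item)
            node_count += 1
        elif isinstance(item, FakeRelationship):
            row["Rel{}".format(rel_count)] = dict(item)
            rel_count += 1
    return row

example_walk = [FakeNode(ID=1), FakeRelationship(since=2010), FakeNode(ID=2)]
row = flatten_walk(example_walk)
```

Each path becomes one dict keyed by position, so a list of such dicts can be handed to pd.DataFrame, which leaves missing keys empty for shorter paths.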

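I solved the problem as follows:
I wrote a function that receives a list of paths together with the node and relationship property names.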

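from collections import OrderedDict
from py2neo import Node, Relationship

def neo4j_graph_to_dict(paths, node_properties, rels_properties):
    # Map each path ID to a dict of formatted 'NodeN' / 'RelN' strings.
    paths_dict = OrderedDict()
    for (pathID, path) in enumerate(paths):
        paths_dict[pathID] = {}
        for (i, node_rel) in enumerate(path):
            # Both property lists are looked up on every element, node or relationship.
            n_properties = [node_rel[prop] for prop in node_properties]
            r_properties = [node_rel[prop] for prop in rels_properties]
            if isinstance(node_rel, Node):
                node_format = [prop + ': {}|' for prop in node_properties]
                # Only the first label of the node is used.
                paths_dict[pathID]['Node' + str(i)] = ('{}: ' + ' '.join(node_format)).format(list(node_rel.labels())[0], *n_properties)
            elif isinstance(node_rel, Relationship):
                rel_format = [prop + ': {}|' for prop in rels_properties]
                # A relationship is keyed by the walk index of the node before it.
                reltype = 'Rel' + str(i - 1)
                paths_dict[pathID][reltype] = ('{}: ' + ' '.join(rel_format)).format(node_rel.type(), *r_properties)
    return paths_dict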

Assuming the query returns the paths, nodes and relationships, we can run the following code:

query='''MATCH paths=allShortestPaths(
    (pr1:Type1 {ID:'123456'})-[r*1..9]-(pr2:Type2 {ID:'654321'}))  
    RETURN paths, nodes(paths) as nodes, rels(paths) as rels'''  

import pandas as pd
from py2neo import walk  # import path may differ between py2neo versions

df_qf = pd.DataFrame(graph.data(query))
node_properties = set([k for series in df_qf.nodes for node in series for k in node.keys()])  # unique node property names
rels_properties = set([k for series in df_qf.rels for rel in series for k in rel.keys()])  # unique relationship property names
wg = [walk(path) for path in df_qf.paths]  # one walk (node, rel, node, ...) per path
paths_dict = neo4j_graph_to_dict(wg, node_properties, rels_properties)
df = pd.DataFrame(paths_dict).transpose()  # one row per path
df = pd.DataFrame(df, columns=paths_dict[0].keys()).drop_duplicates()  # order columns as in the first path
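
To see what happens when paths differ in length, here is a rough sketch in plain Python of padding every path to the union of columns. The data is invented but follows the key pattern the function above produces (Node0, Rel0, Node2, ...); `pad_rows` and `example_paths` are hypothetical names.

```python
def pad_rows(paths_dict):
    """Return (columns, rows) where every row covers the union of all keys."""
    columns = []
    for row in paths_dict.values():
        for key in row:
            if key not in columns:
                columns.append(key)  # keep first-seen order
    rows = [[row.get(col) for col in columns] for row in paths_dict.values()]
    return columns, rows

# Hypothetical data in the shape neo4j_graph_to_dict produces (values invented).
example_paths = {
    0: {"Node0": "Type1: ID: 1|", "Rel0": "LINK: weight: 5|",
        "Node2": "Type2: ID: 2|"},
    1: {"Node0": "Type1: ID: 1|", "Rel0": "LINK: weight: 3|",
        "Node2": "Type1: ID: 7|", "Rel2": "LINK: weight: 4|",
        "Node4": "Type2: ID: 2|"},
}
columns, rows = pad_rows(example_paths)
```

As far as I can tell, pd.DataFrame(paths_dict).transpose() already pads shorter paths in a similar way, whereas re-selecting columns=paths_dict[0].keys() keeps only the columns of the first path, so it is worth checking that step when paths are not all the same length.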