Parsing py2neo paths into Pandas
We are using py2neo to return paths from a Cypher query, and we want to parse the result into a Pandas DataFrame. The Cypher query is similar to the following:
query = '''MATCH p = allShortestPaths((p1:Type1)-[r*..3]-(p2:Type1))
WHERE p1.ID = 123456
RETURN DISTINCT p'''
result = graph.run(query)
The resulting object is walkable, i.e. it can be traversed. Note that Nodes and Relationships do not have the same properties.
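To make the traversal concrete without a database, here is a minimal sketch, assuming a path is given as a list of nodes and a list of relationships, each a plain dict of properties (a hypothetical stand-in for py2neo entities). The helper walk_path is hypothetical; it reproduces the alternating node, relationship, ..., node order that py2neo's walk yields:

```python
from typing import Any, Dict, Iterator, List

# Hypothetical stand-in: every node or relationship is just a dict of properties.
Entity = Dict[str, Any]


def walk_path(nodes: List[Entity], rels: List[Entity]) -> Iterator[Entity]:
    """Yield node, rel, node, rel, ..., node (one more node than rels)."""
    if len(nodes) != len(rels) + 1:
        raise ValueError("a path needs exactly one more node than relationships")
    for node, rel in zip(nodes, rels):
        yield node
        yield rel
    yield nodes[-1]


# Nodes and relationships carry different property keys, as in the question.
nodes = [{"ID": 123456, "name": "a"}, {"ID": 2, "name": "b"}, {"ID": 654321, "name": "c"}]
rels = [{"since": 2010}, {"weight": 0.5}]
sequence = list(walk_path(nodes, rels))
```

Because the element kind is fully determined by position, the parsing code never has to guess what comes next.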
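What is the most Pythonic way to iterate over this object? Do we have to process each path element by element, or, since the object is a dictionary, can pandas.DataFrame.from_dict be used? One complication is that the paths are sometimes of unequal length.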
Currently we enumerate the object: if an element's index is even, it is a Node; otherwise we process it as a Relationship.
for index, item in enumerate(paths):
    if index % 2 == 0:
        pass  # even index: process as Node
    else:
        pass  # odd index: process as Relationship
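Since a walked path always starts with a node and alternates, the parity test can be replaced by slicing. A minimal sketch, assuming the walked path has first been materialised into a list (walk returns a generator); split_path and the sample values are hypothetical:

```python
from typing import Any, List, Sequence, Tuple


def split_path(sequence: Sequence[Any]) -> Tuple[List[Any], List[Any]]:
    """Split an alternating node/rel sequence into (nodes, rels)."""
    nodes = list(sequence[0::2])  # even positions hold nodes
    rels = list(sequence[1::2])   # odd positions hold relationships
    return nodes, rels


# Hypothetical walked path: node, rel, node, rel, node
walked = ["n0", "r0", "n1", "r1", "n2"]
nodes, rels = split_path(walked)
print(nodes)  # ['n0', 'n1', 'n2']
print(rels)   # ['r0', 'r1']
```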
We could use isinstance instead, i.e.:
if isinstance(item, py2neo.types.Node):
    pass  # process as Node
but this still requires handling each element separately.
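One way to avoid per-element type checks is to zip the two slices into (start node, relationship, end node) triples, one per hop. A sketch under the same assumption as above (plain dicts as hypothetical stand-ins for py2neo entities; hops and the property names are illustrative only):

```python
from typing import Any, Dict, List, Sequence, Tuple

Entity = Dict[str, Any]


def hops(sequence: Sequence[Entity]) -> List[Tuple[Entity, Entity, Entity]]:
    """Return one (start, rel, end) triple per relationship in the path."""
    nodes = sequence[0::2]
    rels = sequence[1::2]
    # nodes[1:] is shifted by one, so each rel is paired with both endpoints
    return list(zip(nodes, rels, nodes[1:]))


walked = [{"ID": 1}, {"type": "KNOWS"}, {"ID": 2}, {"type": "LIKES"}, {"ID": 3}]
triples = hops(walked)
```

Each triple can then become one row of an edge-list table, which sidesteps unequal path lengths entirely.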
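I solved the problem as follows.
I wrote a function that takes a list of walked paths together with the node and relationship property names: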
from collections import OrderedDict

from py2neo import Node, Relationship  # written against the py2neo v3 API

def neo4j_graph_to_dict(paths, node_properties, rels_properties):
    """Flatten each walked path into a dict keyed Node0, Rel0, Node2, Rel2, ..."""
    paths_dict = OrderedDict()
    for path_id, path in enumerate(paths):
        paths_dict[path_id] = {}
        for i, node_rel in enumerate(path):
            if isinstance(node_rel, Node):
                # Render as "Label: prop1: value| prop2: value|"
                node_format = [prop + ': {}|' for prop in node_properties]
                n_properties = [node_rel[prop] for prop in node_properties]
                paths_dict[path_id]['Node' + str(i)] = ('{}: ' + ' '.join(node_format)).format(
                    list(node_rel.labels())[0], *n_properties)
            elif isinstance(node_rel, Relationship):
                # Relationships sit at odd positions; name them after the preceding node
                rel_format = [prop + ': {}|' for prop in rels_properties]
                r_properties = [node_rel[prop] for prop in rels_properties]
                paths_dict[path_id]['Rel' + str(i - 1)] = ('{}: ' + ' '.join(rel_format)).format(
                    node_rel.type(), *r_properties)
    return paths_dict
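The same flattening can be sketched without py2neo to show what each row looks like. This assumes each path is already split into a list of nodes and a list of relationships, with every entity a hypothetical dict carrying a "label" (nodes) or "type" (relationships) key plus its properties; format_entity and paths_to_rows are hypothetical helpers, and missing properties are rendered as None via dict.get:

```python
from collections import OrderedDict
from typing import Any, Dict, List, Tuple

Entity = Dict[str, Any]


def format_entity(kind: str, entity: Entity, props: List[str]) -> str:
    """Render 'Kind: p1: v1| p2: v2|', using None for missing properties."""
    parts = ["{}: {}|".format(p, entity.get(p)) for p in props]
    return "{}: {}".format(kind, " ".join(parts))


def paths_to_rows(paths: List[Tuple[List[Entity], List[Entity]]],
                  node_props: List[str], rel_props: List[str]) -> Dict[int, Dict[str, str]]:
    rows = OrderedDict()
    for path_id, (nodes, rels) in enumerate(paths):
        row = {}
        for i, node in enumerate(nodes):
            # Same key scheme as neo4j_graph_to_dict: positions 0, 2, 4, ...
            row["Node{}".format(2 * i)] = format_entity(node["label"], node, node_props)
            if i < len(rels):
                row["Rel{}".format(2 * i)] = format_entity(rels[i]["type"], rels[i], rel_props)
        rows[path_id] = row
    return rows


paths = [(
    [{"label": "Type1", "ID": 1, "name": "a"}, {"label": "Type2", "ID": 2}],
    [{"type": "KNOWS", "since": 2010}],
)]
rows = paths_to_rows(paths, ["ID", "name"], ["since"])
print(rows[0]["Node2"])  # Type2: ID: 2| name: None|
```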
Assuming the query returns the paths, their nodes and their relationships, we can run the following code:
import pandas as pd
from py2neo import walk  # py2neo v3

# graph is an existing py2neo Graph connection
query = '''MATCH paths = allShortestPaths(
    (pr1:Type1 {ID:'123456'})-[r*1..9]-(pr2:Type2 {ID:'654321'}))
RETURN paths, nodes(paths) AS nodes, relationships(paths) AS rels'''
df_qf = pd.DataFrame(graph.data(query))
# Unique property names across all nodes / relationships, sorted for a stable layout
node_properties = sorted({k for series in df_qf.nodes for node in series for k in node.keys()})
rels_properties = sorted({k for series in df_qf.rels for rel in series for k in rel.keys()})
wg = [walk(path) for path in df_qf.paths]
paths_dict = neo4j_graph_to_dict(wg, node_properties, rels_properties)
df = pd.DataFrame(paths_dict).transpose()
# Take the columns from the longest path so hops of longer paths are not dropped
columns = list(max(paths_dict.values(), key=len).keys())
df = pd.DataFrame(df, columns=columns).drop_duplicates()
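When paths have different lengths, their rows have different keys, and selecting columns from the first path alone would drop the extra hops of longer paths. Because the keys are positional (Node0, Rel0, Node2, ...), the longest row's keys are a superset of every shorter row's. A stdlib sketch with hypothetical row data (to_table is a hypothetical helper) that pads the gaps with None; the resulting records could be passed to pandas.DataFrame(records, columns=columns):

```python
from typing import Dict, List, Optional, Tuple


def to_table(rows: Dict[int, Dict[str, str]]) -> Tuple[List[str], List[List[Optional[str]]]]:
    """Return (columns, records), padding shorter paths with None."""
    # Keys are positional, so the longest row holds every column in path order.
    columns = list(max(rows.values(), key=len).keys())
    records = [[row.get(col) for col in columns] for row in rows.values()]
    return columns, records


rows = {
    0: {"Node0": "Type1: ID: 1|", "Rel0": "KNOWS: since: 2010|", "Node2": "Type2: ID: 2|"},
    1: {"Node0": "Type1: ID: 1|", "Rel0": "KNOWS: since: 2010|", "Node2": "Type1: ID: 3|",
        "Rel2": "LIKES: since: 2015|", "Node4": "Type2: ID: 2|"},
}
columns, records = to_table(rows)
```

Note that max raises ValueError on an empty dict, so a query that returns no paths needs a separate check.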