Optimizing py2neo's Cypher insertion

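I am using py2neo to import several hundred thousand nodes. I've created a defaultdict that maps neighborhoods to cities. One motivation was to import these relationships more efficiently, after earlier attempts were unsuccessful.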

Because the batch documentation suggests avoiding it, I veered away from an implementation like the one by the OP of this post. Instead, the documentation suggests using Cypher. However, I like being able to create nodes from the defaultdict I have built. Plus, I found it too difficult to import this information the way the demonstration does.

To speed up the import, should I create a Cypher transaction (and commit every 10,000) instead of the following loop?

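from py2neo import Node, Relationship

# graph is an existing py2neo Graph; city_neighborhood_map is the defaultdict
for city_name, neighborhood_names in city_neighborhood_map.iteritems():
    # look the city up once, then attach each of its neighborhoods
    city_node = graph.find_one(label="City", property_key="Name", property_value=city_name)
    for neighborhood_name in neighborhood_names:
        neighborhood_node = Node("Neighborhood", Name=neighborhood_name)
        rel = Relationship(neighborhood_node, "IN", city_node)
        graph.create(rel)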

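I get timeouts, and it seems slow when I do the following. Even when I break up the transaction so that it commits every 1,000 neighborhoods, it still processes very slowly.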

tx = graph.cypher.begin()
statement = "MERGE (city {Name:{City_Name}}) CREATE (neighborhood { Name : {Neighborhood_Name}}) CREATE (neighborhood)-[:IN]->(city)"
for city_name, neighborhood_names in city_neighborhood_map.iteritems():
    for neighborhood_name in neighborhood_names:
        tx.append(statement, {"City_Name": city_name, "Neighborhood_Name": neighborhood_name})
tx.commit()
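
The code above commits everything in one transaction; the variant that commits every 1,000 neighborhoods is not shown. A minimal sketch of what such batching might look like follows. The helper names `iter_pairs` and `insert_in_batches` are hypothetical, and `graph` is only assumed to expose `graph.cypher.begin()`, `tx.append()` and `tx.commit()` as used above. It uses `.items()` so that it runs on both Python 2 and 3.

```python
def iter_pairs(city_neighborhood_map):
    """Yield one (city_name, neighborhood_name) tuple per relationship."""
    for city_name, neighborhood_names in city_neighborhood_map.items():
        for neighborhood_name in neighborhood_names:
            yield city_name, neighborhood_name


def insert_in_batches(graph, city_neighborhood_map, statement, batch_size=1000):
    """Append one statement per pair and commit every batch_size statements.

    Assumes graph.cypher.begin() returns an object with append() and commit(),
    as in the code above. Returns the number of commits performed.
    """
    tx = None
    pending = 0
    commits = 0
    for city_name, neighborhood_name in iter_pairs(city_neighborhood_map):
        if tx is None:
            # open a transaction only when there is something to put in it
            tx = graph.cypher.begin()
        tx.append(statement, {"City_Name": city_name,
                              "Neighborhood_Name": neighborhood_name})
        pending += 1
        if pending >= batch_size:
            tx.commit()
            commits += 1
            tx, pending = None, 0
    if tx is not None:
        # commit the final, partially filled batch
        tx.commit()
        commits += 1
    return commits
```

Called as `insert_in_batches(graph, city_neighborhood_map, statement)`, this keeps each transaction at no more than 1,000 statements. It does not change the cost of each individual MERGE.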

It would be fantastic to save a pointer to each city so that I don't need to look it up every time I merge.

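It may be faster to do this in two passes, i.e. CREATE all nodes first with uniqueness constraints (which should be very fast) and then CREATE the relationships in a second round.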

Constraints first, using the labels City and Neighborhood, for a faster MATCH later:

graph.schema.create_uniqueness_constraint('City', 'Name')
graph.schema.create_uniqueness_constraint('Neighborhood', 'Name')
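
One caveat with a uniqueness constraint on Neighborhood names: if the same name appears under more than one city (for example a "Downtown" in two cities), a second CREATE with that name will be rejected. Before adding the constraint, it may be worth checking the defaultdict for such names. The following is a plain-Python sketch; the function name `shared_neighborhood_names` is hypothetical.

```python
from collections import defaultdict


def shared_neighborhood_names(city_neighborhood_map):
    """Return {neighborhood_name: sorted city names} for names used by 2+ cities."""
    cities_by_name = defaultdict(set)
    for city_name, neighborhood_names in city_neighborhood_map.items():
        for neighborhood_name in neighborhood_names:
            cities_by_name[neighborhood_name].add(city_name)
    # keep only the names that occur under more than one city
    return {name: sorted(cities)
            for name, cities in cities_by_name.items()
            if len(cities) > 1}
```

If the result is non-empty, either the constraint on Neighborhood needs a different key, or a single Neighborhood node ends up shared between several cities.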

Create all nodes:

tx = graph.cypher.begin()

statement = "CREATE (:City {Name: {name}})"
for city_name in city_neighborhood_map.keys():
    tx.append(statement, {"name": city_name})

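statement = "CREATE (:Neighborhood {Name: {name}})"
# collect the neighborhood names of all cities; the set drops duplicates,
# which the uniqueness constraint would otherwise reject
all_neighborhood_names = set(n for names in city_neighborhood_map.values() for n in names)
for neighborhood_name in all_neighborhood_names:
    tx.append(statement, {"name": neighborhood_name})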

tx.commit()

Now the relationships should be fast (the MATCH is fast thanks to the constraints/index):

tx = graph.cypher.begin()
statement = "MATCH (city:City {Name: {City_Name}}), (n:Neighborhood {Name: {Neighborhood_Name}}) CREATE (n)-[:IN]->(city)"
for city_name, neighborhood_names in city_neighborhood_map.iteritems():
    for neighborhood_name in neighborhood_names:
        tx.append(statement, {"City_Name": city_name, "Neighborhood_Name": neighborhood_name})

tx.commit()