Optimizing py2neo's cypher insertion
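I am using py2neo to import several hundred thousand nodes. I've created a defaultdict to map neighborhoods to cities. One motivation was to import these relationships more efficiently, after an earlier attempt was unsuccessful.
Because the batch documentation suggests avoiding it, I veered away from an implementation like the OP of this post. Instead, the documentation suggests I use Cypher. However, I like being able to create nodes from the defaultdict I have created. Plus, I found it too difficult to import this information the way it was demonstrated.
To speed up the import, should I create a Cypher transaction (and commit every 10,000) instead of the following loop?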
for city_name, neighborhood_names in city_neighborhood_map.iteritems():
    city_node = graph.find_one(label="City", property_key="Name", property_value=city_name)
    for neighborhood_name in neighborhood_names:
        neighborhood_node = Node("Neighborhood", Name=neighborhood_name)
        rel = Relationship(neighborhood_node, "IN", city_node)
        graph.create(rel)
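I get a time-out, and it appears to be pretty slow, when I do the following. Even when I break up the transaction so that it commits every 1,000 neighborhoods, it still processes very slowly.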
tx = graph.cypher.begin()
statement = "MERGE (city {Name:{City_Name}}) CREATE (neighborhood { Name : {Neighborhood_Name}}) CREATE (neighborhood)-[:IN]->(city)"
for city_name, neighborhood_names in city_neighborhood_map.iteritems():
    for neighborhood_name in neighborhood_names:
        tx.append(statement, {"City_Name": city_name, "Neighborhood_Name": neighborhood_name})
tx.commit()
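It would be great to save a pointer to each city so that I don't need to look it up with every MERGE.
It may be faster to do this in two runs, i.e. CREATE all nodes first with unique constraints (which should be very fast) and then CREATE the relationships in a second round.
Constraints first, using the labels City and Neighborhood, for a faster MATCH later: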
graph.schema.create_uniqueness_constraint('City', 'Name')
graph.schema.create_uniqueness_constraint('Neighborhood', 'Name')
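Create all nodes:
tx = graph.cypher.begin()
# One City node per key of the map.
statement = "CREATE (:City {Name: {name}})"
for city_name in city_neighborhood_map.keys():
    tx.append(statement, {"name": city_name})
# One Neighborhood node per distinct name across all cities;
# the set avoids a second CREATE that the uniqueness constraint would reject.
all_neighborhood_names = set(
    name for names in city_neighborhood_map.values() for name in names
)
statement = "CREATE (:Neighborhood {Name: {name}})"
for neighborhood_name in all_neighborhood_names:
    tx.append(statement, {"name": neighborhood_name})
tx.commit()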
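
One thing that may be worth checking before this step: with a uniqueness constraint on the Name property of Neighborhood, a name that is listed under more than one city can only exist as a single node, so it would end up linked to every one of those cities. A small sketch for spotting such names up front, assuming city_neighborhood_map maps each city name to a list of neighborhood names. The helper name shared_neighborhood_names and the sample data are made up for illustration.

```python
from collections import defaultdict


def shared_neighborhood_names(city_neighborhood_map):
    """Return {neighborhood_name: sorted list of cities} for every
    neighborhood name that is listed under two or more cities."""
    cities_by_neighborhood = defaultdict(set)
    for city_name, neighborhood_names in city_neighborhood_map.items():
        for neighborhood_name in neighborhood_names:
            cities_by_neighborhood[neighborhood_name].add(city_name)
    return {
        name: sorted(cities)
        for name, cities in cities_by_neighborhood.items()
        if len(cities) > 1
    }


# Made-up sample data, only for illustration.
sample_map = defaultdict(list)
sample_map["Springfield"] += ["Downtown", "Riverside"]
sample_map["Shelbyville"] += ["Downtown", "Old Town"]

shared = shared_neighborhood_names(sample_map)
print(shared)
```

If the result is not empty, the neighborhood names alone do not identify a neighborhood, and a single Name constraint may not match the data.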
Now the relationships should be fast (fast MATCH thanks to the constraints/index):
tx = graph.cypher.begin()
# A single MATCH with two comma-separated patterns; repeating the MATCH keyword after the comma is a syntax error.
statement = "MATCH (city:City {Name: {City_Name}}), (n:Neighborhood {Name: {Neighborhood_Name}}) CREATE (n)-[:IN]->(city)"
for city_name, neighborhood_names in city_neighborhood_map.iteritems():
    for neighborhood_name in neighborhood_names:
        tx.append(statement, {"City_Name": city_name, "Neighborhood_Name": neighborhood_name})
tx.commit()
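
The question mentions committing every 1,000 neighborhoods. A minimal sketch of that batching applied to the relationship pass, assuming graph is a py2neo Graph that exposes graph.cypher.begin() as in the snippets above. The helper names chunked, neighborhood_city_pairs and create_relationships, and the default batch_size, are made up for illustration. It uses .items() rather than .iteritems() so that it runs on both Python 2 and 3. Whether a given batch size actually helps would need measuring against your own data.

```python
from itertools import islice


def chunked(iterable, size):
    """Yield successive lists of at most `size` items from `iterable`."""
    iterator = iter(iterable)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def neighborhood_city_pairs(city_neighborhood_map):
    """Flatten {city: [neighborhood, ...]} into (city, neighborhood) pairs."""
    for city_name, neighborhood_names in city_neighborhood_map.items():
        for neighborhood_name in neighborhood_names:
            yield city_name, neighborhood_name


STATEMENT = (
    "MATCH (city:City {Name: {City_Name}}), "
    "(n:Neighborhood {Name: {Neighborhood_Name}}) "
    "CREATE (n)-[:IN]->(city)"
)


def create_relationships(graph, city_neighborhood_map, batch_size=1000):
    """Append the relationship statements, committing once per batch.

    Assumes `graph.cypher.begin()` returns a transaction with append()
    and commit(). Returns the number of transactions committed.
    """
    commits = 0
    pairs = neighborhood_city_pairs(city_neighborhood_map)
    for batch in chunked(pairs, batch_size):
        tx = graph.cypher.begin()
        for city_name, neighborhood_name in batch:
            tx.append(STATEMENT, {"City_Name": city_name,
                                  "Neighborhood_Name": neighborhood_name})
        tx.commit()
        commits += 1
    return commits
```

It would be called as create_relationships(graph, city_neighborhood_map, batch_size=1000) once the nodes and constraints exist.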