Speeding up creation of edges between nodes in Neo4j using Py2Neo
I am trying to create a huge database in Neo4j that will have around 2 million nodes and around 4 million edges. I have been able to speed up node creation by creating the nodes in batches of 1000. However, when I try to create the edges between these nodes, the process slows down and then times out. Initially I thought it might be slow because I was merging on node names, but it slows down even when I use ids - ids that I created manually. To explain the problem better, I have given snippets of the data and the code below -
Node.csv - this file contains the details about the nodes
NodeName NodeType NodeId
Sachin Person 1
UNO Organisation 2
Obama Person 3
Cricket Sports 4
Tennis Sports 5
USA Place 6
India Place 7
Edges.csv - this file contains only the node ids and their relationships
Node1Id Relationship Node2Id
1 Plays 4
3 PresidentOf 6
1 CitizenOf 7
The code for creating the nodes is given below -
from py2neo import Graph

graph = Graph()
tx = graph.cypher.begin()
for i in range(len(Node)):
    statement = "CREATE (n {name:{A}, label:{C}, id:{B}})"
    tx.append(statement, {"A": Node[i][0], "C": str(Node[i][1]), "B": str(Node[i][2])})
    if i % 1000 == 0:
        print str(i) + " Nodes Created"
        tx.commit()
        tx = graph.cypher.begin()
The above code works like a charm - creating the 2 million nodes finishes in about 5 minutes. The code for creating the edges is given below -
tx = graph.cypher.begin()
for i in range(len(Edges)):
    statement = "MATCH (a {id:{A}}), (b {id:{B}}) CREATE (a)-[:" + Edges[i][1] + "]->(b)"
    tx.append(statement, {"A": Edges[i][0], "B": Edges[i][2]})
    if i % 1000 == 0:
        print str(i) + " Relationships Created"
        tx.commit()
        tx = graph.cypher.begin()
The above code works fine for creating the first 1000 relationships, but after that it takes a huge amount of time and the connection times out.
I badly need to solve this problem, so any help that can speed up the relationship-creation process would be very helpful.
Please note - I am not using Neo4j's import CSV or the Neo4j shell import, because those assume that the relationship between nodes is fixed. In my case the relationships vary, and importing one relationship type at a time is not feasible, since that would mean close to 2000 manual imports.
You forgot to use labels for your nodes, and to then create constraints on label + id.
create constraint on (o:Organization) assert o.id is unique;
create constraint on (p:Person) assert p.id is unique;
Create(n:Person {name:{A} ,id:{B}})
Create(n:Organization {name:{A} ,id:{B}})
match (p:Person {id:{p_id}}), (o:Organization {id:{o_id}})
create (p)-[:WORKS_FOR]->(o);
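Applied to the question's batched loop, this advice might look like the following sketch. It assumes py2neo v2 (`graph.cypher.begin()` / `tx.append()`), that every node carries a common, hypothetical `:Entity` label with a uniqueness constraint on `id`, and an `Edges` list of `(src_id, type, dst_id)` rows - all names are illustrative, not the asker's exact code.

```python
# Sketch only: edge creation with labelled, constrained nodes so that
# each MATCH is an index lookup instead of a full scan.

def edge_statement(rel_type):
    # Relationship types cannot be supplied as Cypher parameters, so the
    # type is spliced into the query text; the node ids stay as parameters.
    return ("MATCH (a:Entity {id:{A}}), (b:Entity {id:{B}}) "
            "CREATE (a)-[:" + rel_type + "]->(b)")

def create_edges(graph, edges, batch_size=1000):
    # Constraint first, so the MATCHes below use the index on :Entity(id).
    graph.cypher.execute("CREATE CONSTRAINT ON (e:Entity) ASSERT e.id IS UNIQUE")
    tx = graph.cypher.begin()
    for i, (src, rel_type, dst) in enumerate(edges, start=1):
        tx.append(edge_statement(rel_type), {"A": src, "B": dst})
        if i % batch_size == 0:
            tx.commit()
            tx = graph.cypher.begin()
    tx.commit()  # flush the final partial batch
```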
Here is an updated version of the code, since a lot of things around transactions in py2neo (v3+) have been deprecated. The code also incorporates Michael's solution.
Nodes:
import csv

from py2neo import Graph

def insert_nodes_to_neodb():
    queries_per_transaction = 1000  # 1000 seems to work faster
    node_path = "path_to_csv"
    graph = Graph("bolt://localhost:7687", user="neo4j", password="pswd")
    trans_action = graph.begin()
    with open(node_path) as csvfile:
        next(csvfile)  # skip the header row with the column labels
        node_csv = csv.reader(csvfile, delimiter=',')
        for idx, row in enumerate(node_csv):
            statement = "CREATE (n:Entity {id:{A}, label:{B}, name:{B}})"  # name is for visual reasons (neo4j browser)
            trans_action.run(statement, {"A": row[0], "B": row[1]})
            if idx % queries_per_transaction == 0:
                trans_action.commit()
                trans_action = graph.begin()
    # Commit the left-over queries
    trans_action.commit()
    # We need to create the constraint in a separate transaction:
    # neo4j.exceptions.Forbidden: Cannot perform schema updates in a
    # transaction that has performed data updates.
    trans_action = graph.begin()
    trans_action.run("CREATE CONSTRAINT ON (o:Entity) ASSERT o.id IS UNIQUE;")
    trans_action.commit()
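One small wrinkle in the loop above: `idx % queries_per_transaction == 0` is also true at `idx == 0`, so the very first row gets committed in a transaction of its own. A hypothetical chunking helper (not part of the answer, just a sketch) makes the batch boundaries explicit instead:

```python
def chunked(rows, size):
    # Yield successive lists of at most `size` rows from any iterable,
    # including a final shorter list for the left-over rows.
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # rows that did not fill a whole batch
        yield batch
```

Each yielded batch can then be written inside one `graph.begin()` / `commit()` pair, with no trailing left-over commit needed.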
Edges:
def insert_edges_to_neodb():
    queries_per_transaction = 1000  # 1000 seems to work faster
    edge_path = "path_to_csv"
    graph = Graph("bolt://localhost:7687", user="neo4j", password="pswd")
    trans_action = graph.begin()
    with open(edge_path) as csvfile:
        next(csvfile)  # skip the header row with the column labels
        edge_csv = csv.reader(csvfile, delimiter=',')
        for idx, row in enumerate(edge_csv):
            statement = """MATCH (a:Entity), (b:Entity)
                           WHERE a.id = {A} AND b.id = {B}
                           CREATE (a)-[r:CO_APPEARS {name: a.name + '<->' + b.name, weight: {C}}]->(b)
                           RETURN type(r), r.name"""
            trans_action.run(statement, {"A": row[0], "B": row[1], "C": row[2]})
            if idx % queries_per_transaction == 0:
                trans_action.commit()
                trans_action = graph.begin()
    trans_action.commit()
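Since the relationship type is the only part of the edge query that varies, a further possible speed-up (a sketch, not something from the answers above) is to group the rows by type and send each group as a single `UNWIND` query, so the server compiles one plan per relationship type instead of one query per edge. The grouping step is plain Python:

```python
from collections import defaultdict

def group_edges_by_type(edges):
    # edges: iterable of (src_id, rel_type, dst_id) rows ->
    # {rel_type: [{"A": src_id, "B": dst_id}, ...]}
    groups = defaultdict(list)
    for src, rel_type, dst in edges:
        groups[rel_type].append({"A": src, "B": dst})
    return dict(groups)

def create_edges_unwind(graph, edges):
    # Assumes the :Entity label and the id constraint created by the
    # node-insertion code; one transaction per relationship type.
    for rel_type, pairs in group_edges_by_type(edges).items():
        statement = ("UNWIND {pairs} AS pair "
                     "MATCH (a:Entity {id: pair.A}), (b:Entity {id: pair.B}) "
                     "CREATE (a)-[:" + rel_type + "]->(b)")
        tx = graph.begin()
        tx.run(statement, {"pairs": pairs})
        tx.commit()
```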