Neo4j performance tuning
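I'm new to Neo4j and am building a dating site as a POC. I have a 4 GB input file in the following format.
It contains the viewerId (male/female) and viewedId, the list of IDs that viewer has viewed. Based on this history file, I need to give recommendations whenever a user comes online.
Input file (tab-separated):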
viewerId viewedId
12345 123456,23456,987653
23456 23456,123456,234567
34567 234567,765678,987653
...
For this task, I tried the following:
USING PERIODIC COMMIT 10000
LOAD CSV WITH HEADERS FROM "file:/home/hadoopuser/Neo-input" AS row
FIELDTERMINATOR '\t'
WITH row, split(row.viewedId, ",") AS viewedIds
UNWIND viewedIds AS viewedId
MERGE (p2:Persons2 {viewerId: row.viewerId})
MERGE (c2:Companies2 {viewedId: viewedId})
MERGE (p2)-[:Friends]->(c2)
MERGE (c2)-[:Sees]->(p2);
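The query above splits each viewedId list on commas and UNWINDs it, so every (viewer, viewed) pair becomes its own row, and each such row runs all four MERGE clauses. A minimal Python sketch, assuming the tab-separated layout shown above (`SAMPLE` and `expand` are hypothetical names), that counts how much repeated MERGE work just the three sample rows generate:

```python
# Sample rows in the tab-separated layout shown above (header + 3 rows).
SAMPLE = """viewerId\tviewedId
12345\t123456,23456,987653
23456\t23456,123456,234567
34567\t234567,765678,987653"""


def expand(text):
    """Mimic split(row.viewedId, ",") + UNWIND: yield one (viewer, viewed) pair per id."""
    for line in text.splitlines()[1:]:  # skip the header, as WITH HEADERS does
        viewer, viewed = line.split("\t")
        for viewed_id in viewed.split(","):
            yield viewer, viewed_id


pairs = list(expand(SAMPLE))
# Every unwound row executes each MERGE clause once.
merge_person = len(pairs)       # MERGE (p2:Persons2 ...)
merge_company = len(pairs)      # MERGE (c2:Companies2 ...)
merge_rels = 2 * len(pairs)     # MERGE ...[:Friends]... and MERGE ...[:Sees]...
distinct_persons = len({v for v, _ in pairs})
distinct_companies = len({c for _, c in pairs})
print(len(pairs), merge_person, distinct_persons, merge_company, distinct_companies, merge_rels)
# prints: 9 9 3 9 5 18
```

On the real 4 GB file the ratio of MERGE calls to distinct nodes is presumably far higher, and without a unique constraint or index each MERGE may have to look through every node with that label, which is a likely cause of the 3-day runtime.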
The Cypher query I use to get the results:
MATCH (p2:Persons2)-[r*1..3]->(c2: Companies2)
RETURN p2,r, COLLECT(DISTINCT c2) as friends
It takes 3 days to finish.
My system configuration:
Ubuntu 14.04
RAM: 24 GB
Neo4j configuration:
neo4j.properties:
neostore.nodestore.db.mapped_memory=200M
neostore.propertystore.db.mapped_memory=2300M
neostore.propertystore.db.arrays.mapped_memory=5M
neostore.propertystore.db.strings.mapped_memory=3200M
neostore.relationshipstore.db.mapped_memory=800M
neo4j-wrapper.conf:
wrapper.java.initmemory=12000
wrapper.java.maxmemory=12000
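To reduce the time, I searched online and found the idea of a batch importer at the following link:
https://github.com/jexp/batch-import
That project imports node.csv and rels.csv files into Neo4j, but I don't know how those files were created or which scripts were used.
Could anyone give me a sample script that produces node.csv and rels.csv from my data?
Or can you suggest any other way to speed up importing and retrieving the data?
Thanks in advance.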
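A minimal Python sketch of such a preprocessing script. It assumes the tab-separated input shown above and writes two tab-separated files: node.csv with one row per distinct (label, id) and rels.csv with `start`, `end`, `type` columns, where start/end are the 0-based row positions of the nodes in node.csv. The exact header syntax and ID scheme batch-import expects have changed between versions, so this column layout is an assumption to adapt to the project's README; `convert` and the column names are hypothetical.

```python
import csv
import io


def convert(src, nodes_out, rels_out):
    """Stream the viewerId/viewedId file and write node and relationship rows.

    Hypothetical layout (check the batch-import README for the real one):
      node.csv: label, id         -- one row per distinct (label, id)
      rels.csv: start, end, type  -- start/end are 0-based node row positions
    Returns the number of nodes written.
    """
    index = {}  # (label, id) -> row position; dicts preserve insertion order

    def node(label, ident):
        # Assign the next row position the first time a node is seen.
        key = (label, ident)
        if key not in index:
            index[key] = len(index)
        return index[key]

    reader = csv.reader(src, delimiter="\t")
    next(reader, None)  # skip the viewerId/viewedId header
    rels = csv.writer(rels_out, delimiter="\t", lineterminator="\n")
    rels.writerow(["start", "end", "type"])
    for row in reader:
        if len(row) != 2:
            continue  # skip blank or malformed lines
        viewer, viewed = row
        start = node("Persons2", viewer)
        ids = [v.strip() for v in viewed.split(",") if v.strip()]
        # dict.fromkeys drops ids repeated within one row but keeps their order
        for viewed_id in dict.fromkeys(ids):
            rels.writerow([start, node("Companies2", viewed_id), "Friends"])

    # Nodes are written last, once every id has a position.
    nodes = csv.writer(nodes_out, delimiter="\t", lineterminator="\n")
    nodes.writerow(["label", "id"])
    for label, ident in index:
        nodes.writerow([label, ident])
    return len(index)


if __name__ == "__main__":
    sample = "viewerId\tviewedId\n12345\t123456,23456,987653\n"
    nodes_buf, rels_buf = io.StringIO(), io.StringIO()
    convert(io.StringIO(sample), nodes_buf, rels_buf)
    print(nodes_buf.getvalue(), rels_buf.getvalue(), sep="\n")
```

For the real file, open `/home/hadoopuser/Neo-input` for reading and node.csv/rels.csv for writing, all with `newline=""` as the csv module recommends, and pass the three file objects to `convert`. Relationships are streamed out as the input is read; only the id-to-position dictionary is kept in memory, which should fit in 24 GB unless the number of distinct ids is very large (an assumption about the data).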
You don't need the reverse relationship (Sees); one direction is enough!
For the import, set the heap (neo4j-wrapper.conf) to 12G and the page cache (neo4j.properties) to 10G.
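As a quick sanity check of that split on the 24 GB machine from the question: 12 GB of heap plus 10 GB of page cache leaves about 2 GB for the operating system and everything else. A trivial Python sketch of that budget (`headroom_gb` is a hypothetical helper, and it ignores JVM memory outside the heap):

```python
def headroom_gb(ram_gb, heap_gb, page_cache_gb):
    """RAM left for the OS and other processes after the JVM heap and page cache."""
    left = ram_gb - heap_gb - page_cache_gb
    if left < 0:
        raise ValueError("heap + page cache exceed physical RAM")
    return left


# Figures from this thread: 24 GB machine, 12 GB heap, 10 GB page cache.
print(headroom_gb(24, 12, 10))  # 2
```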
Try this; it should finish within minutes.
create constraint on (p:Persons2) assert p.viewerId is unique;
create constraint on (p:Companies2) assert p.viewedId is unique;
USING PERIODIC COMMIT 10000
LOAD CSV WITH HEADERS FROM "file:/home/hadoopuser/Neo-input" AS row
FIELDTERMINATOR '\t'
MERGE (p2:Persons2 {viewerId: row.viewerId});
USING PERIODIC COMMIT 10000
LOAD CSV WITH HEADERS FROM "file:/home/hadoopuser/Neo-input" AS row
FIELDTERMINATOR '\t'
FOREACH (viewedId IN split(row.viewedId, ",") |
MERGE (c2:Companies2 {viewedId: viewedId}));
USING PERIODIC COMMIT 10000
LOAD CSV WITH HEADERS FROM "file:/home/hadoopuser/Neo-input" AS row
FIELDTERMINATOR '\t'
WITH row, split(row.viewedId, ",") AS viewedIds
MATCH (p2:Persons2 {viewerId: row.viewerId})
UNWIND viewedIds AS viewedId
MATCH (c2:Companies2 {viewedId: viewedId})
MERGE (p2)-[:Friends]->(c2);
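The three statements above split the import into passes: distinct viewer nodes, distinct viewed nodes, then relationships whose endpoints are only MATCHed. A minimal in-memory Python sketch of what each pass produces, assuming sets as a stand-in for the unique constraints (`three_pass_import` is a hypothetical name). It models the results only, not Neo4j's performance; the likely speed-up comes from each pass doing a single kind of constraint-backed lookup instead of two node MERGEs and two relationship MERGEs per unwound row.

```python
def three_pass_import(rows):
    """Emulate the three LOAD CSV passes in memory.

    rows: (viewerId, comma-separated viewedIds) tuples. Sets play the role of
    the uniqueness constraints: adding an existing value is a no-op, like MERGE.
    """
    # Pass 1: MERGE (p2:Persons2 {viewerId: ...})
    persons = {viewer for viewer, _ in rows}
    # Pass 2: FOREACH ... MERGE (c2:Companies2 {viewedId: ...})
    companies = {vid for _, viewed in rows for vid in viewed.split(",")}
    # Pass 3: MATCH both existing endpoints, then MERGE the Friends relationship.
    friends = set()
    for viewer, viewed in rows:
        for vid in viewed.split(","):
            if viewer in persons and vid in companies:  # both MATCHes succeed
                friends.add((viewer, vid))
    return persons, companies, friends


rows = [
    ("12345", "123456,23456,987653"),
    ("23456", "23456,123456,234567"),
    ("34567", "234567,765678,987653"),
]
persons, companies, friends = three_pass_import(rows)
print(len(persons), len(companies), len(friends))  # 3 5 9
```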
For the relationship merge, if some companies have hundreds of thousands to millions of views, you may want to use this instead:
USING PERIODIC COMMIT 10000
LOAD CSV WITH HEADERS FROM "file:/home/hadoopuser/Neo-input" AS row
FIELDTERMINATOR '\t'
WITH row, split(row.viewedId, ",") AS viewedIds
MATCH (p2:Persons2 {viewerId: row.viewerId})
UNWIND viewedIds AS viewedId
MATCH (c2:Companies2 {viewedId: viewedId})
WHERE shortestPath((p2)-[:Friends]->(c2)) IS NULL
CREATE (p2)-[:Friends]->(c2);
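The point of this variant is to avoid a MERGE walking through a very popular company's long relationship list just to find out whether one relationship already exists. How Neo4j evaluates shortestPath internally is not covered here; the Python sketch below only illustrates, as an assumption, the general idea of checking existence from whichever endpoint has fewer relationships before creating one. `Graph`, `has_rel` and `create_if_absent` are hypothetical names.

```python
from collections import defaultdict


class Graph:
    """Toy store where each node's relationships are a list scanned linearly,
    loosely like walking a relationship chain. Hypothetical, for illustration."""

    def __init__(self):
        self.out_rels = defaultdict(list)  # person -> companies it viewed
        self.in_rels = defaultdict(list)   # company -> persons who viewed it
        self.scanned = 0                   # relationship entries inspected so far

    def has_rel(self, person, company):
        # Walk whichever endpoint has the shorter relationship list.
        if len(self.out_rels[person]) <= len(self.in_rels[company]):
            chain, target = self.out_rels[person], company
        else:
            chain, target = self.in_rels[company], person
        for other in chain:
            self.scanned += 1
            if other == target:
                return True
        return False

    def create_if_absent(self, person, company):
        """Like WHERE <no existing rel> CREATE (p)-[:Friends]->(c); True if created."""
        if self.has_rel(person, company):
            return False
        self.out_rels[person].append(company)
        self.in_rels[company].append(person)
        return True


g = Graph()
for i in range(1000):
    g.create_if_absent(f"p{i}", "popular")  # one very popular company
g.scanned = 0
g.create_if_absent("new_person", "popular")
print(g.scanned)  # 0: the new person's empty list was checked, not the 1000-entry one
```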
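About your query:
What are you trying to achieve by retrieving the cross product of all persons and all companies up to 3 levels deep? That could be trillions of paths.
Usually you want this for one individual person or company.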
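A back-of-the-envelope Python sketch of why the path count explodes. It assumes the original model with both Friends and Sees relationships, and the averages used are hypothetical, not figures from the question (`estimate_paths` is a hypothetical helper):

```python
def estimate_paths(persons, views_per_person, viewers_per_company):
    """Rough count of paths matched by (p:Persons2)-[r*1..3]->(c:Companies2).

    With Friends (person -> company) plus Sees (company -> person), directed
    paths alternate node types, so only 1-hop and 3-hop paths end on a company:
      1 hop:  person -> company
      3 hops: person -> company -> person -> company
    Uses averages only, so this is an order-of-magnitude estimate.
    """
    d, e = views_per_person, viewers_per_company
    return persons * d + persons * d * e * d


# Hypothetical figures: 1 million persons, 100 views each, 100 viewers per company.
print(f"{estimate_paths(1_000_000, 100, 100):,}")  # 1,000,100,000,000
```

The 3-hop term dominates: it grows with the square of the views per person, so even modest averages push the total into the trillions.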
Update for your query:
E.g., for company 123456, the persons who viewed it are 12345 and 23456. The companies these persons viewed are:
12345 123456,23456,987653
23456 23456,123456,234567
So I need to recommend 23456, 987653, 23456, 234567 for company 123456; after removing duplicates, the final result is 23456, 987653, 234567.
match (c:Companies2)<-[:Friends]-(p1:Persons2)-[:Friends]->(c2:Companies2)
where c.viewedId = "123456" // LOAD CSV imports values as strings
return distinct c2.viewedId;
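To check the logic against the worked example, here is a minimal Python sketch of the same two-hop lookup over the sample rows (`recommend` is a hypothetical helper). In the Cypher query, company 123456 itself drops out because a single MATCH pattern cannot use the same relationship twice; the sketch excludes it explicitly.

```python
def recommend(history, company_id):
    """Companies viewed by anyone who viewed company_id, excluding company_id itself.

    history maps viewerId -> list of viewed ids (strings, as LOAD CSV produces).
    Returns distinct ids in first-seen order.
    """
    result = {}
    for viewer, viewed in history.items():
        if company_id in viewed:          # viewer -[:Friends]-> company_id
            for other in viewed:          # viewer -[:Friends]-> other
                if other != company_id:
                    result[other] = None  # dict keeps first-seen order, drops repeats
    return list(result)


history = {
    "12345": ["123456", "23456", "987653"],
    "23456": ["23456", "123456", "234567"],
    "34567": ["234567", "765678", "987653"],
}
print(recommend(history, "123456"))  # ['23456', '987653', '234567']
```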
For all companies, this might help:
match (c:Companies2)<-[:Friends]-(p1:Persons2)
with p1, collect(c) as companies
match (p1)-[:Friends]->(c2:Companies2)
return c2.viewedId, extract(c in companies | c.viewedId);
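The query above returns one row per (person, viewed company) pair, each with the full list of companies that person viewed. If you want a single recommendation set per company instead, the aggregation can be sketched in Python as an inverted index. `co_viewed` is a hypothetical helper, and it assumes the whole history fits in memory, which may not hold for the full 4 GB file.

```python
from collections import defaultdict


def co_viewed(history):
    """Map each company id to the set of other ids viewed by the same viewers.

    history maps viewerId -> list of viewed ids.
    """
    recs = defaultdict(set)
    for viewed in history.values():
        unique = set(viewed)
        for company in unique:
            # Everything this viewer looked at, except the company itself.
            recs[company].update(unique - {company})
    return dict(recs)


history = {
    "12345": ["123456", "23456", "987653"],
    "23456": ["23456", "123456", "234567"],
    "34567": ["234567", "765678", "987653"],
}
print(sorted(co_viewed(history)["123456"]))  # ['234567', '23456', '987653']
```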