Neo4j batch delete
I use the following utility class to clean up a Neo4j database:
// Imports assume Spring Data Neo4j 3.x and SLF4J
import java.util.Map;

import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Transaction;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.data.neo4j.conversion.Result;
import org.springframework.data.neo4j.support.Neo4jTemplate;

public class Neo4jUtils {

    final static Logger logger = LoggerFactory.getLogger(Neo4jUtils.class);

    private static final int BATCH_SIZE = 1000;

    public static void cleanDb(Neo4jTemplate template) {
        logger.info("Cleaning database");
        long deletedNodesCount = 0;
        do {
            GraphDatabaseService graphDatabaseService = template.getGraphDatabaseService();
            Transaction tx = graphDatabaseService.beginTx();
            try {
                // Delete up to BATCH_SIZE nodes plus their relationships per transaction
                Result<Map<String, Object>> result = template.query(
                        "MATCH (n) WITH n LIMIT " + BATCH_SIZE
                                + " OPTIONAL MATCH (n)-[r]-() DELETE n, r RETURN count(n) as count",
                        null);
                deletedNodesCount = (long) result.single().get("count");
                tx.success();
                logger.info("Deleted " + deletedNodesCount + " nodes...");
            } catch (Throwable th) {
                logger.error("Error while deleting database", th);
                throw th;
            } finally {
                tx.close();
            }
        } while (deletedNodesCount > 0);
    }
}
As you can see, I limit the batch size to 1000, yet during the delete operation the first batch reports about 300,000 deleted entities and each subsequent batch reports about 2,000.
Can you tell me why I see these large numbers even though BATCH_SIZE = 1000, and how to fix this method so that each batch really is limited to 1000 nodes?
It is probably counting nodes more than once because they have multiple relationships. Your query does delete at most 1000 nodes, but it returns the number of (n, r) combinations.
You can either:
Change your query to count distinct nodes:
MATCH (n) WITH n LIMIT 1000 OPTIONAL MATCH (n)-[r]-() DELETE n, r RETURN count(DISTINCT n) as count
Or check how many nodes remain after each deletion, and verify that the count dropped by 1000:
MATCH (n) RETURN count(n) as count
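The over-counting can be illustrated with plain arithmetic (a hypothetical sketch, not the Neo4j driver API): after OPTIONAL MATCH (n)-[r]-(), each node produces one row per relationship (or one row if it has none), so count(n) counts rows, not distinct nodes.

```java
public class RowExpansionDemo {

    // Hypothetical helper: given the relationship count of each node in a batch,
    // compute how many rows the OPTIONAL MATCH expansion produces.
    static long expandedRowCount(int[] relCountsPerNode) {
        long rows = 0;
        for (int k : relCountsPerNode) {
            // a node with no relationships still yields exactly one row
            rows += Math.max(k, 1);
        }
        return rows;
    }

    public static void main(String[] args) {
        int[] batch = new int[1000];
        java.util.Arrays.fill(batch, 2); // 1000 nodes, 2 relationships each

        // count(n) over the expanded rows reports 2000 ...
        System.out.println(expandedRowCount(batch)); // prints 2000
        // ... while count(DISTINCT n) would report the true 1000 deleted nodes
        System.out.println(batch.length); // prints 1000
    }
}
```

Separately, if you are on Neo4j 2.3 or later, `MATCH (n) WITH n LIMIT 1000 DETACH DELETE n RETURN count(n) as count` deletes each node together with its relationships without the OPTIONAL MATCH expansion, so count(n) is accurate as written.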