ArangoDB Parent to Child edge creation on existing 1 million documents for nested levels not working or SLOW
I created an events collection in ArangoDB and loaded 1 million documents into it with the query below, which completes in 40 seconds.
FOR I IN 1..1000000
  INSERT {
    "source": "ABC",
    "target": "ABC",
    "type": "REST",
    "attributes": { "MyAtrib": TO_STRING(I) },
    "mynum": I
  } INTO events
So record 1 is the top-level parent, record 2 is a child of 1, and so on:
1 --> 2 --> 3 --> 4 --> ...1000000
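To make the intended structure concrete, here is a small in-memory Python model of the documents and of the chain 1 --> 2 --> 3 ... It is only a sketch: the helper names make_events and intended_edges are hypothetical, and the _id format "events/<mynum>" is an assumption for readability (ArangoDB auto-generates _key values unless you supply one).

```python
# In-memory model of the `events` documents created by the AQL loop above.
# Assumption (hypothetical): _id is "events/<mynum>"; ArangoDB normally
# auto-generates _key, so real _id values will differ.


def make_events(n):
    """Return n documents shaped like the ones inserted by the AQL loop."""
    return [
        {
            "_id": f"events/{i}",  # hypothetical id format
            "source": "ABC",
            "target": "ABC",
            "type": "REST",
            "attributes": {"MyAtrib": str(i)},
            "mynum": i,
        }
        for i in range(1, n + 1)
    ]


def intended_edges(events):
    """Parent with mynum k points to the child with mynum k + 1."""
    by_num = {e["mynum"]: e["_id"] for e in events}
    return [
        {"_from": by_num[k], "_to": by_num[k + 1]}
        for k in sorted(by_num)
        if k + 1 in by_num
    ]


events = make_events(5)
edges = intended_edges(events)
print(len(edges))  # 4: a chain of n documents has n - 1 edges
print(edges[0])    # {'_from': 'events/1', '_to': 'events/2'}
```

For the real data set this means 999,999 edges in ChildEvents, one per adjacent pair of mynum values.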
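I created an empty edge collection ChildEvents and tried to build the parent-child edges with the following query, but it never finished (I also created a hash index on mynum, but that did not help):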
FOR p IN events
  FOR c IN events
    FILTER p.mynum == (c.mynum + 1)
    INSERT { _from: p._id, _to: c._id } INTO ChildEvents
Any help would be appreciated.
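Why the join can run "forever": if no index is used, the two nested FOR loops compare every p with every c, which for 1,000,000 documents is 10^12 comparisons, while an index on mynum turns it into one lookup per document. The Python sketch below models both strategies on 1,000 documents; the function names are hypothetical and the dict merely stands in for an index, it is not how ArangoDB stores one. It also makes one detail visible: as written, FILTER p.mynum == (c.mynum + 1) picks p as the document with the larger mynum, so the edges run 2 --> 1, 3 --> 2, and so on. If the direction 1 --> 2 is what you want, the condition would be c.mynum == p.mynum + 1.

```python
# Compare a naive nested-loop join with an index-style (hash) lookup for
# FILTER p.mynum == c.mynum + 1. Both produce the same edges; only the
# amount of work differs.


def nested_loop_join(events):
    """O(n^2): what FOR p ... FOR c ... FILTER does without a usable index."""
    comparisons = 0
    edges = []
    for p in events:
        for c in events:
            comparisons += 1
            if p["mynum"] == c["mynum"] + 1:
                edges.append({"_from": p["_id"], "_to": c["_id"]})
    return edges, comparisons


def hash_lookup_join(events):
    """O(n): scan c once and find p by key, as an index on mynum allows."""
    index = {e["mynum"]: e for e in events}  # stand-in for the mynum index
    lookups = 0
    edges = []
    for c in events:
        lookups += 1
        p = index.get(c["mynum"] + 1)
        if p is not None:
            edges.append({"_from": p["_id"], "_to": c["_id"]})
    return edges, lookups


# Hypothetical ids "events/<mynum>" for illustration only.
events = [{"_id": f"events/{i}", "mynum": i} for i in range(1, 1001)]
slow, n_cmp = nested_loop_join(events)
fast, n_lookup = hash_lookup_join(events)
print(n_cmp, n_lookup)  # 1000000 1000
print(fast[0])          # {'_from': 'events/2', '_to': 'events/1'}
```

Scaling from 1,000 to 1,000,000 documents multiplies the nested-loop work by 10^6 but the lookup work only by 10^3, which is why checking that the index is actually used matters so much here.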
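Creating the events documents takes about 50 seconds on my system. I added an index on mynum to the events collection and ran your second query (with RETURN NEW added at the end). It takes about 70 seconds to process the edges (plus some time to render a subset of them):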
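I'm using ArangoDB 3.6.0 with the RocksDB engine under Windows 10, on an Intel i7-6700K 4x4.0 GHz with 32 GB RAM and a Samsung Evo 850 SSD.
Are you sure the index is set up correctly? Explain the query and check the execution plan; maybe something is different on your side?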
Execution plan:
 Id   NodeType                     Est.   Comment
  1   SingletonNode                   1   * ROOT
  3   EnumerateCollectionNode   1000000   - FOR c IN events   /* full collection scan, projections: `mynum`, `_id` */
  9   IndexNode                 1000000   - FOR p IN events   /* persistent index scan, projections: `_id` */
  6   CalculationNode           1000000   - LET #5 = { "_from" : p.`_id`, "_to" : c.`_id` }   /* simple expression */   /* collections used: p : events, c : events */
  7   InsertNode                1000000   - INSERT #5 IN ChildEvents
  8   ReturnNode                1000000   - RETURN $NEW

Indexes used:
 By   Name                      Type         Collection   Unique   Sparse   Selectivity   Fields        Ranges
  9   idx_1655926293788622848   persistent   events       true     false       100.00 %   [ `mynum` ]   (p.`mynum` == (c.`mynum` + 1))

Optimization rules applied:
 Id   RuleName
  1   move-calculations-up
  2   move-filters-up
  3   interchange-adjacent-enumerations
  4   move-calculations-up-2
  5   move-filters-up-2
  6   remove-data-modification-out-variables
  7   use-indexes
  8   remove-filter-covered-by-index
  9   remove-unnecessary-calculations-2
 10   reduce-extraction-to-projection
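Reading the plan: interchange-adjacent-enumerations made c the outer loop (a full scan that reads only the projected mynum and _id), and use-indexes plus remove-filter-covered-by-index replaced the FILTER with a probe of the unique persistent index for c.mynum + 1. The Python sketch below mirrors that shape. It is a rough model under stated assumptions: SortedUniqueIndex and run_plan are hypothetical names, and a sorted list searched with bisect only stands in for ArangoDB's persistent index, which is not implemented this way.

```python
import bisect


class SortedUniqueIndex:
    """Stand-in for a unique persistent index: sorted keys, binary search."""

    def __init__(self, docs, field):
        pairs = sorted((d[field], d["_id"]) for d in docs)
        self.keys = [key for key, _ in pairs]
        self.ids = [doc_id for _, doc_id in pairs]
        if len(set(self.keys)) != len(self.keys):
            raise ValueError(f"duplicate values in unique index on {field!r}")

    def lookup(self, key):
        """Return the _id stored under key, or None; O(log n) per probe."""
        pos = bisect.bisect_left(self.keys, key)
        if pos < len(self.keys) and self.keys[pos] == key:
            return self.ids[pos]
        return None


def run_plan(events):
    """Mimic the plan: outer full scan over c, inner index probe for p."""
    index = SortedUniqueIndex(events, "mynum")
    edges = []
    for doc in events:
        # EnumerateCollectionNode: only the projected attributes are read.
        c = {"mynum": doc["mynum"], "_id": doc["_id"]}
        # IndexNode: FILTER p.mynum == c.mynum + 1 is answered by the index.
        p_id = index.lookup(c["mynum"] + 1)
        if p_id is not None:
            # CalculationNode + InsertNode: build and store the edge.
            edges.append({"_from": p_id, "_to": c["_id"]})
    return edges


# Hypothetical ids "events/<mynum>" for illustration only.
events = [{"_id": f"events/{i}", "mynum": i} for i in range(1, 6)]
for edge in run_plan(events):
    print(edge)
```

If your own explain output shows two EnumerateCollectionNodes and no IndexNode, the FILTER is being evaluated for every pair of documents, which matches the "never finishes" symptom.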