Is leftsemi faster with the smaller table on the left side?
The join operator documentation says:
Tip
For best performance, if one table is always smaller than the
other, use it as the left (piped) side of the join.
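In most cases, the purpose of leftsemi is to filter a larger set on the left by a smaller set on the right. Does the tip quoted above still apply to the leftsemi kind of the join operator?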
At least in this respect, the order of the tables matters.
Here are the results of a quick test executed on my dev cluster:
Setup
.set-or-replace L100M <| range i from 1 to 100000000 step 1
.set-or-replace S1M <| range i from 1 to 1000000 step 1 | project i = tolong(rand(100000000))
rightsemi (small table first)
S1M | join kind=rightsemi L100M on i | consume
The query completed in about 3 seconds.
leftsemi (large table first)
L100M | join kind=leftsemi S1M on i | consume
The query ran for about 20 seconds and then failed with the following exception:
Query execution lacks memory resources to complete (80DA0007): Partial
query failure: Low memory condition (E_LOW_MEMORY_CONDITION).
(message: 'bad allocation', details: '').
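The two test queries are meant to return the same rows (the rows of the large table whose key appears in the small table), so swapping sides changes only how the join executes, not what it returns. A minimal sketch of that equivalence on toy data follows; the `datatable` literals `L` and `S` are hypothetical stand-ins for `L100M` and `S1M`, not part of the test above.

```kusto
// Hypothetical toy tables standing in for L100M (large) and S1M (small).
let L = datatable(i:long) [1, 2, 3, 4, 5];
let S = datatable(i:long) [2, 4, 4, 9];
// Form 1: small table piped in, large table on the right.
// rightsemi keeps the rows of the right table (L) that have a match in S.
S | join kind=rightsemi L on i
// Form 2 (run in place of Form 1): large table piped in.
// leftsemi keeps the rows of the left table (L) that have a match in S.
// L | join kind=leftsemi S on i
```

Either form should return the rows of `L` with `i` equal to 2 and 4, each once; row order is not guaranteed. Form 1 follows the tip in the documentation, since the smaller table is on the piped side.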