Is leftsemi faster with the smaller table on the left side?

The join operator documentation says:

Tip

For best performance, if one table is always smaller than the other, use it as the left (piped) side of the join.

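In most cases, the purpose of leftsemi is to filter a larger set on the left by a smaller set on the right. Does the quote above still apply to the leftsemi kind of the join operator?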

At least at this point, the order of the tables does matter.

Here are the results of a quick test run on my dev cluster:

Setup

.set-or-replace L100M <| range i from 1 to 100000000 step 1

.set-or-replace S1M <| range i from 1 to 1000000 step 1 | project i = tolong(rand(100000000))

rightsemi (small table first)

S1M | join kind=rightsemi L100M on i | consume

The query completed in about 3 seconds.


leftsemi (large table first)

L100M | join kind=leftsemi S1M on i | consume

The query ran for about 20 seconds and then failed with the following exception:

Query execution lacks memory resources to complete (80DA0007): Partial query failure: Low memory condition (E_LOW_MEMORY_CONDITION). (message: 'bad allocation', details: '').
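The two queries in the test ask for the same rows: the rows of the large table whose key also appears in the small table. So a leftsemi with the large table piped first can be rewritten as a rightsemi with the small table piped first. Below is a minimal sketch that checks this equivalence on toy data. The names Large, Small, ViaLeftSemi, ViaRightSemi and the column form are made up for illustration and are not part of the original test.

```kusto
// Toy stand-ins for L100M and S1M (hypothetical names and sizes).
let Large = range i from 1 to 10 step 1;
let Small = datatable(i: long) [2, 5, 42];
// Form 1: large table piped first, filtered by the small one.
let ViaLeftSemi = Large | join kind=leftsemi Small on i;
// Form 2: small table piped first, keeping the matching rows of the large one.
let ViaRightSemi = Small | join kind=rightsemi Large on i;
// Tag each result with the form that produced it and show them side by side.
union
    (ViaLeftSemi | extend form = "leftsemi"),
    (ViaRightSemi | extend form = "rightsemi")
| sort by i asc, form asc
```

Both forms should return the rows with i = 2 and i = 5; the value 42 has no match in Large and is dropped. This only demonstrates that the results agree. Whether the rewrite helps on a given cluster is something to measure, as in the timing test above.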