Is leftsemi faster with the smaller table on the left side?

Question

Tip

For best performance, if one table is always smaller than the other, use it as the left (piped) side of the join.

In most cases, the purpose of leftsemi is to filter a larger left-hand set by a smaller right-hand set. Does the quote above still apply to the leftsemi join kind?

Answer 1

At least in this case, the order of the tables matters.

Here are the results of a quick test run on my dev cluster:

Setup

.set-or-replace L100M <| range i from 1 to 100000000 step 1

.set-or-replace S1M <| range i from 1 to 1000000 step 1 | project i = tolong(rand(100000000))

rightsemi (small table first)

S1M | join kind=rightsemi L100M on i | consume

The query completed in about 3 seconds.

leftsemi (large table first)

L100M | join kind=leftsemi S1M on i | consume

The query ran for about 20 seconds and then failed with the following exception:

Query execution lacks memory resources to complete (80DA0007): Partial query failure: Low memory condition (E_LOW_MEMORY_CONDITION). (message: 'bad allocation', details: '').
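
Because `A | join kind=leftsemi B` and `B | join kind=rightsemi A` both return the rows of A that have a match in B, a leftsemi query with the large table first can usually be rewritten as a rightsemi query with the small table first. The following is a minimal sketch that checks this equivalence on small stand-in tables. `Large` and `Small` are hypothetical names chosen for illustration, not the tables from the test above, and the sketch says nothing about timing.

```kusto
// Hypothetical stand-in tables, small enough to run on any cluster.
let Large = range i from 1 to 1000 step 1;
let Small = range i from 1 to 2000 step 100;
// Large table piped in first, filtered by the small table.
let LeftSemiCount = toscalar(Large | join kind=leftsemi Small on i | count);
// Small table piped in first; rightsemi still returns rows of Large.
let RightSemiCount = toscalar(Small | join kind=rightsemi Large on i | count);
// The two counts should be equal if the rewrite preserves the result.
print LeftSemiCount, RightSemiCount
```

If the counts match on a representative sample of your own data, swapping the sides and the join kind as in the rightsemi test above is a reasonable first thing to try when the leftsemi form runs out of memory.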

kql

azure-data-explorer