Amazon Redshift doing Hash Join even when joined on column that is both Dist Key and Sort Key

I have a fact table in Redshift with about 1.3 billion rows, with distribution key c1 and sort key (c1, c2).

I need to join this table with itself on c1 (that is, c1 from the first instance of the table = c1 from the second instance of the table).

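Looking at the query plan, Redshift appears to be doing a Hash Join with DS_DIST_NONE. DS_DIST_NONE is expected, since c1 is both the dist key and the leading sort key column, but for the same reason I expected Redshift to do a Merge Join rather than a Hash Join.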
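For reference, the plan described above can be inspected with EXPLAIN. This is only a sketch: `fact_table` and the aliases `a` and `b` are hypothetical names, and the select list is illustrative; only c1 and c2 come from the question.

```sql
-- fact_table is a hypothetical name; c1 and c2 are the columns from the question.
-- EXPLAIN prints the plan without running the query.
EXPLAIN
SELECT a.c1, a.c2, b.c2
FROM fact_table AS a
JOIN fact_table AS b
  ON a.c1 = b.c1;
```

In the output, the join step should be labeled with the join type and redistribution strategy, for example Hash Join DS_DIST_NONE versus Merge Join DS_DIST_NONE.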

I believe this is slowing down my query.

Can anyone explain why Redshift might be doing a Hash Join instead of a Merge Join for this query, even though the join column is both the DIST key and the SORT key and the plan shows DS_DIST_NONE?

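It turned out that we were not inserting data into the table in sorted order (as defined by the table's sort key), and Redshift does not automatically keep a table's rows sorted by the sort key, so Redshift could not perform a Merge Join on our table. After we ran a Full Vacuum on the table, Redshift started performing a Merge Join.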
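A minimal sketch of that fix, assuming the hypothetical table name `fact_table`: check how much of the table is unsorted through the SVV_TABLE_INFO system view, then re-sort it with VACUUM FULL. The ANALYZE step is an optional addition, not something the answer above mentions.

```sql
-- fact_table is a hypothetical name used for illustration.
-- "unsorted" is the percentage of rows that are not in sort key order.
SELECT "table", diststyle, sortkey1, unsorted, tbl_rows
FROM svv_table_info
WHERE "table" = 'fact_table';

-- Re-sort the rows and reclaim space from deleted rows.
-- VACUUM cannot run inside a transaction block.
VACUUM FULL fact_table;

-- Optional: refresh planner statistics after the vacuum.
ANALYZE fact_table;
```

By default VACUUM FULL skips the sort phase when the table is already at least 95 percent sorted; `VACUUM FULL fact_table TO 100 PERCENT;` forces a complete sort. Re-running EXPLAIN on the self-join afterwards shows whether the planner has switched to a Merge Join.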
