Redshift Query Execution Plan
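I noticed that the query below runs slowly, and after looking at it in detail I am wondering why Redshift first scans both tables (events and contact) separately and then joins them. The contact table has more than 300,000 rows.
My expectation was that Redshift would first scan the large events table using the filters specified for it, and then look up the matching contacts by the Contact_ID column. Is my expectation wrong? Is there anything else I can do to speed up the query? I have run Vacuum and Analyze on all tables.
Query: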
select c.Segment
, Count (Distinct (CASE WHEN et.Event_ID = 1 THEN et.Contact_ID ELSE null END)) as L1
, Count (Distinct (CASE WHEN et.Event_ID = 2 THEN et.Contact_ID ELSE null END)) as L2
from
Events et
join contact c on c.Account_ID = et.Account_ID and c.ID = et.Contact_ID
where
et.Account_ID = 5
and et.Event_ID in (1, 2)
and et.IsGuest = 0
and et.dim_date_id >=20151125
and et.dim_date_id <=20160226
group by c.Segment
order by 1
Explain:
XN Merge (cost=1000000074927.82..1000000074927.83 rows=1 width=20)
-> XN Network (cost=1000000074927.82..1000000074927.83 rows=1 width=20)
-> XN Sort (cost=1000000074927.82..1000000074927.83 rows=1 width=20)
-> XN HashAggregate (cost=74927.80..74927.81 rows=1 width=20)
-> XN Merge Join DS_DIST_NONE (cost=0.00..74927.57 rows=31 width=20)
-> XN Seq Scan on contact c (cost=0.00..497.56 rows=39805 width=16)
-> XN Seq Scan on eventtransaction et (cost=0.00..6664.84 rows=136 width=20)
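The filters are applied only after the join is performed. If you want the join to happen after the filters have been applied, I suggest creating a temporary table (written here as a subquery) and joining it to the contact table, as shown in the code: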
select c.Segment
, Count (Distinct (CASE WHEN et.Event_ID = 1 THEN et.Contact_ID ELSE null END)) as L1
, Count (Distinct (CASE WHEN et.Event_ID = 2 THEN et.Contact_ID ELSE null END)) as L2
from
(
select Event_ID, Account_ID, Contact_ID
-- the alias et is only defined outside this subquery, so the columns are unqualified here
FROM Events
WHERE
Account_ID = 5
and Event_ID in (1, 2)
and IsGuest = 0
and dim_date_id >=20151125
and dim_date_id <=20160226
)et
join contact c on c.Account_ID = et.Account_ID and c.ID = et.Contact_ID
group by c.Segment
order by 1
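The subquery above is a derived table rather than an actual temporary table. If you want to materialize the filtered rows explicitly, a minimal sketch could look like the following. It assumes the table is named Events as in the question (the plan shows eventtransaction), and filtered_events is a hypothetical name chosen for illustration.

```sql
-- Step 1: materialize only the event rows that pass the filters.
-- filtered_events is a hypothetical name; Events is taken from the question.
CREATE TEMP TABLE filtered_events AS
SELECT Event_ID, Account_ID, Contact_ID
FROM Events
WHERE Account_ID = 5
  AND Event_ID IN (1, 2)
  AND IsGuest = 0
  AND dim_date_id BETWEEN 20151125 AND 20160226;

-- Step 2: join the filtered set to contact and aggregate.
SELECT c.Segment,
       COUNT(DISTINCT CASE WHEN fe.Event_ID = 1 THEN fe.Contact_ID END) AS L1,
       COUNT(DISTINCT CASE WHEN fe.Event_ID = 2 THEN fe.Contact_ID END) AS L2
FROM filtered_events fe
JOIN contact c
  ON c.Account_ID = fe.Account_ID
 AND c.ID = fe.Contact_ID
GROUP BY c.Segment
ORDER BY 1;
```

Running EXPLAIN on the second statement should show whether the plan actually changes. The temporary table is dropped automatically at the end of the session.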
Also, if you set a sort key on dim_date_id, you should see a significant speedup for this query. More details on this can be found here
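One way to try the sort key suggestion without touching the original table is to build a sorted copy. This is only a sketch: events_sorted is a hypothetical name, Events is assumed from the question, and no distribution key is specified, so you would probably want to add the one your real table uses.

```sql
-- Build a copy of the events table sorted on dim_date_id.
-- events_sorted is a hypothetical name; add DISTKEY(...) to match your real table.
CREATE TABLE events_sorted
SORTKEY (dim_date_id)
AS
SELECT * FROM Events;

-- Refresh planner statistics for the new table.
ANALYZE events_sorted;

-- Check how the date range filter is planned against the sorted copy.
EXPLAIN
SELECT Event_ID, Account_ID, Contact_ID
FROM events_sorted
WHERE dim_date_id BETWEEN 20151125 AND 20160226;
```

Note that the plan in the question uses a Merge Join, which depends on how both tables are currently sorted and distributed, so changing the sort key may also change the join type. Compare the plans before switching over.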