Teradata SQL Optimization
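I hope this is concise. After watching one of my colleagues speed up my query almost 10-fold with a quick change, I am basically looking for a methodology for improving queries.
My query uses two tables, t_item and t_action.
t_item is basically an item with characteristics, and t_action holds the events or actions performed on that item. Each action has a timestamp and an id.
My query joins the two tables on the id. There is also a criterion on t_action.action_type, which is free text.
My simplified original query looks like this: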
SELECT *
FROM t_item
JOIN t_action
ON t_item.pk = t_action.fk
WHERE t_action.action_type LIKE ('%PURCHASE%')
AND t_item.location = 'DE'
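This ran fine and came back in about 8 minutes.
My colleague changed it so that the t_action.action_type condition ended up in the FROM part of the SQL, as part of the join's ON clause. This reduced the time to 2 minutes: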
SELECT *
FROM t_item
JOIN t_action
ON t_item.pk = t_action.fk
AND t_action.action_type LIKE ('%PURCHASE%')
WHERE t_item.location = 'DE'
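My question is: in general, how do you know when to put a restriction in the FROM clause and when to put it in the WHERE clause?
I thought the Teradata SQL optimizer would do this automatically.
Thanks for your help.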
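In this case, you don't actually need to understand the plan. You only need to see whether the two plans are the same. Teradata has a very good optimizer, so I would not expect a difference between the two versions (there could be one, but I would be surprised). Caching is therefore one possible explanation for the difference in performance.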
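One way to run that comparison is to prefix each version of the query with EXPLAIN and read the two plans side by side. This is only a sketch using the table and column names from the question; the actual plan text depends on your indexes and statistics.

```sql
-- Version 1: the action_type filter sits in the WHERE clause.
EXPLAIN
SELECT *
FROM t_item
JOIN t_action
  ON t_item.pk = t_action.fk
WHERE t_action.action_type LIKE '%PURCHASE%'
  AND t_item.location = 'DE';

-- Version 2: the same filter moved into the join's ON clause.
EXPLAIN
SELECT *
FROM t_item
JOIN t_action
  ON t_item.pk = t_action.fk
 AND t_action.action_type LIKE '%PURCHASE%'
WHERE t_item.location = 'DE';
```

If the join steps, the conditions, and the spool estimates match, the placement of the filter is not what changed the runtime.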
For this query:
SELECT *
FROM t_item JOIN
t_action
ON t_item.pk = t_action.fk
AND t_action.action_type LIKE '%PURCHASE%'
WHERE t_item.location = 'DE';
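the best indexes are probably on t_item(location, pk) and t_action(action_type). However, you should try to get rid of the wildcards in a production query. They make the query harder to optimize, which in turn can have a large impact on performance.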
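As a rough sketch of that advice, the statements below create the two suggested secondary indexes and then show a wildcard-free form of the filter. The index names are hypothetical, and the equality predicate is an assumption: it only returns the same rows if action_type contains exactly 'PURCHASE' for the actions you want, which may not hold for a free-text column. Note also that a pattern with a leading % generally cannot use an index on action_type to narrow the search, which is one reason to avoid it.

```sql
-- Secondary indexes matching the suggestion above.
-- idx_item_location_pk and idx_action_type are hypothetical names.
CREATE INDEX idx_item_location_pk (location, pk) ON t_item;
CREATE INDEX idx_action_type (action_type) ON t_action;

-- Wildcard-free variant of the query.
-- ASSUMPTION: the wanted rows store exactly 'PURCHASE' in action_type.
-- With free text this can return fewer rows than LIKE '%PURCHASE%'.
SELECT *
FROM t_item
JOIN t_action
  ON t_item.pk = t_action.fk
WHERE t_action.action_type = 'PURCHASE'
  AND t_item.location = 'DE';
```

Whether the optimizer actually uses either index is something to confirm with EXPLAIN on your own data.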
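I tried a similar pair of queries and found no difference in the EXPLAIN plans, although my tables are smaller: trans (15k rows) and accounts (10k rows), with an index on Account_number. As Gordon suggests, try running the queries at different times, and check the EXPLAIN plans of both queries to see whether they differ.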
Explain select * from trans t
inner join
ap.accounts a
on t.account_number = a.account_number
where t.trans_id like '%DEP%';
4) We do an all-AMPs JOIN step from ap.a by way of a RowHash match
scan with no residual conditions, which is joined to ap.t by way
of a RowHash match scan with a condition of ("ap.t.Trans_ID LIKE
'%DEP%'"). ap.a and ap.t are joined using a merge join, with a
join condition of ("ap.t.Account_Number = ap.a.Account_Number").
The result goes into Spool 1 (group_amps), which is built locally
on the AMPs. The size of Spool 1 is estimated with no confidence
to be 11,996 rows (1,511,496 bytes). The estimated time for this
step is 0.25 seconds.
-> The contents of Spool 1 are sent back to the user as the result of
statement 1. The total estimated time is 0.25 seconds.
Explain select * from trans t
inner join
ap.accounts a
on t.account_number = a.account_number
and t.trans_id like '%DEP%';
4) We do an all-AMPs JOIN step from ap.a by way of a RowHash match
scan with no residual conditions, which is joined to ap.t by way
of a RowHash match scan with a condition of ("ap.t.Trans_ID LIKE
'%DEP%'"). ap.a and ap.t are joined using a merge join, with a
join condition of ("ap.t.Account_Number = ap.a.Account_Number").
The result goes into Spool 1 (group_amps), which is built locally
on the AMPs. The size of Spool 1 is estimated with no confidence
to be 11,996 rows (1,511,496 bytes). The estimated time for this
step is 0.25 seconds.
-> The contents of Spool 1 are sent back to the user as the result of
statement 1. The total estimated time is 0.25 seconds.
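The two plans above are identical because, for an inner join, a filter in ON and the same filter in WHERE describe the same result. The placement does matter for outer joins, which may be the more useful rule of thumb for the original question. A minimal sketch, reusing the table names from the question and assuming a LEFT JOIN that the original query does not use:

```sql
-- Filter in ON: every 'DE' item is kept. Items with no matching
-- PURCHASE action come back with NULLs in the t_action columns.
SELECT *
FROM t_item
LEFT JOIN t_action
  ON t_item.pk = t_action.fk
 AND t_action.action_type LIKE '%PURCHASE%'
WHERE t_item.location = 'DE';

-- Filter in WHERE: the NULL-extended rows fail the LIKE test and are
-- dropped, so the result is the same as for the inner join.
SELECT *
FROM t_item
LEFT JOIN t_action
  ON t_item.pk = t_action.fk
WHERE t_action.action_type LIKE '%PURCHASE%'
  AND t_item.location = 'DE';
```

So for inner joins the choice is mostly about readability, while for outer joins it changes which rows are returned.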
The general order of query processing on Teradata is:
Where/And + Joins
Aggregate
Having
OLAP/Window
Qualify
Sample/Top
Order By
Format
An easy way to remember it is WAHOQSOF - as in Wax On, Wax Off :)
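To make that order concrete, here is a sketch of a single query annotated with the step at which each clause is applied. It reuses the trans and ap.accounts tables from the EXPLAIN example; the aggregate, the thresholds, and the dep_count alias are made up for illustration, and the Sample/Top and Format steps are not used.

```sql
SELECT   t.account_number,
         COUNT(*) AS dep_count
FROM     trans t
JOIN     ap.accounts a
  ON     t.account_number = a.account_number          -- 1. Where/And + Joins
WHERE    t.trans_id LIKE '%DEP%'                      -- 1. Where/And + Joins
GROUP BY t.account_number                             -- 2. Aggregate
HAVING   COUNT(*) > 1                                 -- 3. Having
QUALIFY  RANK() OVER (ORDER BY COUNT(*) DESC) <= 10   -- 4. OLAP/Window, 5. Qualify
ORDER BY dep_count DESC;                              -- 7. Order By
```

Because the row filters and the join come first, a restriction on t_action.action_type is applied at the same stage whether it is written in ON or in WHERE of an inner join.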