Teradata SQL Optimization

Question

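I'll try to keep this concise. After watching a colleague speed up one of my queries almost tenfold with a quick change, I'm looking for a general method for improving queries.

My query uses two tables, t_item and t_action. t_item is basically an item with characteristics, and t_action holds the events or actions performed on that item. Each action has a timestamp and an id.

My query joins the two tables on id. There is also a filter on t_action.action_type, which is free text.

My simplified original query looks like this: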

SELECT * 
FROM t_item
  JOIN t_action
  ON t_item.pk = t_action.fk
WHERE t_action.action_type LIKE ('%PURCHASE%')
AND t_item.location = 'DE'

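This ran fine and came back in about 8 minutes.

My colleague changed it so that the t_action.action_type condition ended up in the FROM part of the SQL, as part of the join's ON clause. This cut the time to 2 minutes.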

SELECT * 
FROM t_item
  JOIN t_action
  ON t_item.pk = t_action.fk
  AND t_action.action_type LIKE ('%PURCHASE%')
WHERE t_item.location = 'DE'

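My question is: in general, how do you know when to put a restriction in the FROM clause versus the WHERE clause?

I thought the Teradata SQL optimizer would do this automatically.

Thanks for your help.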

Answer 1

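In this case, you don't actually need to understand the plan. You just need to check whether the two plans are the same. Teradata has a pretty good optimizer, so I would not expect a difference between the two versions (there could be one, but I would be surprised). Caching is therefore one possible explanation for the difference in performance.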
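
To compare the plans, each version of the query can be prefixed with Teradata's EXPLAIN request modifier. The sketch below reuses the table and column names from the question and makes no assumption about what the plans will show; if the two outputs match step for step, the timing difference probably comes from something other than where the filter was written.

```sql
-- Version 1: the LIKE filter sits in the WHERE clause.
EXPLAIN
SELECT *
FROM t_item
  JOIN t_action
  ON t_item.pk = t_action.fk
WHERE t_action.action_type LIKE '%PURCHASE%'
  AND t_item.location = 'DE';

-- Version 2: the LIKE filter sits in the join's ON clause.
EXPLAIN
SELECT *
FROM t_item
  JOIN t_action
  ON t_item.pk = t_action.fk
  AND t_action.action_type LIKE '%PURCHASE%'
WHERE t_item.location = 'DE';
```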

For this query:

SELECT * 
FROM t_item JOIN
     t_action
     ON t_item.pk = t_action.fk
        AND t_action.action_type LIKE '%PURCHASE%'
WHERE t_item.location = 'DE';

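The best indexes are probably on t_item(location, pk) and t_action(action_type). However, you should try to get rid of the wildcards in a production query. Wildcards make the query much harder to optimize, which in turn can have a large impact on performance.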
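
As a rough sketch, the suggested indexes could be created as Teradata secondary indexes like this. The index names are hypothetical, and the sketch assumes neither table already has a primary or secondary index covering these columns. Whether the optimizer actually uses them depends on the data, so it is worth re-checking the EXPLAIN output afterwards.

```sql
-- Hypothetical index names; column lists follow the suggestion above.
-- Assumption: these columns are not already covered by an existing index.
CREATE INDEX idx_item_location_pk (location, pk) ON t_item;
CREATE INDEX idx_action_type (action_type) ON t_action;
```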

Answer 2

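I tried creating a similar query but found no difference in the explain plans, although my record counts were smaller: trans (15k) and accounts (10k), with an index on Account_number. It may be what Gordon described. Try running the queries at different times, and check the explain plans of both queries for any differences.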

Explain select * from trans t
inner join 
ap.accounts a
on t.account_number = a.account_number
where  t.trans_id like '%DEP%';


  4) We do an all-AMPs JOIN step from ap.a by way of a RowHash match
     scan with no residual conditions, which is joined to ap.t by way
     of a RowHash match scan with a condition of ("ap.t.Trans_ID LIKE
     '%DEP%'").  ap.a and ap.t are joined using a merge join, with a
     join condition of ("ap.t.Account_Number = ap.a.Account_Number").
     The result goes into Spool 1 (group_amps), which is built locally
     on the AMPs.  The size of Spool 1 is estimated with no confidence
     to be 11,996 rows (1,511,496 bytes).  The estimated time for this
     step is 0.25 seconds.
  -> The contents of Spool 1 are sent back to the user as the result of
     statement 1.  The total estimated time is 0.25 seconds.


Explain select * from trans t
inner join 
ap.accounts a
on t.account_number = a.account_number
and t.trans_id like '%DEP%';


  4) We do an all-AMPs JOIN step from ap.a by way of a RowHash match
     scan with no residual conditions, which is joined to ap.t by way
     of a RowHash match scan with a condition of ("ap.t.Trans_ID LIKE
     '%DEP%'").  ap.a and ap.t are joined using a merge join, with a
     join condition of ("ap.t.Account_Number = ap.a.Account_Number").
     The result goes into Spool 1 (group_amps), which is built locally
     on the AMPs.  The size of Spool 1 is estimated with no confidence
     to be 11,996 rows (1,511,496 bytes).  The estimated time for this
     step is 0.25 seconds.
  -> The contents of Spool 1 are sent back to the user as the result of
     statement 1.  The total estimated time is 0.25 seconds.
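
Both plans above estimate the spool size "with no confidence", which in Teradata usually means the optimizer has no collected statistics on the columns involved. A minimal sketch of collecting them on the join and filter columns from this example follows. It assumes you have the privileges to collect statistics on these tables, and it is not guaranteed to change the plan.

```sql
-- Statistics on the join columns and on the filtered column.
COLLECT STATISTICS COLUMN (account_number) ON trans;
COLLECT STATISTICS COLUMN (trans_id) ON trans;
COLLECT STATISTICS COLUMN (account_number) ON ap.accounts;

-- List which statistics now exist on the table.
HELP STATISTICS trans;
```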

Answer 3

The general order of query processing on Teradata is:

Where/And + Joins
Aggregate
Having
Olap/Window
Qualify
Sample/Top
Order By
Format

An easy way to remember it is WAHOQSOF, as in Wax On, Wax Off :)
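
To make the order concrete, here is a hypothetical report over the tables from the question. The comments number each clause by the processing order listed above, not by where it is written. The threshold values and aliases are made up for illustration, and Sample/Top is left out because Teradata does not allow TOP together with QUALIFY in the same SELECT.

```sql
-- Hypothetical query: top purchase locations, annotated with processing order.
SELECT  i.location,
        COUNT(*) (FORMAT 'ZZZ,ZZ9') AS purchase_cnt,           -- 8. Format
        RANK() OVER (ORDER BY COUNT(*) DESC) AS location_rank  -- 4. Olap/Window
FROM    t_item i
JOIN    t_action a
  ON    i.pk = a.fk                          -- 1. Joins ...
WHERE   a.action_type LIKE '%PURCHASE%'      -- 1. ... together with Where/And
GROUP BY i.location                          -- 2. Aggregate
HAVING  COUNT(*) > 100                       -- 3. Having filters the groups
QUALIFY location_rank <= 5                   -- 5. Qualify filters on the window result
ORDER BY purchase_cnt DESC;                  -- 7. Order By (6. Sample/Top not used here)
```

Because joins and Where/And conditions fall in the same first step, moving a filter between ON and WHERE in an inner join does not change which rows qualify.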
