Error 2646 Teradata: Redistribution on a badly skewed column after FULL stats

There is a piece of production code that does some row checking. It is actually code that was migrated into Teradata and, I should say, nobody bothered to make it TD-savvy. This code now throws a

2646 : No More spool...

error, which is not really a spool shortage but is caused by data skew, as would be obvious to any Teradata master.

The code logic is silly, but they run it in Prod. The code cannot be changed now because it is production. I could rewrite it with a simple NOT EXISTS and the query would run fine.

    EXPLAIN
    SELECT COALESCE(FF.SKEW_COL, -99999) AS Cnt1,
           COUNT(*)                      AS Cnt
    FROM   DB.10_BILLON_FACT FF
    WHERE  FF.SKEW_COL IN (
           SELECT F.SKEW_COL
           FROM   DB.10_BILLON_FACT F
           EXCEPT
           SELECT D.DIM_COL
           FROM   DB.Smaller_DIM D
           )
    GROUP BY 1;
It fails because it wants to redistribute on SKEW_COL. Nothing I do changes that. SKEW_COL is 99% skewed.

Here is the explain. Step #4.1 is where it fails.

 This query is optimized using type 2 profile insert-sel, profileid
 10001.
  1) First, we lock a distinct DB."pseudo table" for read on a
     RowHash to prevent global deadlock for DB.F.
  2) Next, we lock a distinct DB."pseudo table" for read on a
     RowHash to prevent global deadlock for DB.D.
  3) We lock DB.F for read, and we lock DB.D for read.
  4) We execute the following steps in parallel.
       1) We do an all-AMPs RETRIEVE step from DB.F by way of an
          all-rows scan with no residual conditions into Spool 6
          (all_amps), which is redistributed by the hash code of (
          DB.F.SKEW_COL) to all AMPs.  Then we
          do a SORT to order Spool 6 by row hash and the sort key in
          spool field1 eliminating duplicate rows.  The size of Spool 6
          is estimated with low confidence to be 989,301 rows (
          28,689,729 bytes).  The estimated time for this step is 1
          minute and 36 seconds.
       2) We do an all-AMPs RETRIEVE step from DB.D by way of an
          all-rows scan with no residual conditions into Spool 7
          (all_amps), which is built locally on the AMPs.  Then we do a
          SORT to order Spool 7 by the hash code of (
          DB.D.DIM_COL).  The size of Spool 7 is
          estimated with low confidence to be 6,118,545 rows (
          177,437,805 bytes).  The estimated time for this step is 0.11
          seconds.
  5) We do an all-AMPs JOIN step from Spool 6 (Last Use) by way of an
     all-rows scan, which is joined to Spool 7 (Last Use) by way of an
     all-rows scan.  Spool 6 and Spool 7 are joined using an exclusion
     merge join, with a join condition of ("Field_1 = Field_1").  The
     result goes into Spool 1 (all_amps), which is built locally on the
     AMPs.  The size of Spool 1 is estimated with low confidence to be
     494,651 rows (14,344,879 bytes).  The estimated time for this step
     is 3.00 seconds.
  6) We execute the following steps in parallel.
       1) We do an all-AMPs RETRIEVE step from Spool 1 (Last Use) by
          way of an all-rows scan into Spool 5 (all_amps), which is
          redistributed by the hash code of (
          DB.F.SKEW_COL) to all AMPs.  Then we
          do a SORT to order Spool 5 by row hash.  The size of Spool 5
          is estimated with low confidence to be 494,651 rows (
          12,366,275 bytes).  The estimated time for this step is 0.13
          seconds.
       2) We do an all-AMPs RETRIEVE step from DB.FF by way of an
          all-rows scan with no residual conditions into Spool 8
          (all_amps) fanned out into 24 hash join partitions, which is
          built locally on the AMPs.  The size of Spool 8 is estimated
          with high confidence to be 2,603,284,805 rows (
          54,668,980,905 bytes).  The estimated time for this step is
          24.40 seconds.
  7) We do an all-AMPs RETRIEVE step from Spool 5 (Last Use) by way of
     an all-rows scan into Spool 9 (all_amps) fanned out into 24 hash
     join partitions, which is duplicated on all AMPs.  The size of
     Spool 9 is estimated with low confidence to be 249,304,104 rows (
     5,235,386,184 bytes).  The estimated time for this step is 1.55
     seconds.
  8) We do an all-AMPs JOIN step from Spool 8 (Last Use) by way of an
     all-rows scan, which is joined to Spool 9 (Last Use) by way of an
     all-rows scan.  Spool 8 and Spool 9 are joined using a inclusion
     hash join of 24 partitions, with a join condition of (
     "SKEW_COL = SKEW_COL").  The
     result goes into Spool 4 (all_amps), which is built locally on the
     AMPs.  The size of Spool 4 is estimated with index join confidence
     to be 1,630,304,007 rows (37,496,992,161 bytes).  The estimated
     time for this step is 11.92 seconds.
  9) We do an all-AMPs SUM step to aggregate from Spool 4 (Last Use) by
     way of an all-rows scan , grouping by field1 (
     DB.FF.SKEW_COL).  Aggregate Intermediate
     Results are computed globally, then placed in Spool 11.  The size
     of Spool 11 is estimated with low confidence to be 494,651 rows (
     14,344,879 bytes).  The estimated time for this step is 35.00
     seconds.
 10) We do an all-AMPs RETRIEVE step from Spool 11 (Last Use) by way of
     an all-rows scan into Spool 2 (group_amps), which is built locally
     on the AMPs.  The size of Spool 2 is estimated with low confidence
     to be 494,651 rows (16,323,483 bytes).  The estimated time for
     this step is 0.01 seconds.
 11) Finally, we send out an END TRANSACTION step to all AMPs involved
     in processing the request.
  -> The contents of Spool 2 are sent back to the user as the result of
     statement 1.  The total estimated time is 2 minutes and 52 seconds.

    

The skewed column has about 900K unique values (interestingly, DIM_COL has 6 million unique values, which is why I think it is pivoting to the fact-table column. But still.. from the low uniqueness in the larger table it knows the column is badly skewed). My question is: knowing that SKEW_COL is 99% skewed because of a constant value like -9999, why does the optimizer still redistribute on this skewed column instead of using the alternative PRPD approach? Something similar (but not identical) happened in the past, and it went away when we upgraded to a faster box (more AMPs).

I will try anything that might change the plan. I tried most of the diagnostics - nothing. I created an SI (on a similar VT, but it still skews). The skew is inevitable (you could massage the data artificially - I know that would minimize it - but none of that helps after the fact; we are in PROD now, it is all done). But even though it knows the column is skewed, why redistribute on it when other options are available?
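One manual workaround, not mentioned in the original post but a common Teradata pattern for a single hot value, is to peel the dominant constant out of the redistribution path and handle it in its own branch. This is only a sketch: it assumes -9999 is the hot value and that both columns are NOT NULL, as stated below.

    -- Hypothetical rewrite: the well-distributed remainder goes through the
    -- normal IN/EXCEPT path, while the hot value -9999 is answered with a
    -- single dimension probe and a scalar count, so the 99% of rows carrying
    -- -9999 never need to be redistributed by SKEW_COL.
    SELECT COALESCE(FF.SKEW_COL, -99999) AS Cnt1,
           COUNT(*)                      AS Cnt
    FROM   DB.10_BILLON_FACT FF
    WHERE  FF.SKEW_COL <> -9999
    AND    FF.SKEW_COL IN (
           SELECT F.SKEW_COL FROM DB.10_BILLON_FACT F
           EXCEPT
           SELECT D.DIM_COL  FROM DB.Smaller_DIM D)
    GROUP BY 1
    UNION ALL
    SELECT -9999, COUNT(*)
    FROM   DB.10_BILLON_FACT FF
    WHERE  FF.SKEW_COL = -9999
    AND    NOT EXISTS (SELECT 1 FROM DB.Smaller_DIM D
                       WHERE  D.DIM_COL = -9999)
    HAVING COUNT(*) > 0;

The HAVING clause drops the second branch's row when -9999 does exist in the dimension, matching the semantics of the original IN/EXCEPT filter.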

It is not NULLs that are skewed. It is a constant flag value (probably a stand-in for NULL, such as -9999, which causes the skew I mentioned in the post). If you rewrite the query as in my update, it runs fine. I prefer NOT EXISTS because it needs no NULL checking (as a practice - although from my knowledge of the DD, both columns are declared NOT NULL). I have updated the post with alternative code that works (though, as I explained, I finalized the NOT EXISTS version):

    SELECT COUNT(*), f.SKEW_COL
    FROM  (SELECT ff.SKEW_COL
           FROM   DB.10_BILLON_FACT ff
           WHERE  ff.SKEW_COL NOT IN (
                  SELECT d.DIM_COL
                  FROM   DB.Smaller_DIM d)) AS f
    GROUP BY f.SKEW_COL;
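The NOT EXISTS version the poster says they finalized is not shown in the post; a reconstructed sketch (table and column names taken from the post, everything else assumed) might look like:

    -- NOT EXISTS correlates on the join column directly, so it behaves the
    -- same whether or not DIM_COL can contain NULLs, unlike NOT IN, where a
    -- single NULL in the subquery would make the filter return no rows.
    SELECT  ff.SKEW_COL, COUNT(*) AS Cnt
    FROM    DB.10_BILLON_FACT ff
    WHERE   NOT EXISTS (SELECT 1
                        FROM   DB.Smaller_DIM d
                        WHERE  d.DIM_COL = ff.SKEW_COL)
    GROUP BY ff.SKEW_COL;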

Can I get the optimizer's query-rewrite feature to reason about the query and rewrite it with the logic above? The version above does not redistribute; it only sorts by the skewed column.

Until you can replace the SQL, adding spool may be your only option.

Make sure your statistics are up to date, or consider a join index with an alternate PI that covers this specific query without the redistribution. Your JI may be skewed too, but if the work can be done AMP-locally you may be able to get past the spool problem.
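A covering join index for this query is not spelled out in the answer; one possible shape (index name assumed, exact aggregate-JI syntax and data-type limits should be checked against your Teradata release) is an aggregate JI that pre-groups the fact table by the join column:

    -- Hypothetical aggregate join index: collapses the 10B-row fact table to
    -- ~900K rows (one per distinct SKEW_COL), so the PI on SKEW_COL is far
    -- less damaging even though the base column is 99% skewed, and the
    -- IN/EXISTS probe plus the final count can be satisfied from the JI.
    CREATE JOIN INDEX DB.JI_FACT_SKEW AS
    SELECT   SKEW_COL, COUNT(*) AS row_cnt
    FROM     DB.10_BILLON_FACT
    GROUP BY SKEW_COL
    PRIMARY INDEX (SKEW_COL);

The one row for the hot value still lands on a single AMP, so the JI itself is skewed as the answer warns, but it carries one row rather than billions.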