What are bad numbers for the skew factor in Teradata?
I determine the skew factor like this:
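SELECT
TABLENAME,
-- total size of the table in MB (alias differs from the column name to avoid ambiguity)
SUM(CURRENTPERM) / (1024*1024) AS CURRENTPERM_MB,
-- 0 = even distribution; NULLIF avoids a division by zero for empty tables
(100 - (AVG(CURRENTPERM) / NULLIF(MAX(CURRENTPERM), 0) * 100)) AS SKEWFACTOR
FROM
DBC.TABLESIZE
WHERE DATABASENAME = '<DATABASENAME>'
AND
TABLENAME = '<TABLENAME>'
GROUP BY 1;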
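For some tables of about 600 GB the skew factor is 30%. For a 10 GB table it is 98%, which is quite high. How bad are these numbers, really? Is there an official article saying that tables with a skew above 10% should be redistributed? I need it to back up my request to the marketing developers. I only found this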
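There's no magic number, but a table with 98% skew means almost all of its data sits on a single AMP, which means (1) you're losing the performance benefit of a parallel database and (2) you're creating an unbalanced load on the system.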
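A skew factor of 30 means the max AMP holds roughly 40% more data than the average (100 / 70 ≈ 1.43). That may still be acceptable (of course, it depends); discuss with your DBAs what they usually consider too large.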
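On the other hand, 98 means the max AMP holds about 50 times as much data as the average AMP (100 / (100 - 98) = 50), which is simply way too much.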
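The conversion used in the two statements above can be written out from the WinDDI-style formula in the question. The symbols ($p_i$, $N$, $\bar{p}$, $p_{\max}$, $S_{\text{WinDDI}}$) are introduced here for illustration only:

```latex
\text{Let } p_i = \text{CurrentPerm on AMP } i,\quad N = \text{number of AMPs},\quad
\bar{p} = \tfrac{1}{N}\sum_{i=1}^{N} p_i,\quad p_{\max} = \max_i p_i

S_{\text{WinDDI}} = 100\left(1 - \frac{\bar{p}}{p_{\max}}\right)
\quad\Longrightarrow\quad
\frac{p_{\max}}{\bar{p}} = \frac{100}{100 - S_{\text{WinDDI}}}

S_{\text{WinDDI}} = 30:\quad \frac{p_{\max}}{\bar{p}} = \frac{100}{70} \approx 1.43
\qquad
S_{\text{WinDDI}} = 98:\quad \frac{p_{\max}}{\bar{p}} = \frac{100}{2} = 50
```

The ratio grows hyperbolically as the WinDDI value approaches 100, which is why the step from 30 to 98 is far larger than the numbers suggest.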
This compares two ways of calculating skew:
SELECT
t.DatabaseName
,t.TableName
-- currently used diskspace in GB
,SUM(t.CurrentPerm) / 1024**3 (DEC(9,2)) AS CurrentPermGB
-- currently needed diskspace in GB to store this table as standalone (due to Skew)
,MAX(t.CurrentPerm) / 1024**3 * (HASHAMP() + 1) (DEC(9,2)) AS SkewedPermGB
,SkewedPermGB - CurrentPermGB AS WastedPermGB
-- AMP with highest disk usage
,MAX(t.MaxPermAMP) AS SkewedAMP
-- skew factor, 1 = even distribution, 1.1 = max AMP needs 10% more space than the average AMP
,MAX(t.CurrentPerm) / NULLIF(AVG(t.CurrentPerm),0) (DEC(5,2)) AS SkewFactor
-- skew factor, between 0 and 99. Same calculation as WinDDI/ TD Administrator
,(100 - (AVG(t.CurrentPerm) / NULLIF(MAX(t.CurrentPerm),0) * 100)) (DEC(3,0)) AS SkewFactor_WINDDI
FROM
(
SELECT
DatabaseName,
TableName,
CurrentPerm,
CASE WHEN CurrentPerm = MAX(CurrentPerm) OVER (PARTITION BY DatabaseName, TableName) THEN vproc END AS MaxPermAMP
FROM dbc.TableSizeV
WHERE DatabaseName = '???' --
) AS t
GROUP BY 1,2
HAVING SkewFactor > 1.1 -- or whatever
AND SkewedPermGB > 10 -- or whatever
ORDER BY WastedPermGB DESC
;
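To see what the columns of this query mean for concrete numbers, here is a small Python sketch (not Teradata code) that applies the same arithmetic to a hypothetical list of per-AMP CurrentPerm values for one table. It assumes list position i corresponds to vproc i; `skew_metrics` and `GB` are hypothetical names, and unlike the SQL it does not round to DEC.

```python
GB = 1024 ** 3  # bytes per GB, matching 1024**3 in the query


def skew_metrics(perm_per_amp: list[float]) -> dict:
    """Hypothetical helper: per-AMP CurrentPerm values (bytes) for ONE table,
    assuming list position i corresponds to vproc i."""
    if not perm_per_amp:
        raise ValueError("need at least one AMP")
    n_amps = len(perm_per_amp)  # plays the role of HASHAMP() + 1
    total = sum(perm_per_amp)   # SUM(CurrentPerm)
    peak = max(perm_per_amp)    # MAX(CurrentPerm)
    avg = total / n_amps        # AVG(CurrentPerm)
    return {
        "current_perm_gb": total / GB,
        # space the table effectively needs if every AMP had to hold the peak
        "skewed_perm_gb": peak * n_amps / GB,
        "wasted_perm_gb": (peak * n_amps - total) / GB,
        # highest vproc among ties, like MAX(t.MaxPermAMP)
        "skewed_amp": max(i for i, p in enumerate(perm_per_amp) if p == peak),
        # 1.0 = perfectly even; None mirrors NULLIF(..., 0) returning NULL
        "skew_factor": peak / avg if avg else None,
        # 0 = perfectly even, approaching 100 as one AMP holds everything
        "skew_factor_winddi": 100 - avg / peak * 100 if peak else None,
    }


# Hypothetical 10-AMP table: nine AMPs hold 2 GB each, one holds 3 GB
m = skew_metrics([2 * GB] * 9 + [3 * GB])
print(round(m["skew_factor"], 2), round(m["skew_factor_winddi"]))  # 1.43 30
```

Feeding it a hypothetical 50-AMP table with all data on one AMP gives a `skew_factor` of 50 and a WinDDI value of 98, which matches the 50x reading of a skew factor of 98.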