Calculate the variation of values in an Amazon Redshift database

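I am trying to calculate the variation of a value between the current month and the previous month. Say I have total calls for different months and want the month-over-month variation. I have a table containing vendor, month, and calls per month. I tried the following query, but when there is no data for the previous month it gives wrong results for the same vendor: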

  select vendor,
         nvl(round(sum(calls),0),0.00) as "total_calls",
         nvl((((lag(CAST(sum(calls) AS decimal) ,0) over(order by month)) -             
               (lag(CAST(sum(calls) AS DECIMAL),1) over(order by month))) / 
               (lag(CAST(sum(calls) AS DECIMAL),1) over(order by month))), 0) as tot_calls_variation
    from table_summary
group by full_month,vendor
order by month,vendor

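The lag() function returns the value at the given row offset, but this gives wrong results because the variation is calculated across all rows rather than per vendor. Is there another way to do this? Thanks.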

It's hard to say without seeing your data and the desired result, but you may be better off using a self-join instead of a window function:

SELECT
    month_summary.vendor,
    month_summary.full_month,
    month_summary.calls,
    -- CAST avoids integer division; NULLIF avoids dividing by zero
    CAST(month_summary.calls - prev_month_summary.calls AS DECIMAL(18,4))
        / NULLIF(prev_month_summary.calls, 0) as tot_calls_variation
FROM
    (SELECT vendor, full_month, sum(calls) as calls FROM table_summary GROUP BY vendor, full_month) as month_summary
    INNER JOIN (SELECT vendor, full_month, sum(calls) as calls FROM table_summary GROUP BY vendor, full_month) as prev_month_summary ON
        month_summary.vendor = prev_month_summary.vendor AND
        -- assumes full_month is an integer month index
        month_summary.full_month - 1 = prev_month_summary.full_month
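A minimal runnable sketch of this self-join, assuming full_month is an integer month index so that full_month - 1 is the previous month. It runs the SQL through Python's built-in sqlite3 module as a local stand-in for Redshift, factors the repeated subquery into a CTE, and uses hypothetical sample rows; ROWS and month_over_month are illustrative names. Because of the INNER JOIN, each vendor's first month drops out, and so does vendor b's month 3, since b has no month 2 to join against.

```python
import sqlite3

# Hypothetical sample data: (vendor, full_month, calls).
# full_month is assumed to be an integer month index; vendor "b" has no month 2.
ROWS = [
    ("a", 1, 100), ("a", 2, 150), ("a", 3, 120),
    ("b", 1, 40), ("b", 3, 60),
]

QUERY = """
WITH month_summary AS (
    SELECT vendor, full_month, SUM(calls) AS calls
    FROM table_summary
    GROUP BY vendor, full_month
)
SELECT cur.vendor,
       cur.full_month,
       cur.calls,
       -- multiply by 1.0 so integer sums are not truncated by integer division
       (cur.calls - prev.calls) * 1.0 / NULLIF(prev.calls, 0) AS tot_calls_variation
FROM month_summary AS cur
INNER JOIN month_summary AS prev
        ON cur.vendor = prev.vendor
       AND cur.full_month - 1 = prev.full_month
ORDER BY cur.vendor, cur.full_month
"""


def month_over_month(rows):
    """Load rows into an in-memory table and run the self-join query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE table_summary (vendor TEXT, full_month INTEGER, calls INTEGER)"
    )
    conn.executemany("INSERT INTO table_summary VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in month_over_month(ROWS):
        print(row)
```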

The main problem with grouping and lag/lead functions is the transition point between groups.

Suppose you have data like this, and you want to group by column A and operate on the values of C in the order of column B:

A     B     C
===   ===   ===
Y     2015  1
Y     2016  2
Z     2015  3
Z     2016  4

When you use a lag function, you may end up looking at a row that belongs to a different value of A:

A     B     C    Lag(A)  Lag(B)  Lag(C)
===   ===   ===  ======  ======  ======
Y     2015  1    null    null    null
Y     2016  2    Y       2015    1
Z     2015  3    Y       2016    2  <-- This is the record causing your problem.
Z     2016  4    Z       2015    3

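What you usually want to do is include all the fields you are grouping by, apply the lag function to those fields as well, and exclude a row from the calculation unless each lagged value matches the column's current value.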

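That is, in the example above, the third record causes the problem because A != Lag(A). So if you add the condition A = Lag(A) to your query, it will filter out records like that. Note that a window function cannot appear directly in a WHERE clause, so compute Lag(A) in a subquery and filter on it in the outer query.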
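A small sketch of this guard on the A/B/C sample data above, using Python's sqlite3 module as a stand-in engine (SQLite 3.25 or later is needed for LAG). LAG is deliberately left without PARTITION BY so it crosses group boundaries as in the table, then the outer query keeps only rows where A equals the lagged A; guarded_diffs is a hypothetical helper name.

```python
import sqlite3

QUERY = """
WITH t AS (
    -- the A/B/C sample table from the text
    SELECT 'Y' AS a, 2015 AS b, 1 AS c UNION ALL
    SELECT 'Y', 2016, 2 UNION ALL
    SELECT 'Z', 2015, 3 UNION ALL
    SELECT 'Z', 2016, 4
),
lagged AS (
    -- no PARTITION BY: the lag runs across group boundaries
    SELECT a, b, c,
           LAG(a) OVER (ORDER BY a, b) AS lag_a,
           LAG(c) OVER (ORDER BY a, b) AS lag_c
    FROM t
)
SELECT a, b, c - lag_c AS diff
FROM lagged
WHERE a = lag_a  -- drops rows whose previous row is another A (or NULL)
ORDER BY a, b
"""


def guarded_diffs():
    """Return (a, b, diff) only for rows whose previous row shares the same A."""
    conn = sqlite3.connect(":memory:")
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(guarded_diffs())  # [('Y', 2016, 1), ('Z', 2016, 1)]
```

The Z/2015 row, whose lagged A is Y, is filtered out, which is exactly the record flagged in the table above.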

I agree with Welbog's assessment of the problem: the transition between vendors.

However, I think the simplest solution is to use PARTITION BY vendor in the LAG function. This "closes" the LAG window whenever the vendor value changes:

SELECT vendor, full_month,
       NVL(ROUND(SUM(calls), 0), 0.00) AS "total_calls",
       NVL((CAST(SUM(calls) AS DECIMAL) -
            LAG(CAST(SUM(calls) AS DECIMAL), 1) OVER (PARTITION BY vendor ORDER BY full_month)) /
           NULLIF(LAG(CAST(SUM(calls) AS DECIMAL), 1) OVER (PARTITION BY vendor ORDER BY full_month), 0)
         , 0) AS tot_calls_variation
    FROM table_summary
GROUP BY vendor, full_month
ORDER BY full_month, vendor
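To see the effect of PARTITION BY vendor, here is a minimal sketch on hypothetical data that first aggregates per vendor and month in a CTE and then applies LAG over that result, run through Python's sqlite3 module as a stand-in for Redshift. NVL/COALESCE is left out so the first month of each vendor shows up as NULL (None in Python) rather than 0; ROWS and variation_by_vendor are illustrative names, and full_month is assumed to be an integer month index.

```python
import sqlite3

# Hypothetical sample data: (vendor, full_month, calls).
ROWS = [("a", 1, 100), ("a", 2, 150), ("b", 1, 40), ("b", 2, 50)]

QUERY = """
WITH monthly AS (
    SELECT vendor, full_month, SUM(calls) AS calls
    FROM table_summary
    GROUP BY vendor, full_month
)
SELECT vendor, full_month, calls,
       -- PARTITION BY vendor restarts LAG for each vendor, so a vendor's first
       -- month has no predecessor (NULL) instead of another vendor's total
       (calls - LAG(calls) OVER (PARTITION BY vendor ORDER BY full_month)) * 1.0
           / NULLIF(LAG(calls) OVER (PARTITION BY vendor ORDER BY full_month), 0)
           AS tot_calls_variation
FROM monthly
ORDER BY full_month, vendor
"""


def variation_by_vendor(rows):
    """Load rows into an in-memory table and compute per-vendor variation."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE table_summary (vendor TEXT, full_month INTEGER, calls INTEGER)"
    )
    conn.executemany("INSERT INTO table_summary VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in variation_by_vendor(ROWS):
        print(row)
```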

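One more thing: you didn't mention whether table_summary contains rows with zero calls for months in which a vendor had no calls. If it does not, LAG will produce incorrect results, because it returns the vendor's previous existing row, which may be several months earlier.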
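One way to guard against such gaps, sketched under the assumption that full_month is an integer month index: LAG the month as well as the calls, and compute the variation only when the previous row is exactly one month earlier. The data is hypothetical (vendor b has no month 2) and the query runs through Python's sqlite3 module as a stand-in for Redshift; gap_aware_variation is an illustrative name. Without the CASE guard, b's month 3 would be compared with month 1.

```python
import sqlite3

# Hypothetical sample data: (vendor, full_month, calls); vendor "b" skips month 2.
ROWS = [("a", 1, 100), ("a", 2, 150), ("b", 1, 40), ("b", 3, 80)]

QUERY = """
WITH monthly AS (
    SELECT vendor, full_month, SUM(calls) AS calls
    FROM table_summary
    GROUP BY vendor, full_month
),
lagged AS (
    SELECT vendor, full_month, calls,
           LAG(full_month) OVER (PARTITION BY vendor ORDER BY full_month) AS prev_month,
           LAG(calls)      OVER (PARTITION BY vendor ORDER BY full_month) AS prev_calls
    FROM monthly
)
SELECT vendor, full_month, calls,
       -- only compare truly adjacent months; otherwise the CASE yields NULL
       CASE WHEN prev_month = full_month - 1
            THEN (calls - prev_calls) * 1.0 / NULLIF(prev_calls, 0)
       END AS tot_calls_variation
FROM lagged
ORDER BY vendor, full_month
"""


def gap_aware_variation(rows):
    """Compute per-vendor variation, returning NULL when the prior month is missing."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE table_summary (vendor TEXT, full_month INTEGER, calls INTEGER)"
    )
    conn.executemany("INSERT INTO table_summary VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in gap_aware_variation(ROWS):
        print(row)
```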

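Thanks everyone for the answers. I came up with a solution: I created a temporary table and joined it to itself on vendor and the previous month, as follows: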

select vendor, month,
       nvl(round(sum(calls), 2), 0.0) as "total_calls"
into temp table temp1
from table_summary
group by month, vendor
order by month, vendor;

select tb1.month, tb1.vendor,
       -- * 1.0 avoids integer division; nullif avoids dividing by zero
       ((tb1.total_calls - tb2.total_calls) * 1.0 / nullif(tb2.total_calls, 0)) as tot_calls_variation
from temp1 tb1
left join temp1 tb2 on (tb1.month - 1) = tb2.month and tb1.vendor = tb2.vendor
order by tb1.month;

drop table temp1;

This also works when a vendor has no call data for some months.
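For comparison, a sketch of the same LEFT JOIN logic with a CTE in place of the temporary table (a swapped-in technique, not the author's code; Redshift also supports WITH). It runs through Python's sqlite3 module on hypothetical data, again assuming month is an integer month index; ROWS and variation_with_left_join are illustrative names. Unlike the INNER JOIN in the first answer, the LEFT JOIN keeps every month and returns NULL where no previous month exists.

```python
import sqlite3

# Hypothetical sample data: (vendor, month, calls); vendor "b" has no month 2.
ROWS = [("a", 1, 100), ("a", 2, 150), ("b", 1, 40), ("b", 3, 80)]

QUERY = """
WITH temp1 AS (
    SELECT vendor, month, COALESCE(SUM(calls), 0) AS total_calls
    FROM table_summary
    GROUP BY month, vendor
)
SELECT tb1.month, tb1.vendor,
       -- * 1.0 avoids integer division; NULLIF avoids dividing by zero
       (tb1.total_calls - tb2.total_calls) * 1.0
           / NULLIF(tb2.total_calls, 0) AS tot_calls_variation
FROM temp1 AS tb1
LEFT JOIN temp1 AS tb2
       ON tb1.month - 1 = tb2.month AND tb1.vendor = tb2.vendor
ORDER BY tb1.month, tb1.vendor
"""


def variation_with_left_join(rows):
    """Run the CTE + LEFT JOIN query against an in-memory copy of rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE table_summary (vendor TEXT, month INTEGER, calls INTEGER)"
    )
    conn.executemany("INSERT INTO table_summary VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in variation_with_left_join(ROWS):
        print(row)
```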