Calculate the variation of values in an Amazon Redshift database
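I'm trying to calculate the variation of a value between the current month and the previous month.
Say I have the total number of calls for different months, and I want each month's variation from the previous month.
I have a table containing the vendor, the month, and the calls for each month.
I tried the following query, but it gives wrong results for the same vendor when there is no data for the previous month: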
select vendor,
nvl(round(sum(calls),0),0.00) as "total_calls",
nvl((((lag(CAST(sum(calls) AS decimal) ,0) over(order by month)) -
(lag(CAST(sum(calls) AS DECIMAL),1) over(order by month))) /
(lag(CAST(sum(calls) AS DECIMAL),1) over(order by month))), 0) as tot_calls_variation
from table_summary
group by full_month,vendor
order by month,vendor
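The lag() function returns the row at the given offset, but this gives wrong results because the variation is calculated row by row across the whole result rather than per vendor.
Is there another way to do this? Thanks.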
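To see the problem concretely, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for Redshift (assuming SQLite 3.25 or newer, which added window functions). The sample rows are hypothetical, and `full_month` is assumed to be an integer month index. Vendor is added to the window's ORDER BY only so the demo's row order is deterministic.

```python
import sqlite3

# Hypothetical sample data: (vendor, full_month, calls). SQLite stands in for Redshift.
ROWS = [("A", 1, 100), ("A", 2, 150), ("B", 1, 40), ("B", 2, 20)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_summary (vendor TEXT, full_month INTEGER, calls INTEGER)")
conn.executemany("INSERT INTO table_summary VALUES (?, ?, ?)", ROWS)

# LAG without PARTITION BY: the window spans all vendors, so the "previous"
# row can belong to a different vendor.
buggy = conn.execute("""
    WITH monthly AS (
        SELECT vendor, full_month, SUM(calls) AS total_calls
        FROM table_summary
        GROUP BY vendor, full_month
    )
    SELECT vendor, full_month, total_calls,
           LAG(total_calls, 1) OVER (ORDER BY full_month, vendor) AS prev_calls
    FROM monthly
    ORDER BY full_month, vendor
""").fetchall()

for row in buggy:
    print(row)
```

The second row, ('B', 1, 40, 100), compares vendor B's month 1 against vendor A's total, which is exactly the cross-vendor mix-up described above.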
It's hard to say without seeing your data and the expected result, but you might be better off using a self-join instead of a window function:
SELECT
    month_summary.vendor,
    month_summary.full_month,
    month_summary.calls,
    -- cast before dividing to avoid integer division; NULLIF avoids division by zero
    CAST(month_summary.calls - prev_month_summary.calls AS DECIMAL(18,4))
        / NULLIF(prev_month_summary.calls, 0) AS tot_calls_variation
FROM
    (SELECT vendor, full_month, SUM(calls) AS calls FROM table_summary GROUP BY vendor, full_month) AS month_summary
INNER JOIN (SELECT vendor, full_month, SUM(calls) AS calls FROM table_summary GROUP BY vendor, full_month) AS prev_month_summary ON
    month_summary.vendor = prev_month_summary.vendor AND
    -- assumes full_month is a consecutive integer month index
    month_summary.full_month - 1 = prev_month_summary.full_month
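A quick way to check the self-join idea is to run it against a small in-memory table. This sketch assumes Python's sqlite3 module in place of Redshift, hypothetical sample rows, and an integer `full_month`; `CAST(... AS REAL)` plays the role of the DECIMAL cast.

```python
import sqlite3

# Hypothetical sample data: (vendor, full_month, calls).
ROWS = [("A", 1, 100), ("A", 2, 150), ("B", 1, 40), ("B", 2, 20)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_summary (vendor TEXT, full_month INTEGER, calls INTEGER)")
conn.executemany("INSERT INTO table_summary VALUES (?, ?, ?)", ROWS)

MONTHLY = ("SELECT vendor, full_month, SUM(calls) AS calls "
           "FROM table_summary GROUP BY vendor, full_month")

# Join each vendor's month to the same vendor's previous month.
# The vendor condition is what keeps vendors from being compared to each other.
variation = conn.execute(f"""
    SELECT m.vendor, m.full_month, m.calls,
           CAST(m.calls - p.calls AS REAL) / NULLIF(p.calls, 0) AS tot_calls_variation
    FROM ({MONTHLY}) AS m
    INNER JOIN ({MONTHLY}) AS p
      ON m.vendor = p.vendor
     AND m.full_month - 1 = p.full_month
    ORDER BY m.full_month, m.vendor
""").fetchall()

for row in variation:
    print(row)
```

Because this is an INNER JOIN, a month without a matching previous month (here, month 1 of each vendor) is dropped from the output rather than reported with a NULL variation.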
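The main problem with grouping and the lag/lead functions is the transition point between groups.
Suppose you have data like this, and you want to group by column A and operate on the values of C in the order of column B: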
A B C
=== === ===
Y 2015 1
Y 2016 2
Z 2015 3
Z 2016 4
When you use the lag function, you may end up looking at a row from the wrong value of column A:
A B C Lag(A) Lag(B) Lag(C)
=== === === ====== ====== ======
Y 2015 1 null null null
Y 2016 2 Y 2015 1
Z 2015 3 Y 2016 2 <-- This is the record causing your problem.
Z 2016 4 Z 2015 3
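The table above can be reproduced in a few lines of plain Python. This is only an illustration of what an unpartitioned LAG sees, assuming the rows are already sorted by A and then B; `lag_rows` is a hypothetical helper, not a database function.

```python
# The example table as (A, B, C) tuples, already sorted by A, then B.
rows = [("Y", 2015, 1), ("Y", 2016, 2), ("Z", 2015, 3), ("Z", 2016, 4)]

def lag_rows(rows):
    """Append the previous row's values to each row, like LAG over the whole set with no partition."""
    prev = (None, None, None)
    out = []
    for row in rows:
        out.append(row + prev)
        prev = row
    return out

lagged = lag_rows(rows)
# Rows whose lagged values come from a different group in column A.
boundary_rows = [r for r in lagged if r[3] is not None and r[0] != r[3]]
print(boundary_rows)  # [('Z', 2015, 3, 'Y', 2016, 2)]
```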
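What you usually want to do is include all the fields you group by, and apply the lag function to those fields as well, so that a row is excluded from the calculation unless its lagged values match the current row's values.
That is, in the example above, the third record causes the problem because A != Lag(A). So if you add WHERE A = Lag(A)
to your query, it will filter out records like that. (Window functions cannot be referenced directly in a WHERE clause, so compute Lag(A) in a subquery and filter in the outer query.)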
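Here is a minimal sketch of that filter, assuming Python's sqlite3 module (SQLite 3.25+) instead of Redshift and a hypothetical table `t` holding the example data. The lagged values are computed in a subquery and the `a = lag_a` test is applied in the outer query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a TEXT, b INTEGER, c INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)",
                 [("Y", 2015, 1), ("Y", 2016, 2), ("Z", 2015, 3), ("Z", 2016, 4)])

# Window functions can't appear in WHERE, so compute them in a subquery first.
rows = conn.execute("""
    SELECT a, b, c, c - lag_c AS c_change
    FROM (
        SELECT a, b, c,
               LAG(a) OVER (ORDER BY a, b) AS lag_a,
               LAG(c) OVER (ORDER BY a, b) AS lag_c
        FROM t
    ) AS lagged
    WHERE a = lag_a
    ORDER BY a, b
""").fetchall()
print(rows)  # [('Y', 2016, 2, 1), ('Z', 2016, 4, 1)]
```

The first row drops out because `a = NULL` is not true, and the Z/2015 row drops out because its lagged A is Y.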
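I agree with Welbog's assessment of the problem: the transition between vendors.
However, I think the simplest solution is to use PARTITION BY vendor in the LAG function. This "closes" the LAG window whenever the value of vendor changes: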
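SELECT vendor, full_month,
    NVL(ROUND(SUM(calls), 0), 0.00) AS "total_calls",
    -- LAG(x, 0) is just the current row, so the current month's total is used directly;
    -- the window is ordered by full_month, the column actually grouped on;
    -- NULLIF guards against division by zero when the previous month had 0 calls
    NVL((CAST(SUM(calls) AS DECIMAL(18,4))
         - LAG(CAST(SUM(calls) AS DECIMAL(18,4)), 1) OVER (PARTITION BY vendor ORDER BY full_month))
        / NULLIF(LAG(CAST(SUM(calls) AS DECIMAL(18,4)), 1) OVER (PARTITION BY vendor ORDER BY full_month), 0)
    , 0) AS tot_calls_variation
FROM table_summary
GROUP BY vendor, full_month
ORDER BY full_month, vendor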
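The effect of PARTITION BY can be checked with the same kind of in-memory sketch. It assumes Python's sqlite3 module (SQLite 3.25+) in place of Redshift, hypothetical sample rows, an integer `full_month`, and COALESCE in place of Redshift's NVL; the monthly totals are aggregated in a CTE before LAG is applied.

```python
import sqlite3

# Hypothetical sample data: (vendor, full_month, calls).
ROWS = [("A", 1, 100), ("A", 2, 150), ("B", 1, 40), ("B", 2, 20)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_summary (vendor TEXT, full_month INTEGER, calls INTEGER)")
conn.executemany("INSERT INTO table_summary VALUES (?, ?, ?)", ROWS)

# PARTITION BY vendor restarts LAG for each vendor, so every vendor's first
# month has no previous value; COALESCE turns that NULL into 0 like NVL does.
result = conn.execute("""
    WITH monthly AS (
        SELECT vendor, full_month, SUM(calls) AS total_calls
        FROM table_summary
        GROUP BY vendor, full_month
    )
    SELECT vendor, full_month, total_calls,
           COALESCE(
               CAST(total_calls
                    - LAG(total_calls) OVER (PARTITION BY vendor ORDER BY full_month) AS REAL)
               / NULLIF(LAG(total_calls) OVER (PARTITION BY vendor ORDER BY full_month), 0),
               0) AS tot_calls_variation
    FROM monthly
    ORDER BY full_month, vendor
""").fetchall()

for row in result:
    print(row)
```

Compared with the unpartitioned version, vendor B's month 1 no longer picks up vendor A's total, and month 2 is compared only against the same vendor's month 1.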
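One more thing: you didn't mention whether table_summary contains rows with zero calls for months in which a vendor had no calls. If it does not, LAG will produce incorrect results, because the "previous" row for that vendor may then be from two or more months earlier.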
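The gap problem, and one possible guard against it, can be sketched as follows. This assumes Python's sqlite3 module (SQLite 3.25+) in place of Redshift, hypothetical data in which vendor A has no row for month 2, and an integer `full_month`. The guard is an illustrative technique, not something from the thread: it also lags `full_month` and only trusts the lagged total when it really belongs to the previous month.

```python
import sqlite3

# Hypothetical data with a gap: vendor A has no row at all for month 2.
ROWS = [("A", 1, 100), ("A", 3, 300)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_summary (vendor TEXT, full_month INTEGER, calls INTEGER)")
conn.executemany("INSERT INTO table_summary VALUES (?, ?, ?)", ROWS)

rows = conn.execute("""
    WITH monthly AS (
        SELECT vendor, full_month, SUM(calls) AS total_calls
        FROM table_summary
        GROUP BY vendor, full_month
    ),
    lagged AS (
        SELECT vendor, full_month, total_calls,
               LAG(full_month)  OVER (PARTITION BY vendor ORDER BY full_month) AS prev_month,
               LAG(total_calls) OVER (PARTITION BY vendor ORDER BY full_month) AS prev_calls
        FROM monthly
    )
    SELECT vendor, full_month, prev_month,
           -- naive: compares against whatever row came before, even two months back
           CAST(total_calls - prev_calls AS REAL) / NULLIF(prev_calls, 0) AS naive_variation,
           -- guarded: NULL unless the lagged row is exactly the previous month
           CASE WHEN prev_month = full_month - 1
                THEN CAST(total_calls - prev_calls AS REAL) / NULLIF(prev_calls, 0)
           END AS guarded_variation
    FROM lagged
    ORDER BY vendor, full_month
""").fetchall()

for row in rows:
    print(row)
```

For month 3 the naive column reports a 200% increase measured against month 1, while the guarded column stays NULL because no month-2 row exists.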
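Thanks, everyone, for the answers. I came up with a solution: I created a temporary table and joined it to itself on vendor and the previous month, as follows: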
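-- Step 1: aggregate calls per vendor and month into a temporary table
select vendor, month,
       nvl(round(sum(calls), 2), 0.0) as "total_calls"
into temp table temp1
from table_summary
group by month, vendor;

-- Step 2: join each month to the same vendor's previous month (assumes month is an integer index).
-- The LEFT JOIN keeps months with no previous-month row; the cast avoids integer division
-- and NULLIF avoids division by zero.
select tb1.month, tb1.vendor,
       cast(tb1.total_calls - tb2.total_calls as decimal(18,4)) / nullif(tb2.total_calls, 0) as tot_calls_variation
from temp1 tb1
left join temp1 tb2 on (tb1.month - 1) = tb2.month and tb1.vendor = tb2.vendor
order by tb1.month;

drop table temp1;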
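This also works when a vendor has no call data for some months: the LEFT JOIN keeps those rows and returns NULL as the variation instead of comparing against an older month.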
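The temporary-table approach can be exercised the same way. This sketch assumes Python's sqlite3 module in place of Redshift; SQLite has no `SELECT ... INTO`, so `CREATE TEMP TABLE ... AS` plays that role, COALESCE stands in for NVL, and the sample rows (including vendor A's missing month 2) are hypothetical.

```python
import sqlite3

# Hypothetical data: vendor A is missing month 2; vendor B has months 1 and 2.
ROWS = [("A", 1, 100), ("A", 3, 300), ("B", 1, 40), ("B", 2, 20)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_summary (vendor TEXT, month INTEGER, calls INTEGER)")
conn.executemany("INSERT INTO table_summary VALUES (?, ?, ?)", ROWS)

# Step 1: per-vendor monthly totals in a temporary table.
conn.execute("""
    CREATE TEMP TABLE temp1 AS
    SELECT vendor, month, COALESCE(SUM(calls), 0) AS total_calls
    FROM table_summary
    GROUP BY month, vendor
""")

# Step 2: LEFT JOIN each month to the same vendor's previous month.
rows = conn.execute("""
    SELECT tb1.month, tb1.vendor,
           CAST(tb1.total_calls - tb2.total_calls AS REAL)
               / NULLIF(tb2.total_calls, 0) AS tot_calls_variation
    FROM temp1 tb1
    LEFT JOIN temp1 tb2
      ON tb1.month - 1 = tb2.month AND tb1.vendor = tb2.vendor
    ORDER BY tb1.month, tb1.vendor
""").fetchall()
conn.execute("DROP TABLE temp1")

for row in rows:
    print(row)
```

Vendor A's month 3 comes back with a NULL (None) variation rather than being compared against month 1, which is the behavior the LEFT JOIN is meant to provide.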