Find sum of Current month and Previous month values
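I have a source table that contains employee account details for each month; the date column is a string (yyyyMMdd). I'm trying to find, for each account, the sum of the current month's values and the sum of the previous month's values.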
Source data:
+-----------+-------------+-----------+----------+
| date | account | division | amount |
+-----------+-------------+-----------+----------+
| 20190331 | 123 | AB0 | 100 |
+-----------+-------------+-----------+----------+
| 20190331 | 123 | AB1 | 110 |
+-----------+-------------+-----------+----------+
| 20190331 | 123 | AB2 | 120 |
+-----------+-------------+-----------+----------+
| 20190228 | 123 | AB4 | 100 |
+-----------+-------------+-----------+----------+
| 20190228 | 123 | AB1 | 100 |
+-----------+-------------+-----------+----------+
| 20190228 | 123 | AB2 | 100 |
+-----------+-------------+-----------+----------+
| 20190131 | 123 | AB0 | 100 |
+-----------+-------------+-----------+----------+
I ran the query below in Impala, but it returns the same sum for the current month and the previous month.
select distinct * from (
SELECT
sum(amount) over (partition BY account, a.date) AS asset_current,
sum(amount) over (partition BY account, from_unixtime(unix_timestamp(to_date(LAST_DAY(ADD_MONTHS(to_timestamp(data_as_of_date,'yyyyMMdd'),-1))),'yyyy-MM-dd'),'yyyyMMdd')) AS asset_previous,
account,
date
FROM employee_assets a
)x ;
Expected output:
+-----------+-------------+--------------------+----------------------+
| date | account | current_month_sum | previous_month_sum |
+-----------+-------------+--------------------+----------------------+
| 20190331 | 123 | 330 | 300 |
+-----------+-------------+--------------------+----------------------+
| 20190228 | 123 | 300 | 100 |
+-----------+-------------+--------------------+----------------------+
| 20190131 | 123 | 100 | 0 |
+-----------+-------------+--------------------+----------------------+
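Why the first query returns two identical columns can be checked without Impala. A window PARTITION BY only decides which rows are summed together. The previous-month expression is computed from each row's own date, and because every date in the sample is a distinct month-end, it splits the rows into exactly the same groups as partitioning by date; it never reaches into the previous month's rows. The minimal Python sketch below mimics both window sums on the sample data. prev_month_end and window_sum are hypothetical helpers, and it assumes data_as_of_date in the query is the same column shown as date in the table.

```python
from collections import defaultdict
from datetime import date, timedelta

# Sample rows from the source table: (date as yyyyMMdd string, account, amount).
# The division column does not affect the sums and is omitted.
ROWS = [
    ("20190331", 123, 100),
    ("20190331", 123, 110),
    ("20190331", 123, 120),
    ("20190228", 123, 100),
    ("20190228", 123, 100),
    ("20190228", 123, 100),
    ("20190131", 123, 100),
]


def prev_month_end(yyyymmdd: str) -> str:
    """Hypothetical helper: last day of the previous month, as yyyyMMdd."""
    first_of_month = date(int(yyyymmdd[:4]), int(yyyymmdd[4:6]), 1)
    return (first_of_month - timedelta(days=1)).strftime("%Y%m%d")


def window_sum(rows, partition_key):
    """Mimic SUM(amount) OVER (PARTITION BY ...): every row receives the
    total amount of all rows that share its partition key."""
    totals = defaultdict(int)
    for row in rows:
        totals[partition_key(row)] += row[2]
    return [totals[partition_key(row)] for row in rows]


# PARTITION BY account, date
asset_current = window_sum(ROWS, lambda r: (r[1], r[0]))
# PARTITION BY account, <previous month-end derived from the same row's date>
asset_previous = window_sum(ROWS, lambda r: (r[1], prev_month_end(r[0])))

print(asset_current)                    # [330, 330, 330, 300, 300, 300, 100]
print(asset_current == asset_previous)  # True
```

Changing the partition key only relabels the groups; fetching another month's total needs a second lookup, such as LAG() or a join.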
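I then tried the following query, but when the previous month's data is not available, it returns the total of an earlier month as previous_month_sum.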
SELECT
x.*,
LAG(current_month_sum, 1, 0) OVER(PARTITION BY account ORDER BY adate) previous_month_sum
FROM (
SELECT adate, account, SUM(amount) current_month_sum
FROM employee_assets
GROUP BY adate, account
) x
ORDER BY adate DESC
For example, there is no input data for 20181231 for account 123, so the previous-month value for January should be 0, but the query returns 500 (the November 2018 amount).
Input data:
+-----------+-------------+-----------+----------+
| date | account | division | amount |
+-----------+-------------+-----------+----------+
| 20190331 | 123 | AB0 | 100 |
+-----------+-------------+-----------+----------+
| 20190331 | 123 | AB1 | 110 |
+-----------+-------------+-----------+----------+
| 20190331 | 123 | AB2 | 120 |
+-----------+-------------+-----------+----------+
| 20190228 | 123 | AB4 | 100 |
+-----------+-------------+-----------+----------+
| 20190228 | 123 | AB1 | 100 |
+-----------+-------------+-----------+----------+
| 20190228 | 123 | AB2 | 100 |
+-----------+-------------+-----------+----------+
| 20190131 | 123 | AB0 | 100 |
+-----------+-------------+-----------+----------+
| 20181130 | 123 | ABX | 500 |
+-----------+-------------+-----------+----------+
The query returns:
+-----------+-------------+--------------------+----------------------+
| date | account | current_month_sum | previous_month_sum |
+-----------+-------------+--------------------+----------------------+
| 20190331 | 123 | 330 | 300 |
+-----------+-------------+--------------------+----------------------+
| 20190228 | 123 | 300 | 100 |
+-----------+-------------+--------------------+----------------------+
| 20190131 | 123 | 100 | 500 |
+-----------+-------------+--------------------+----------------------+
| 20181130 | 123 | 500 | 0 |
+-----------+-------------+--------------------+----------------------+
Expected output:
+-----------+-------------+--------------------+----------------------+
| date | account | current_month_sum | previous_month_sum |
+-----------+-------------+--------------------+----------------------+
| 20190331 | 123 | 330 | 300 |
+-----------+-------------+--------------------+----------------------+
| 20190228 | 123 | 300 | 100 |
+-----------+-------------+--------------------+----------------------+
| 20190131 | 123 | 100 | 0 |
+-----------+-------------+--------------------+----------------------+
| 20181130 | 123 | 500 | 0 |
+-----------+-------------+--------------------+----------------------+
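You can aggregate in the inner query and use LAG() in the outer query to get the preceding row's value (the previous month present in the data) within each account partition. The three-argument form of LAG() lets you specify a default value, here 0, for an account's first month.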
SELECT
x.*,
LAG(current_month_sum, 1, 0) OVER(PARTITION BY account ORDER BY adate) previous_month_sum
FROM (
SELECT adate, account, SUM(amount) current_month_sum
FROM employee_assets
GROUP BY adate, account
) x
ORDER BY adate DESC
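The behaviour reported in the question can be reproduced with this query unchanged. The sketch below runs it on SQLite through Python's built-in sqlite3 module instead of Impala (SQLite needs version 3.25 or later for window functions), against an in-memory copy of the second input data set. The table definition and the run_answer_query helper are illustrative assumptions. Because LAG() steps back one row in the account partition, and there is no December 2018 row, the January row picks up November's total.

```python
import sqlite3

# Second input data set from the question: (adate, account, division, amount).
# December 2018 has no rows for account 123.
ROWS = [
    ("20190331", 123, "AB0", 100),
    ("20190331", 123, "AB1", 110),
    ("20190331", 123, "AB2", 120),
    ("20190228", 123, "AB4", 100),
    ("20190228", 123, "AB1", 100),
    ("20190228", 123, "AB2", 100),
    ("20190131", 123, "AB0", 100),
    ("20181130", 123, "ABX", 500),
]

# The answer's query, run as-is on SQLite.
ANSWER_QUERY = """
SELECT x.*,
       LAG(current_month_sum, 1, 0)
           OVER (PARTITION BY account ORDER BY adate) AS previous_month_sum
FROM (
    SELECT adate, account, SUM(amount) AS current_month_sum
    FROM employee_assets
    GROUP BY adate, account
) x
ORDER BY adate DESC
"""


def run_answer_query(rows):
    """Load rows into a throwaway in-memory table and run the LAG query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE employee_assets "
        "(adate TEXT, account INTEGER, division TEXT, amount INTEGER)"
    )
    conn.executemany("INSERT INTO employee_assets VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(ANSWER_QUERY).fetchall()
    conn.close()
    return result


for row in run_answer_query(ROWS):
    print(row)
```

The January 2019 row comes back as ('20190131', 123, 100, 500), matching the output reported in the question.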
Note: date is not a good choice for a column name, since it may conflict with a reserved word. I renamed the column to adate in the query.
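LAG() looks one row back rather than one calendar month back, so a gap in the data (December 2018 for account 123) makes it return an older month's total. One possible calendar-aware alternative is to compute each month's previous month-end date and LEFT JOIN the monthly totals to themselves on that key, with COALESCE supplying 0 when the previous month has no rows. This is a minimal sketch, not a tested Impala query: it assumes every adate is a month-end date, as in the sample data, and runs on SQLite through Python's sqlite3 module, so the date arithmetic uses SQLite's date() function. In Impala, the question's LAST_DAY(ADD_MONTHS(..., -1)) expression might serve as the join key instead. The table copy and the run_calendar_query helper are hypothetical.

```python
import sqlite3

# Second input data set from the question: (adate, account, division, amount).
ROWS = [
    ("20190331", 123, "AB0", 100),
    ("20190331", 123, "AB1", 110),
    ("20190331", 123, "AB2", 120),
    ("20190228", 123, "AB4", 100),
    ("20190228", 123, "AB1", 100),
    ("20190228", 123, "AB2", 100),
    ("20190131", 123, "AB0", 100),
    ("20181130", 123, "ABX", 500),
]

# Join each month's total to the total of the calendar month before it.
CALENDAR_QUERY = """
WITH monthly AS (
    SELECT adate, account, SUM(amount) AS current_month_sum
    FROM employee_assets
    GROUP BY adate, account
),
keyed AS (
    -- Previous month-end as yyyyMMdd: first day of this month minus one day.
    -- SQLite date syntax; Impala would need its own date functions here.
    SELECT adate, account, current_month_sum,
           replace(date(substr(adate, 1, 4) || '-' || substr(adate, 5, 2) || '-01',
                        '-1 day'), '-', '') AS prev_adate
    FROM monthly
)
SELECT k.adate, k.account, k.current_month_sum,
       COALESCE(p.current_month_sum, 0) AS previous_month_sum
FROM keyed k
LEFT JOIN monthly p
       ON p.account = k.account AND p.adate = k.prev_adate
ORDER BY k.adate DESC
"""


def run_calendar_query(rows):
    """Load rows into a throwaway in-memory table and run the join query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE employee_assets "
        "(adate TEXT, account INTEGER, division TEXT, amount INTEGER)"
    )
    conn.executemany("INSERT INTO employee_assets VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(CALENDAR_QUERY).fetchall()
    conn.close()
    return result


for row in run_calendar_query(ROWS):
    print(row)
```

With the join, both the January 2019 and November 2018 rows get a previous-month sum of 0, because no rows exist for 20181231 or 20181031; this matches the expected output in the question. If dates were not always month-ends, the join key would need to be a month identifier (for example yyyyMM) instead.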