How to compute running sum if tag<>0 and reset to 0 if tag=0 in HIVE?
customer txn_date tag running_sum
A 1-Jan-17 1 1
A 2-Jan-17 1 2
A 3-Jan-17 1 3
A 4-Jan-17 1 4
A 5-Jan-17 1 5
A 6-Jan-17 1 6
A 7-Jan-17 0 0
A 8-Jan-17 1 1
A 9-Jan-17 1 2
A 10-Jan-17 1 3
A 11-Jan-17 0 0
A 12-Jan-17 0 0
A 13-Jan-17 1 1
A 14-Jan-17 1 2
A 15-Jan-17 0 0
How can I compute running_sum so that it accumulates tag and resets to zero whenever tag=0, as in the example above? TIA
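What you need to do is create "groups" for each run of your 1s and 0s, where a new group starts at every row with tag = 0. You can do this by creating a boolean flag that marks the tag = 0 rows and then taking a cumulative sum of that flag column; the result is a group id. From there, you can take a cumulative sum of the original tag column within each group created in the sub-query.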
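The grouping idea can be checked outside Hive with a small plain-Python emulation. This is a sketch, not the answer's query: it assumes the tags for a single customer are already sorted by txn_date, and the function name running_sum_with_reset is hypothetical. It mirrors the three steps: flag the tag = 0 rows, take a cumulative sum of the flag to get a group id, then take a cumulative sum of tag inside each group.

```python
from itertools import accumulate


def running_sum_with_reset(tags):
    """Emulate the windowed query for one customer's tags, pre-sorted by txn_date.

    Returns (group_ids, running_sums). The function name is hypothetical.
    """
    # Step 1: boolean flag, 1 on rows where tag = 0 (the reset rows), else 0.
    flags = [1 if t == 0 else 0 for t in tags]
    # Step 2: cumulative sum of the flag -> group id that increments at every reset row.
    groups = list(accumulate(flags))
    # Step 3: cumulative sum of tag within each group. Group ids never decrease,
    # so a change in id is enough to detect the start of a new group.
    sums = []
    current_group, total = None, 0
    for g, t in zip(groups, tags):
        if g != current_group:
            current_group, total = g, 0
        total += t
        sums.append(total)
    return groups, sums


if __name__ == "__main__":
    tags = [1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0]
    groups, sums = running_sum_with_reset(tags)
    print(groups)  # [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 3, 3, 3, 4]
    print(sums)    # [1, 2, 3, 4, 5, 6, 0, 1, 2, 3, 0, 0, 1, 2, 0]
```

Because the reset row itself has tag = 0, it contributes nothing to its group's total, which is why it shows 0 and the following 1s count up from 1 again.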
Query:
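SELECT customer
     , txn_date
     , tag
       -- running total of tag within each group; restarts at every tag = 0 row
     , SUM(tag) OVER (PARTITION BY customer, flg_sum ORDER BY txn_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_sum
FROM (
    SELECT *
           -- group id: cumulative count of tag = 0 rows so far, per customer
         , SUM(tag_flg) OVER (PARTITION BY customer ORDER BY txn_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS flg_sum
    FROM (
        SELECT *
               -- 1 marks a reset row (tag = 0), 0 otherwise; also correct when tag holds values other than 0/1
             , CASE WHEN tag = 0 THEN 1 ELSE 0 END AS tag_flg
        FROM database.table   -- placeholder: replace with your own database and table
    ) x
) y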
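To sanity-check the query logic without a Hive cluster, here is a sketch that runs the same nested window-function query in SQLite through Python's built-in sqlite3 module. Assumptions: SQLite 3.25 or newer (needed for window functions), a hypothetical table name txns in place of database.table, and the flag written as CASE WHEN tag = 0 THEN 1 ELSE 0 END, which gives the same result on 0/1 tags. Dates are stored as ISO strings (2017-01-01) so that ORDER BY txn_date is chronological; with strings like 1-Jan-17, lexical ordering would put 10-Jan-17 before 2-Jan-17, so in Hive txn_date should be a DATE or an ISO-formatted string.

```python
import sqlite3

# Sample tags from the question for customer A, 1-Jan-17 .. 15-Jan-17.
TAGS = [1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0]
# ISO-8601 date strings so ORDER BY txn_date sorts chronologically.
ROWS = [("A", f"2017-01-{day:02d}", tag) for day, tag in enumerate(TAGS, start=1)]

# Same structure as the Hive query; txns is a hypothetical table name.
# tag_flg and flg_sum are also selected so the intermediate groups are visible.
QUERY = """
SELECT customer, txn_date, tag, tag_flg, flg_sum,
       SUM(tag) OVER (PARTITION BY customer, flg_sum ORDER BY txn_date
                      ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_sum
FROM (
    SELECT *,
           SUM(tag_flg) OVER (PARTITION BY customer ORDER BY txn_date
                              ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS flg_sum
    FROM (
        SELECT *, CASE WHEN tag = 0 THEN 1 ELSE 0 END AS tag_flg
        FROM txns
    ) x
) y
ORDER BY customer, txn_date
"""


def run_demo():
    """Load the sample rows into an in-memory SQLite table and run the query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE txns (customer TEXT, txn_date TEXT, tag INTEGER)")
        conn.executemany("INSERT INTO txns VALUES (?, ?, ?)", ROWS)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    # Each row: (customer, txn_date, tag, tag_flg, flg_sum, running_sum)
    for row in run_demo():
        print(row)
```

The flg_sum column shows the group ids (0, 1, 2, 3, 4 here), and running_sum restarts inside each one, matching the expected output in the question.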
Output:
customer txn_date tag running_sum
A 2017-01-01 1 1
A 2017-01-02 1 2
A 2017-01-03 1 3
A 2017-01-04 1 4
A 2017-01-05 1 5
A 2017-01-06 1 6
A 2017-01-07 0 0
A 2017-01-08 1 1
A 2017-01-09 1 2
A 2017-01-10 1 3
A 2017-01-11 0 0
A 2017-01-12 0 0
A 2017-01-13 1 1
A 2017-01-14 1 2
A 2017-01-15 0 0