How to compute running sum if tag<>0 and reset to 0 if tag=0 in HIVE?

customer    txn_date    tag running_sum
A           1-Jan-17    1   1
A           2-Jan-17    1   2
A           3-Jan-17    1   3
A           4-Jan-17    1   4
A           5-Jan-17    1   5
A           6-Jan-17    1   6
A           7-Jan-17    0   0
A           8-Jan-17    1   1
A           9-Jan-17    1   2
A           10-Jan-17   1   3
A           11-Jan-17   0   0
A           12-Jan-17   0   0
A           13-Jan-17   1   1
A           14-Jan-17   1   2
A           15-Jan-17   0   0

How can I compute running_sum so that it resets to zero whenever tag=0, as in the example above? TIA

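What you need to do is create "groups" for each run of 1s and 0s. You can do this by creating a boolean flag and then taking a cumulative sum over that flag column to obtain a group id. From there, you can take a cumulative sum of the original tag column within each group you created in the sub-query.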
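To make the grouping idea concrete, here is a minimal Python sketch (not Hive code) that traces the intermediate columns for the sample tags from the question. The names `tag_flg` and `flg_sum` mirror the aliases used in the query; the sketch assumes a single customer whose rows are already ordered by txn_date, and that tag only takes the values 0 and 1 as in the sample.

```python
from itertools import accumulate

# Sample tag values from the question, already ordered by txn_date (customer A).
tags = [1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 0, 0, 1, 1, 0]

# Step 1: flag every reset row (tag == 0) with 1, every other row with 0.
tag_flg = [1 if t == 0 else 0 for t in tags]

# Step 2: a running sum of the flags gives each row a group id;
# the id increases by one at every reset row.
flg_sum = list(accumulate(tag_flg))

# Step 3: running sum of tag, restarted whenever the group id changes.
running_sum = []
prev_group = None
total = 0
for t, g in zip(tags, flg_sum):
    if g != prev_group:
        total = 0
        prev_group = g
    total += t
    running_sum.append(total)

print(flg_sum)
print(running_sum)
```

Each tag=0 row opens a new group and contributes 0 to it, so the sum inside every group starts from zero and then counts the 1s that follow.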

Query:

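-- Outer query: running sum of tag within each (customer, flg_sum) group.
SELECT customer
  , txn_date
  , tag
  , SUM(tag) OVER (PARTITION BY customer, flg_sum ORDER BY txn_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_sum
FROM (
  -- Cumulative sum of the flag: increases by 1 at every flagged row, giving a group id.
  SELECT *
    , SUM(tag_flg) OVER (PARTITION BY customer ORDER BY txn_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS flg_sum
  FROM (
    -- Flag rows that start a new group: 1 when tag is not 1 (tag = 0 in the sample data), else 0.
    SELECT *
      , CASE WHEN tag = 1 THEN 0 ELSE 1 END AS tag_flg
    FROM database.table ) x ) y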

Output:

customer        txn_date        tag     running_sum
A               2017-01-01      1       1
A               2017-01-02      1       2
A               2017-01-03      1       3
A               2017-01-04      1       4
A               2017-01-05      1       5
A               2017-01-06      1       6
A               2017-01-07      0       0
A               2017-01-08      1       1
A               2017-01-09      1       2
A               2017-01-10      1       3
A               2017-01-11      0       0
A               2017-01-12      0       0
A               2017-01-13      1       1
A               2017-01-14      1       2
A               2017-01-15      0       0
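As a cross-check on results like these, the same reset rule can be written as a plain loop. The following Python sketch is not the window-function technique from the query; it is a direct accumulator that resets on tag=0, useful for validating a small extract. The function name `running_sum_with_reset` and customer B are hypothetical, added only to show that the sum restarts per customer. It uses `date` objects because sorting strings such as 1-Jan-17 would be lexical rather than chronological.

```python
from datetime import date
from itertools import groupby
from operator import itemgetter


def running_sum_with_reset(rows):
    """rows: iterable of (customer, txn_date, tag) tuples, in any order.

    Returns a list of (customer, txn_date, tag, running_sum) tuples,
    ordered by customer and then by txn_date.
    """
    out = []
    # Plays the role of PARTITION BY customer ORDER BY txn_date.
    ordered = sorted(rows, key=itemgetter(0, 1))
    for customer, group in groupby(ordered, key=itemgetter(0)):
        total = 0  # fresh accumulator for each customer
        for _, txn_date, tag in group:
            # Reset on tag == 0, otherwise keep accumulating.
            total = 0 if tag == 0 else total + tag
            out.append((customer, txn_date, tag, total))
    return out


rows = [
    ("B", date(2017, 1, 2), 1),  # hypothetical second customer
    ("A", date(2017, 1, 1), 1),
    ("A", date(2017, 1, 2), 1),
    ("B", date(2017, 1, 1), 1),
    ("A", date(2017, 1, 3), 0),
    ("A", date(2017, 1, 4), 1),
    ("B", date(2017, 1, 3), 1),
]

result = running_sum_with_reset(rows)
for row in result:
    print(row)
```

Comparing this loop's output with the query's output on the same rows is a quick way to spot ordering or partitioning mistakes.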