How do I extract only the latest week from a SELECT ... OVER statement in HiveQL?
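I need some help. I've created a query that keeps a running total of a specific measure for each element. The measure is either 1 or 0, and whenever the measure is 0 the running total resets to 0. Example below: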
year_week element measure running_total
2020_40 A 1 1
2020_41 A 1 2
2020_42 A 1 3
2020_43 A 0 0
2020_44 A 1 1
2020_45 A 1 2
2020_40 B 1 1
2020_41 B 1 2
2020_42 B 1 3
2020_43 B 1 4
2020_44 B 1 5
2020_45 B 1 6
The above is achieved using this query:
SELECT element,
year_week,
measure,
SUM(measure) OVER (PARTITION BY element, flag_sum ORDER BY year_week ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_total
FROM (
SELECT *,
SUM(measure_flag) OVER (PARTITION BY element ORDER BY year_week ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS flag_sum
FROM (
SELECT *,
CASE WHEN measure = 1 THEN 0 ELSE 1 END AS measure_flag
FROM database.table ) x ) y
This works well, but I only want the latest week's data for each element. So for the example above it would be:
year_week element measure running_total
2020_45 A 1 2
2020_45 B 1 6
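Essentially I need to keep the logic the same but restrict the returned dataset. I have tried this, but it changes the result from the correct running total to just 1 or 0.
Any help is much appreciated!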
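You can add another level of nesting and use row_number() to filter for the latest record per element. The filter has to sit outside the subquery that computes the window functions; a WHERE clause at the same level is applied before the window functions run, so the running total would only ever see the rows that survive the filter.
I would suggest: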
select element, year_week, measure, running_total
from (
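select t.*,
-- a group with grp > 0 begins with its 0 row, so subtract 1 there; the first group (grp = 0) has no leading 0 row
row_number() over(partition by element, grp order by year_week)
- case when grp > 0 then 1 else 0 end as running_total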
from (
select t.*,
sum(1 - measure) over(partition by element order by year_week) as grp,
row_number() over(partition by element order by year_week desc) as rn
from mytable t
) t
) t
where rn = 1
I simplified the query slightly, given that measure only takes the values 0 and 1, as the sample data shows. If that is not the case, then:
select element, year_week, measure, running_total
from (
select t.*,
sum(measure) over(partition by element, grp order by year_week) as running_total
from (
select t.*,
sum(case when measure = 0 then 1 else 0 end) over(partition by element order by year_week) as grp,
row_number() over(partition by element order by year_week desc) as rn
from mytable t
) t
) t
where rn = 1
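As a quick sanity check, here is a minimal sketch that runs the second query against the sample data from the question. It uses Python's built-in sqlite3 module as a stand-in for Hive, which is an assumption on my part (it needs SQLite 3.25 or later for window functions); the table name mytable comes from the query above, while the helper name latest_week_totals and the added "order by element" are purely illustrative.

```python
import sqlite3

# Sample data from the question: (year_week, element, measure)
ROWS = [
    ("2020_40", "A", 1), ("2020_41", "A", 1), ("2020_42", "A", 1),
    ("2020_43", "A", 0), ("2020_44", "A", 1), ("2020_45", "A", 1),
    ("2020_40", "B", 1), ("2020_41", "B", 1), ("2020_42", "B", 1),
    ("2020_43", "B", 1), ("2020_44", "B", 1), ("2020_45", "B", 1),
]

# Same shape as the second query above; "order by element" is added only
# to make the output order deterministic for this check.
QUERY = """
select element, year_week, measure, running_total
from (
    select t.*,
           sum(measure) over(partition by element, grp order by year_week) as running_total
    from (
        select t.*,
               sum(case when measure = 0 then 1 else 0 end)
                   over(partition by element order by year_week) as grp,
               row_number() over(partition by element order by year_week desc) as rn
        from mytable t
    ) t
) t
where rn = 1
order by element
"""


def latest_week_totals(rows):
    """Load rows into an in-memory SQLite table and return the latest week per element."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "create table mytable (year_week text, element text, measure integer)"
        )
        conn.executemany("insert into mytable values (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in latest_week_totals(ROWS):
        print(row)
```

Because rn is computed in the innermost subquery and only filtered in the outermost one, the running total is still calculated over every week before the rows are cut down to the latest one.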