Window, Partition by in HIVE to get average 7-day temperatures

I have a dataset with multiple temperature readings per day. I want to return the hottest 7-day average temperature.

DROP TABLE IF EXISTS oshkosh;
CREATE EXTERNAL TABLE IF NOT EXISTS oshkosh(
    year STRING, month STRING, day STRING, time STRING,
    temp FLOAT, dewpoint FLOAT, humidity INT, sealevel FLOAT,
    visibility FLOAT, winddir STRING, windspeed FLOAT, gustspeed FLOAT,
    precip FLOAT, events STRING, condition STRING, winddirdegrees INT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/user/maria_dev/final/Oshkosh'
tblproperties ("skip.header.line.count"="1");

SELECT o.theDate
,AVG(o.temp) over (order by o.theDate range INTERVAL 6 DAY preceding) AS Average
FROM
(SELECT CAST(to_date(from_unixtime(UNIX_TIMESTAMP(CONCAT(year,'-',IF(LENGTH(month)=1,CONCAT(0,month),month),'-',IF(LENGTH(day)=1,CONCAT(0,day),day)),'yyyy-MM-dd'))) as timestamp) as theDate
,temp AS temp
FROM oshkosh
WHERE temp != -9999) as o

This returns the following error:

Error while compiling statement: FAILED: ParseException line 2:38 cannot recognize input near 'range' 'INTERVAL' '6' in window_value_expression

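I'm not sure whether I want a timestamp for o.theDate, because it looks like the INTERVAL 6 DAY call might not find a new day: the first day of the dataset has 28 temperature readings, day 2 has 44, and the count differs every day.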

Try:

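SELECT
    o.theDate,
    -- RANGE frames by value, not by row count: every reading dated within the
    -- previous 6 days (6 * 86400 = 518400 seconds) or on the current day is
    -- averaged, however many readings each day has. 604800 would span 8 days.
    -- This assumes every day is 86400 seconds long in the session time zone.
    AVG(o.temp) OVER (
        ORDER BY unix_timestamp(o.theDate)
        RANGE BETWEEN 518400 PRECEDING AND CURRENT ROW
    ) AS Average
FROM
(
    SELECT
        -- Build a zero-padded yyyy-MM-dd string, then cast it to a midnight timestamp
        CAST(to_date(CONCAT(year, '-', IF(LENGTH(month)=1, CONCAT(0,month), month),
             '-', IF(LENGTH(day)=1, CONCAT(0,day), day))) AS TIMESTAMP) AS theDate,
        temp AS temp
    FROM oshkosh
    WHERE temp != -9999
) AS o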
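
The query above returns one row per reading, so the same date and average repeat many times. To get only the single hottest 7-day window, one option is to sort by the windowed average and keep the top row. The following is a sketch, not a tested solution: the aliases `dayNum`, `avg7`, `r` and `w` are made up for illustration, and ordering by a whole-day number from `datediff` (instead of seconds) is a substitution meant to avoid depending on day length in the session time zone. Note that the average is weighted by reading, so days with more readings count more, and the first six dates in the data have windows shorter than 7 days.

```sql
-- Sketch: hottest 7-day average, using a day number as the RANGE ordering key.
SELECT w.theDate, w.avg7
FROM (
    SELECT
        r.theDate,
        -- All readings with dayNum in [current - 6, current] fall in the frame,
        -- so the number of readings per day does not matter.
        AVG(r.temp) OVER (
            ORDER BY r.dayNum
            RANGE BETWEEN 6 PRECEDING AND CURRENT ROW
        ) AS avg7
    FROM (
        SELECT
            o.theDate,
            -- Whole days since 1970-01-01 (hypothetical helper column)
            datediff(o.theDate, '1970-01-01') AS dayNum,
            o.temp
        FROM (
            SELECT
                -- Zero-pad month and day to get a yyyy-MM-dd string
                CONCAT(year, '-', LPAD(month, 2, '0'), '-', LPAD(day, 2, '0')) AS theDate,
                temp
            FROM oshkosh
            WHERE temp != -9999
        ) o
    ) r
) w
ORDER BY w.avg7 DESC
LIMIT 1;
```

The returned `theDate` is the last day of the hottest window; the window starts six days earlier.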