Error when trying to find LAST_VALUE() in Impala

I am trying to find the last blnc value for each id, but the query throws an error:

AnalysisException: select list expression not produced by aggregation output (missing from GROUP BY clause?): last_value(blnc) OVER (PARTITION BY id ORDER BY id date ASC ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) lasted.

SELECT id, number, type,
    LAST_VALUE(blnc) OVER (PARTITION BY id ORDER BY date rows between unbounded preceding and unbounded following ) AS lasted ,
    to_timestamp(MAX(date),'yyyyMMdd') as end_date,
    concat(substr(date,1,6),"01") as start_date,
    substr(date,1,6) as id_month
FROM table
GROUP BY id,number,type,concat(substr(date,1,6),"01"),substr(date,1,6)

I also tried putting the whole LAST_VALUE() expression in the GROUP BY, but that produced a different error.

The problem is that your expression:

   LAST_VALUE(blnc) OVER (PARTITION BY id 
                          ORDER BY date
                          rows between unbounded preceding and unbounded following
                          ) AS lasted ,

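is evaluated after the aggregation. Hence, only expressions that are still meaningful after the aggregation are valid, and at that point there is no date or blnc column, only the GROUP BY keys and the aggregates. You can fix this by using aggregation functions: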

   LAST_VALUE(MAX(blnc)) OVER (PARTITION BY id 
                               ORDER BY MAX(date)
                               rows between unbounded preceding and unbounded following
                              ) AS lasted ,
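To see what the aggregated form actually computes, here is a minimal sketch in Python using the standard-library sqlite3 module as a stand-in for Impala. That substitution is an assumption: SQLite accepts the same nested window-over-aggregate syntax, but it is more permissive than Impala about GROUP BY, so it will not reproduce the original AnalysisException. The table name t and the sample rows are made up for illustration.

```python
import sqlite3

# In-memory SQLite database as a stand-in for Impala (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (id INTEGER, date TEXT, blnc INTEGER);
INSERT INTO t VALUES
  (1, '20200105', 10),
  (1, '20200120', 30),
  (1, '20200210', 50),
  (1, '20200225', 20),   -- latest row for id 1, balance 20
  (2, '20200115', 5);
""")

# The window function runs over the grouped rows, so it may only
# reference GROUP BY keys and aggregates such as MAX(blnc), MAX(date).
rows = conn.execute("""
SELECT id,
       substr(date, 1, 6) AS id_month,
       MAX(blnc) AS max_blnc,
       LAST_VALUE(MAX(blnc)) OVER (PARTITION BY id
                                   ORDER BY MAX(date)
                                   ROWS BETWEEN UNBOUNDED PRECEDING
                                            AND UNBOUNDED FOLLOWING
                                  ) AS lasted
FROM t
GROUP BY id, substr(date, 1, 6)
ORDER BY id, id_month
""").fetchall()

for row in rows:
    print(row)
```

With this sample data, lasted for id 1 is 50 (the largest balance in its most recent month), not 20 (the balance on its most recent date).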

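While this answers your question and fixes the syntax error, it probably does not do anything useful. I think you want conditional aggregation. You have not explained the logic you want or provided sample data, but the idea is: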

SELECT id, number, type,
       to_timestamp(MAX(date), 'yyyyMMdd') as end_date,
       concat(substr(date,1,6),"01") as start_date,
       substr(date, 1, 6) as id_month,
       MAX(CASE WHEN seqnum = 1 THEN blnc END) as lasted
FROM (SELECT t.*,
             ROW_NUMBER() OVER (PARTITION BY id, number, type, concat(substr(date, 1, 6), '01'), substr(date,1,6)
                                ORDER BY date DESC
                               ) as seqnum
      FROM table t
     ) t
GROUP BY id, number, type, concat(substr(date, 1, 6), '01'), substr(date,1,6)
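As a rough check of the ROW_NUMBER() idea, the sketch below runs an equivalent query in Python with the standard-library sqlite3 module. Several things here are assumptions rather than part of the question: SQLite stands in for Impala, || replaces concat, to_timestamp is left out, and the table name balances and its rows are invented sample data.

```python
import sqlite3

# In-memory SQLite database as a stand-in for Impala (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE balances (id INTEGER, number TEXT, type TEXT,
                       date TEXT, blnc INTEGER);
INSERT INTO balances VALUES
  (1, 'A', 'x', '20200105', 10),
  (1, 'A', 'x', '20200120', 30),
  (1, 'A', 'x', '20200210', 50),
  (1, 'A', 'x', '20200225', 20),
  (2, 'B', 'y', '20200115', 5);
""")

# Inner query: number the rows of each (id, number, type, month) group
# from newest to oldest. Outer query: keep the balance of row 1 per group.
query = """
SELECT id, number, type,
       MAX(date) AS end_date,
       substr(date, 1, 6) || '01' AS start_date,
       substr(date, 1, 6) AS id_month,
       MAX(CASE WHEN seqnum = 1 THEN blnc END) AS lasted
FROM (SELECT b.*,
             ROW_NUMBER() OVER (PARTITION BY id, number, type,
                                             substr(date, 1, 6)
                                ORDER BY date DESC
                               ) AS seqnum
      FROM balances b
     ) b
GROUP BY id, number, type, substr(date, 1, 6)
ORDER BY id, id_month
"""
rows = conn.execute(query).fetchall()

for row in rows:
    print(row)
```

Here lasted is the balance on the latest date within each group (30 and 20 for the two months of id 1), which is usually what "last value" is meant to be.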

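Note: the string operations on the date look suspicious. If the column is stored with a proper date/time type, you should use the built-in date/time functions.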