spark-sql error: column is neither present in the group by, nor is it an aggregate function; can't solve with first_value, collected_list

Question

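I am getting a spark.sql error that I cannot solve with the answers on Stack Overflow. The key point is that I tried "first_value, collected_list", but they did not fix the error, and if I add this column to the GROUP BY the result would be wrong. Could you please help me? Here are the code and the error.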

%spark03.sql
select 
    date_key,
    CASE WHEN (gprs_usage>0 and sms_count_on_net=0 and sms_no_off_net=0) then count(distinct(numbers)) end,
    sum((gprs_usage)/(1048576)) as data_mb,
    sum(sms_count_on_net + sms_no_off_net) as sms_total_cnt, 
    count(distinct(numbers)) as uniq_total_number

from daily_total_revenue
where date_key >= 20220101 and date_key <= 20220120
group by date_key

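Error: org.apache.spark.sql.AnalysisException: expression 'daily_total_revenue.gprs_usage' is neither present in the group by, nor is it an aggregate function. Add to group by or wrap in first() (or first_value) if you don't care which value you get.;;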

Answer 1

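Try aggregating over the CASE expression, so that the condition is evaluated per row inside COUNT rather than outside the aggregate: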

SELECT
    date_key,
    COUNT(DISTINCT CASE WHEN gprs_usage > 0 AND sms_count_on_net = 0 AND
                             sms_no_off_net = 0
                        THEN numbers END),
    SUM(gprs_usage / 1048576) AS data_mb,
    SUM(sms_count_on_net + sms_no_off_net) AS sms_total_cnt,
    COUNT(DISTINCT numbers) AS uniq_total_number
FROM daily_total_revenue
WHERE date_key BETWEEN 20220101 AND 20220120
GROUP BY date_key;

Note that DISTINCT is not a function in SQL, so the parentheses in count(distinct(numbers)) are unnecessary.
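To see what the conditional count returns, here is a minimal sketch that runs the same query shape on a few made-up rows. It uses Python's built-in sqlite3 module as a stand-in for Spark SQL, which is an assumption made only so the example is runnable without a cluster. The sample rows and the alias data_only_numbers are hypothetical; only the table and column names come from the question.

```python
import sqlite3

# Hypothetical sample rows; only the table and column names come from the question.
# (date_key, numbers, gprs_usage, sms_count_on_net, sms_no_off_net)
rows = [
    (20220101, "A", 1048576, 0, 0),  # data only
    (20220101, "A", 2097152, 0, 0),  # same number again, data only
    (20220101, "B", 1048576, 2, 1),  # data and SMS
    (20220101, "C", 0, 3, 0),        # SMS only
    (20220102, "A", 0, 1, 0),        # SMS only
    (20220102, "D", 3145728, 0, 0),  # data only
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE daily_total_revenue ("
    "date_key INTEGER, numbers TEXT, gprs_usage INTEGER, "
    "sms_count_on_net INTEGER, sms_no_off_net INTEGER)"
)
conn.executemany("INSERT INTO daily_total_revenue VALUES (?, ?, ?, ?, ?)", rows)

# The CASE sits inside COUNT(DISTINCT ...): rows that fail the condition
# yield NULL, and COUNT ignores NULLs, so no extra GROUP BY column is needed.
# 1048576.0 is used here only because SQLite would otherwise do integer division.
query = """
SELECT
    date_key,
    COUNT(DISTINCT CASE WHEN gprs_usage > 0 AND sms_count_on_net = 0 AND
                             sms_no_off_net = 0
                        THEN numbers END) AS data_only_numbers,
    SUM(gprs_usage / 1048576.0) AS data_mb,
    SUM(sms_count_on_net + sms_no_off_net) AS sms_total_cnt,
    COUNT(DISTINCT numbers) AS uniq_total_number
FROM daily_total_revenue
WHERE date_key BETWEEN 20220101 AND 20220120
GROUP BY date_key
ORDER BY date_key
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
# (20220101, 1, 4.0, 6, 3)
# (20220102, 1, 3.0, 1, 2)
```

On 20220101, number A appears twice with data-only usage but is counted once, while B and C fail the condition and contribute NULL, so the conditional count is 1 even though three distinct numbers are present that day.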

sql

apache-spark-sql