Spark SQL error "column is neither present in the group by, nor is it an aggregate function" that first_value and collected_list do not solve
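I ran into a spark.sql error that I could not solve with the answers on Stack Overflow. The key point is that I tried "first_value, collected_list", but they did not resolve the error, and if I add the columns used in this logic to the GROUP BY, the result is wrong. Could you please help me? Here are the code and the error.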
%spark03.sql
select
date_key,
CASE WHEN (gprs_usage>0 and sms_count_on_net=0 and sms_no_off_net=0) then count(distinct(numbers)) end,
sum((gprs_usage)/(1048576)) as data_mb,
sum(sms_count_on_net + sms_no_off_net) as sms_total_cnt,
count(distinct(numbers)) as uniq_total_number
from daily_total_revenue
where date_key >= 20220101 and date_key <= 20220120
group by date_key
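Error: org.apache.spark.sql.AnalysisException: expression 'daily_total_revenue.gprs_usage' is neither present in the group by, nor is it an aggregate function. Add to group by or wrap in first() (or first_value) if you don't care which value you get.;;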
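Try aggregating over the CASE expression, so that the condition is evaluated per row inside the aggregate rather than outside it: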
SELECT
date_key,
COUNT(DISTINCT CASE WHEN gprs_usage > 0 AND sms_count_on_net = 0 AND
sms_no_off_net = 0
THEN numbers END),
SUM(gprs_usage / 1048576) AS data_mb,
SUM(sms_count_on_net + sms_no_off_net) AS sms_total_cnt,
COUNT(DISTINCT numbers) AS uniq_total_number
FROM daily_total_revenue
WHERE date_key BETWEEN 20220101 AND 20220120
GROUP BY date_key;
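Note that DISTINCT is not a function in SQL; it is a keyword, so COUNT(DISTINCT numbers) needs no extra parentheses around the column.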
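To see how the conditional count behaves, here is a minimal sketch that runs the same COUNT(DISTINCT CASE ... END) pattern on a few invented rows. It uses Python's built-in sqlite3 module as a stand-in for Spark, on the assumption that both engines treat this standard construct the same way. The sample data and the alias data_only_numbers are hypothetical and not from the question.

```python
import sqlite3

# Hypothetical sample rows, invented for illustration:
# (date_key, numbers, gprs_usage, sms_count_on_net, sms_no_off_net)
rows = [
    (20220101, "A", 2097152, 0, 0),  # data-only subscriber
    (20220101, "A", 1048576, 0, 0),  # same subscriber again, still data-only
    (20220101, "B", 1048576, 1, 0),  # used both data and SMS
    (20220101, "C", 0, 0, 2),        # SMS only
    (20220102, "A", 0, 3, 0),        # SMS only
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE daily_total_revenue ("
    "date_key INTEGER, numbers TEXT, gprs_usage INTEGER, "
    "sms_count_on_net INTEGER, sms_no_off_net INTEGER)"
)
conn.executemany("INSERT INTO daily_total_revenue VALUES (?, ?, ?, ?, ?)", rows)

# The CASE sits inside the aggregate: rows that fail the condition yield NULL,
# and COUNT(DISTINCT ...) ignores NULLs, so only matching numbers are counted.
query = """
SELECT
    date_key,
    COUNT(DISTINCT CASE WHEN gprs_usage > 0 AND sms_count_on_net = 0
                             AND sms_no_off_net = 0
                        THEN numbers END) AS data_only_numbers,
    COUNT(DISTINCT numbers) AS uniq_total_number
FROM daily_total_revenue
GROUP BY date_key
ORDER BY date_key
"""
result = conn.execute(query).fetchall()
print(result)  # [(20220101, 1, 3), (20220102, 0, 1)]
```

On 20220101 only subscriber A satisfies the condition, so the conditional count is 1 while the overall distinct count is 3. On 20220102 no row matches, every CASE result is NULL, and the count is 0 rather than NULL.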