"Correlated scalar subqueries must be Aggregated" 是什么意思?
What does "Correlated scalar subqueries must be Aggregated" mean?
I am using Spark 2.0.
I want to execute the following SQL query:
val sqlText = """
select
f.ID as TID,
f.BldgID as TBldgID,
f.LeaseID as TLeaseID,
f.Period as TPeriod,
coalesce(
(select
f.ChargeAmt
from
Fact_CMCharges f
where
f.BldgID = Fact_CMCharges.BldgID
limit 1),
0) as TChargeAmt1,
f.ChargeAmt as TChargeAmt2,
l.EFFDATE as TBreakDate
from
Fact_CMCharges f
join
CMRECC l on l.BLDGID = f.BldgID and l.LEASID = f.LeaseID and l.INCCAT = f.IncomeCat and date_format(l.EFFDATE,'D')<>1 and f.Period=EFFDateInt(l.EFFDATE)
where
f.ActualProjected = 'Lease'
except(
select * from TT1 t2 left semi join Fact_CMCharges f2 on t2.TID=f2.ID)
"""
val query = spark.sql(sqlText)
query.show()
The inner statement inside the coalesce seems to produce the following error:
pyspark.sql.utils.AnalysisException: u'Correlated scalar subqueries must be Aggregated: GlobalLimit 1\n+- LocalLimit 1\n
What is wrong with the query?
You have to make sure that your subquery is guaranteed, by its definition and not by the data, to return exactly one row. Otherwise the Spark Analyzer complains while parsing the SQL statement.
In other words, this exception is thrown whenever Catalyst cannot be 100% certain, just by looking at the SQL statement (without looking at your data), that the subquery returns only a single row.
If you are sure that your subquery only ever yields one row, you can wrap the selected column in one of the following standard aggregation functions, and the Spark Analyzer will be happy (a sketch of the rewritten query follows the list):
first
avg
max
min
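For example, here is a minimal sketch of just the coalesce part of the query above, with the limit 1 replaced by first. The table name Fact_CMCharges is taken from the question; the inner alias f2 is introduced here only to make the correlation with the outer f unambiguous:

val fixedSqlText = """
select
  f.ID as TID,
  coalesce(
    (select first(f2.ChargeAmt)    -- aggregate: provably returns one row
     from Fact_CMCharges f2
     where f2.BldgID = f.BldgID),  -- correlation with the outer query
    0) as TChargeAmt1
from Fact_CMCharges f
"""
spark.sql(fixedSqlText).show()

With the aggregate in place, the Analyzer can prove the subquery is single-row without inspecting the data, so the limit 1 workaround (which triggered the exception) is no longer needed. The coalesce still handles the case where no matching row exists, since first returns null over an empty group.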