"Correlated scalar subqueries must be aggregated" 尽管聚合了相关的标量子查询还是出错? - 火花 SQL

"Correlated scalar subqueries must be aggregated" error despite having aggregated correlated scalar subqueries? - Spark SQL

My query is similar to:

SELECT (SELECT first(b.id) FROM table_b b WHERE b.person = a.student LIMIT 1) student_id,
       (SELECT first(b.id) FROM table_b b WHERE b.person = a.teacher LIMIT 1) teacher_id,
       'additional_field' AS additional_field
FROM table_a  a

This produces the following error:

The SQL expression for node [ SQLNode7 ] is invalid. Reason: [ Correlated scalar subqueries must be aggregated: GlobalLimit 1

Note that this query runs fine in Redshift (without the first()).

Initially I did not have the first() aggregate; I added it after getting this error. Even with it added, the error persists.

Some other things I have tried:

Am I not aggregating this query correctly? Is there anything else I can try in order to "aggregate" my query as this error demands?


Minimal reproducible example:

Input tables:

table_a 
student teacher
A       Z
B       Z
C       Z
    
    
table_b 
id  person
1   A
2   B
3   C
4   Z  

Output:

table_c 
student_id  teacher_id
1           4
2           4
3           4  

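Spark does not understand the use of LIMIT 1 here: remove it and wrap the subquery's column in an aggregate function (MIN() looks like a good choice), adding the mandatory GROUP BY: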

SELECT
  (SELECT MIN(b.id) FROM table_b b WHERE b.person = a.student) student_id,
  (SELECT MIN(b.id) FROM table_b b WHERE b.person = a.teacher) teacher_id,
  'additional_field' AS additional_field
FROM table_a a
GROUP BY 3
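
As a quick sanity check of the MIN() subqueries against the sample tables from the question, here is a minimal sketch in Python using the standard-library sqlite3 module. SQLite is only a stand-in chosen for illustration (an assumption, not Spark SQL), so it can confirm the expected table_c result but does not reproduce the Spark error. The sketch also leaves out the constant column and the outer GROUP BY so that one row comes back per row of table_a.

```python
import sqlite3

# In-memory SQLite database as a stand-in engine (assumption: not Spark SQL).
conn = sqlite3.connect(":memory:")

# Recreate the sample input tables from the question.
conn.executescript("""
CREATE TABLE table_a (student TEXT, teacher TEXT);
CREATE TABLE table_b (id INTEGER, person TEXT);
INSERT INTO table_a VALUES ('A', 'Z'), ('B', 'Z'), ('C', 'Z');
INSERT INTO table_b VALUES (1, 'A'), (2, 'B'), (3, 'C'), (4, 'Z');
""")

# Correlated scalar subqueries: LIMIT 1 removed, column wrapped in MIN().
query = """
SELECT
  (SELECT MIN(b.id) FROM table_b b WHERE b.person = a.student) AS student_id,
  (SELECT MIN(b.id) FROM table_b b WHERE b.person = a.teacher) AS teacher_id
FROM table_a a
ORDER BY student_id
"""

rows = conn.execute(query).fetchall()
for student_id, teacher_id in rows:
    print(student_id, teacher_id)
```

Because MIN() always collapses the matching rows to a single value, each subquery is guaranteed to return at most one row per outer row, which is the property the LIMIT 1 was trying to express.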