"Correlated scalar subqueries must be aggregated" 尽管聚合了相关的标量子查询还是出错？ - 火花 SQL

Question

My query is similar to:

SELECT (SELECT first(b.id) FROM table_b b WHERE b.person = a.student LIMIT 1) student_id,
       (SELECT first(b.id) FROM table_b b WHERE b.person = a.teacher LIMIT 1) teacher_id,
       'additional_field' AS additional_field
FROM table_a  a

This produces the following error:

The SQL expression for node [ SQLNode7 ] is invalid. Reason: [ Correlated scalar subqueries must be aggregated: GlobalLimit 1

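Note that this query runs fine in Redshift (without first()).

Initially I did not have the first() aggregate; I added it after hitting this error. However, the error persists even with it added.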

Some other things I have tried:

Using max() instead of first(): same error
Trying max(first()): the error says Reason: [ It is not allowed to use an aggregate function in the argument of another aggregate function. Please use the inner aggregate function in a sub-query.

Am I not aggregating this query correctly? Is there anything else I can try to "aggregate" my query per this error?

Minimal reproducible example:

Input tables:

table_a 
student teacher
A       Z
B       Z
C       Z
    
    
table_b 
id  person
1   A
2   B
3   C
4   Z

Output:

table_c 
student_id  teacher_id
1           4
2           4
3           4

Answer 1

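Spark does not understand the use of LIMIT 1 here: the error message points at GlobalLimit 1, i.e. the LIMIT wrapped around the aggregate, so the subquery is no longer seen as aggregated. Remove the LIMIT and wrap the selected column in an aggregate function; MIN() looks like a good choice. An outer GROUP BY should not be needed, because each aggregate lives inside its own subquery and the outer query returns one row per row of table_a.

SELECT
  (SELECT MIN(b.id) FROM table_b b WHERE b.person = a.student) student_id,
  (SELECT MIN(b.id) FROM table_b b WHERE b.person = a.teacher) teacher_id,
  'additional_field' AS additional_field
FROM table_a a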
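
To sanity-check the aggregated, LIMIT-free subqueries against the sample tables from the question, here is a minimal sketch. It assumes Python's built-in sqlite3 as a stand-in engine, so it only checks the query logic and the expected rows; it does not exercise Spark's analyzer. It leaves out any outer GROUP BY, since grouping on the constant third column would put all three sample rows into a single group. The ORDER BY is added only to make the printed order deterministic.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for Spark, purely to check the
# query logic on the sample data from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (student TEXT, teacher TEXT);
    CREATE TABLE table_b (id INTEGER, person TEXT);
    INSERT INTO table_a VALUES ('A', 'Z'), ('B', 'Z'), ('C', 'Z');
    INSERT INTO table_b VALUES (1, 'A'), (2, 'B'), (3, 'C'), (4, 'Z');
""")

# Each correlated subquery is aggregated with MIN() and has no LIMIT,
# so it yields exactly one value per outer row.
query = """
    SELECT
      (SELECT MIN(b.id) FROM table_b b WHERE b.person = a.student) AS student_id,
      (SELECT MIN(b.id) FROM table_b b WHERE b.person = a.teacher) AS teacher_id,
      'additional_field' AS additional_field
    FROM table_a a
    ORDER BY student_id
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

If table_b could hold several ids per person, MIN() picks the smallest one; swap in MAX() if the largest is wanted instead.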

"Correlated scalar subqueries must be aggregated" 尽管聚合了相关的标量子查询还是出错？ - 火花 SQL

"Correlated scalar subqueries must be aggregated" error despite having aggregated correlated scalar subqueries? - Spark SQL

Tags: sql, aggregation, apache-spark-sql