What is a semi-join in a database?

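I am having trouble understanding the concept of a semi-join and how it differs from a conventional join. I have already read a few articles but was not satisfied with their explanations. Could someone help me understand it?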

As far as I understand it, a semi-join comes in a left or a right variant, analogous to left and right joins:

What's the difference between INNER JOIN, LEFT JOIN, RIGHT JOIN and FULL JOIN?

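So the difference between a left (semi) join and a "conventional" join is that a semi-join only retrieves data from the left table, and only for the rows where your join condition finds a match. With a conventional join such as an inner or (full) outer join, which I think is what is meant by conventional here, you retrieve data from both tables where the condition matches.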

A simple example. Let's select the students who have grades, using a left outer join:

SELECT DISTINCT s.id
FROM  students s
      LEFT JOIN grades g ON g.student_id = s.id
WHERE g.student_id IS NOT NULL

Now the same with a left semi-join:

SELECT s.id
FROM  students s
WHERE EXISTS (SELECT 1 FROM grades g
              WHERE g.student_id = s.id)

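The latter is usually more efficient (depending on the specific DBMS and its query optimizer), because it can stop at the first matching grade and needs no DISTINCT to remove duplicates.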
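To make the difference concrete, here is a minimal sketch that runs both queries against an in-memory SQLite database from Python. The sample rows and the `name` and `grade` columns are my own assumptions for illustration; only `students.id` and `grades.student_id` come from the queries above. I have left out the DISTINCT on purpose to show what the join produces before deduplication.

```python
import sqlite3

# Hypothetical sample data: Ann has two grades, Bob has one, Cid has none.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE grades (student_id INTEGER, grade TEXT);
    INSERT INTO students VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cid');
    INSERT INTO grades VALUES (1, 'A'), (1, 'B'), (2, 'C');
""")

# Left outer join filtered to matches, without DISTINCT:
# a student is repeated once per matching grade.
join_rows = conn.execute("""
    SELECT s.id
    FROM students s
         LEFT JOIN grades g ON g.student_id = s.id
    WHERE g.student_id IS NOT NULL
    ORDER BY s.id
""").fetchall()

# Semi-join via EXISTS: each student with at least one grade appears once.
semi_rows = conn.execute("""
    SELECT s.id
    FROM students s
    WHERE EXISTS (SELECT 1 FROM grades g
                  WHERE g.student_id = s.id)
    ORDER BY s.id
""").fetchall()

print(join_rows)  # [(1,), (1,), (2,)]
print(semi_rows)  # [(1,), (2,)]
```

The join repeats student 1 because there are two matching grades, which is why the join version needs DISTINCT, while the EXISTS version only asks whether a match exists.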

As far as I know, the SQL dialects that support SEMIJOIN/ANTISEMI syntax are U-SQL and Cloudera Impala.

SEMIJOIN:

Semijoins are U-SQL’s way to filter a rowset based on the inclusion of its rows in another rowset. Other SQL dialects express this with the SELECT * FROM A WHERE A.key IN (SELECT B.key FROM B) pattern.

More info: Semi Join and Anti Join Should Have Their Own Syntax in SQL

“Semi” means that we don’t really join the right hand side, we only check if a join would yield results for any given tuple.

-- IN
SELECT *
FROM Employee
WHERE DeptName IN (
  SELECT DeptName
  FROM Dept
)

-- EXISTS
SELECT *
FROM Employee
WHERE EXISTS (
  SELECT 1
  FROM Dept
  WHERE Employee.DeptName = Dept.DeptName
)
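As a rough sketch of how these patterns behave, the snippet below runs the IN and EXISTS forms in SQLite from Python, plus the NOT EXISTS form that expresses the corresponding anti-join (rows on the left with no match on the right). The `Name` column and the sample rows are assumptions I made up for the demonstration; only `Employee`, `Dept` and `DeptName` come from the queries above.

```python
import sqlite3

# Hypothetical sample data: George's department is missing from Dept.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Dept (DeptName TEXT);
    CREATE TABLE Employee (Name TEXT, DeptName TEXT);
    INSERT INTO Dept VALUES ('Sales'), ('Finance');
    INSERT INTO Employee VALUES
        ('Harry', 'Finance'), ('Sally', 'Sales'), ('George', 'Audit');
""")

# Semi-join written with IN.
in_rows = conn.execute("""
    SELECT Name FROM Employee
    WHERE DeptName IN (SELECT DeptName FROM Dept)
    ORDER BY Name
""").fetchall()

# Semi-join written with EXISTS.
exists_rows = conn.execute("""
    SELECT Name FROM Employee
    WHERE EXISTS (SELECT 1 FROM Dept
                  WHERE Employee.DeptName = Dept.DeptName)
    ORDER BY Name
""").fetchall()

# Anti-join: employees whose department has no match in Dept.
anti_rows = conn.execute("""
    SELECT Name FROM Employee
    WHERE NOT EXISTS (SELECT 1 FROM Dept
                      WHERE Employee.DeptName = Dept.DeptName)
    ORDER BY Name
""").fetchall()

print(in_rows)      # [('Harry',), ('Sally',)]
print(exists_rows)  # [('Harry',), ('Sally',)]
print(anti_rows)    # [('George',)]
```

In every case only Employee columns come back; Dept is consulted purely as a filter.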

EDIT:

Another dialect that supports SEMI/ANTISEMI joins is KQL:

kind=leftsemi (or kind=rightsemi)

Returns all the records from the left side that have matches from the right. The result table contains columns from the left side only.

let t1 = datatable(key:long, value:string)  
[1, "a",  
2, "b",
3, "c"];
let t2 = datatable(key:long)
[1,3];
t1 | join kind=leftsemi (t2) on key

Output:

key  value
1    a
3    c
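For comparison, the same left semi-join can be sketched in plain SQL with EXISTS. This is only an illustration run through SQLite from Python, not KQL itself; the table and column names mirror the datatables above, and `key` is quoted because I am not relying on it being accepted as a bare identifier everywhere.

```python
import sqlite3

# Same data as the KQL datatables t1 and t2.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 ("key" INTEGER, value TEXT);
    CREATE TABLE t2 ("key" INTEGER);
    INSERT INTO t1 VALUES (1, 'a'), (2, 'b'), (3, 'c');
    INSERT INTO t2 VALUES (1), (3);
""")

# Left semi-join: keep t1 rows that have a match in t2, t1 columns only.
rows = conn.execute("""
    SELECT t1."key", t1.value
    FROM t1
    WHERE EXISTS (SELECT 1 FROM t2 WHERE t2."key" = t1."key")
    ORDER BY t1."key"
""").fetchall()

print(rows)  # [(1, 'a'), (3, 'c')]
```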