这个 SQL 子查询如何正确矢量化?
How is this SQL sub-query correctly vectorising?
示例数据:
CREATE TABLE Departments (
Code INTEGER PRIMARY KEY,
Name varchar(255) NOT NULL ,
Budget decimal NOT NULL
);
CREATE TABLE Employees (
SSN INTEGER PRIMARY KEY,
Name varchar(255) NOT NULL ,
LastName varchar(255) NOT NULL ,
Department INTEGER NOT NULL ,
foreign key (department) references Departments(Code)
)
INSERT INTO Departments(Code,Name,Budget) VALUES(14,'IT',65000);
INSERT INTO Departments(Code,Name,Budget) VALUES(37,'Accounting',15000);
INSERT INTO Departments(Code,Name,Budget) VALUES(59,'Human Resources',240000);
INSERT INTO Departments(Code,Name,Budget) VALUES(77,'Research',55000);
INSERT INTO Employees(SSN,Name,LastName,Department) VALUES('123234877','Michael','Rogers',14);
INSERT INTO Employees(SSN,Name,LastName,Department) VALUES('152934485','Anand','Manikutty',14);
INSERT INTO Employees(SSN,Name,LastName,Department) VALUES('222364883','Carol','Smith',37);
INSERT INTO Employees(SSN,Name,LastName,Department) VALUES('326587417','Joe','Stevens',37);
INSERT INTO Employees(SSN,Name,LastName,Department) VALUES('332154719','Mary-Anne','Foster',14);
INSERT INTO Employees(SSN,Name,LastName,Department) VALUES('332569843','George','O''Donnell',77);
INSERT INTO Employees(SSN,Name,LastName,Department) VALUES('546523478','John','Doe',59);
INSERT INTO Employees(SSN,Name,LastName,Department) VALUES('631231482','David','Smith',77);
INSERT INTO Employees(SSN,Name,LastName,Department) VALUES('654873219','Zacary','Efron',59);
INSERT INTO Employees(SSN,Name,LastName,Department) VALUES('745685214','Eric','Goldsmith',59);
INSERT INTO Employees(SSN,Name,LastName,Department) VALUES('845657245','Elizabeth','Doe',14);
INSERT INTO Employees(SSN,Name,LastName,Department) VALUES('845657246','Kumar','Swamy',14);
问题:“Select 员工人数超过两名的部门名称。”
/*With subquery*/
SELECT D.Name FROM Departments D
WHERE 2 <
(
SELECT COUNT(*)
FROM Employees
WHERE Department = D.Code
);
我的问题:这个解决方案是如何工作的?也就是说,MSSQL 如何知道 Departments
中的哪些值要从子查询中保留下来?我看不出任何方式 WHERE Department = D.Code
可以 return 以对外部查询有用的方式排序的结果。我不认为这是侥幸,我认为我只是不明白 SQL 是如何矢量化的。
这称为相关子查询。
也就是说,内部查询通过外部引用与外部查询相关。在这种情况下,即 D.Code
。因此,正在为 D
.
的每一行 计算子查询
这不是排序 的问题,事实上这个查询可以return 任何顺序的结果。但是子查询的结果 必须 大于 2
否则 WHERE
谓词失败。
SELECT D.Name FROM Departments D -- Departments has been aliased as D
WHERE 2 <
(
SELECT COUNT(*)
FROM Employees
WHERE Department = D.Code -- Here the inner query is being limited by
-- the reference to the outer D table
);
I would probably use ... > 2
rather than 2 < ...
Side point: It's better to always use an explicit table reference in subqueries, eg e.Department = D.Code
, because otherwise you could misspell a column and end up referring to an outer column instead of an inner column, and the correlation wouldn't work properly
示例数据:
CREATE TABLE Departments (
Code INTEGER PRIMARY KEY,
Name varchar(255) NOT NULL ,
Budget decimal NOT NULL
);
CREATE TABLE Employees (
SSN INTEGER PRIMARY KEY,
Name varchar(255) NOT NULL ,
LastName varchar(255) NOT NULL ,
Department INTEGER NOT NULL ,
foreign key (department) references Departments(Code)
)
INSERT INTO Departments(Code,Name,Budget) VALUES(14,'IT',65000);
INSERT INTO Departments(Code,Name,Budget) VALUES(37,'Accounting',15000);
INSERT INTO Departments(Code,Name,Budget) VALUES(59,'Human Resources',240000);
INSERT INTO Departments(Code,Name,Budget) VALUES(77,'Research',55000);
INSERT INTO Employees(SSN,Name,LastName,Department) VALUES('123234877','Michael','Rogers',14);
INSERT INTO Employees(SSN,Name,LastName,Department) VALUES('152934485','Anand','Manikutty',14);
INSERT INTO Employees(SSN,Name,LastName,Department) VALUES('222364883','Carol','Smith',37);
INSERT INTO Employees(SSN,Name,LastName,Department) VALUES('326587417','Joe','Stevens',37);
INSERT INTO Employees(SSN,Name,LastName,Department) VALUES('332154719','Mary-Anne','Foster',14);
INSERT INTO Employees(SSN,Name,LastName,Department) VALUES('332569843','George','O''Donnell',77);
INSERT INTO Employees(SSN,Name,LastName,Department) VALUES('546523478','John','Doe',59);
INSERT INTO Employees(SSN,Name,LastName,Department) VALUES('631231482','David','Smith',77);
INSERT INTO Employees(SSN,Name,LastName,Department) VALUES('654873219','Zacary','Efron',59);
INSERT INTO Employees(SSN,Name,LastName,Department) VALUES('745685214','Eric','Goldsmith',59);
INSERT INTO Employees(SSN,Name,LastName,Department) VALUES('845657245','Elizabeth','Doe',14);
INSERT INTO Employees(SSN,Name,LastName,Department) VALUES('845657246','Kumar','Swamy',14);
问题:“Select 员工人数超过两名的部门名称。”
/*With subquery*/
SELECT D.Name FROM Departments D
WHERE 2 <
(
SELECT COUNT(*)
FROM Employees
WHERE Department = D.Code
);
我的问题:这个解决方案是如何工作的?也就是说,MSSQL 如何知道 Departments
中的哪些值要从子查询中保留下来?我看不出任何方式 WHERE Department = D.Code
可以 return 以对外部查询有用的方式排序的结果。我不认为这是侥幸,我认为我只是不明白 SQL 是如何矢量化的。
这称为相关子查询。
也就是说,内部查询通过外部引用与外部查询相关。在这种情况下,即 D.Code
。因此,正在为 D
.
这不是排序 的问题,事实上这个查询可以return 任何顺序的结果。但是子查询的结果 必须 大于 2
否则 WHERE
谓词失败。
SELECT D.Name FROM Departments D -- Departments has been aliased as D
WHERE 2 <
(
SELECT COUNT(*)
FROM Employees
WHERE Department = D.Code -- Here the inner query is being limited by
-- the reference to the outer D table
);
I would probably use
... > 2
rather than2 < ...
Side point: It's better to always use an explicit table reference in subqueries, eg
e.Department = D.Code
, because otherwise you could misspell a column and end up referring to an outer column instead of an inner column, and the correlation wouldn't work properly