Using "UNION ALL" and "GROUP BY" to implement "Intersect"
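I came up with the following query to find the common records in 2 data sets, but I'm having a hard time verifying that it is correct, because my database contains a lot of records.
Is it possible to implement Intersect between the "Customers" and "Employees" tables by using UNION ALL and applying GROUP BY to the result, as shown below?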
SELECT D.Country, D.Region, D.City
FROM (SELECT DISTINCT Country, Region, City
FROM Customers
UNION ALL
SELECT DISTINCT Country, Region, City
FROM Employees) AS D
GROUP BY D.Country, D.Region, D.City
HAVING COUNT(*) = 2;
So is it right to say that any record in the result of this query is in the "Intersect" set between "Customers" and "Employees", AND that any record in the "Intersect" set between "Customers" and "Employees" is in the result of this query too?
Yes.
It works, but not efficiently, because you are removing duplicates three times instead of once. In your query, you:
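- use DISTINCT to get the unique records from Customers
- use DISTINCT to get the unique records from Employees
- use UNION ALL to combine the two queries
- use GROUP BY (with HAVING) in the outer query to de-duplicate and filter the records retrieved in steps 1, 2 and 3.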
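Using INTERSECT returns the same result with a simpler query, and it is usually more efficient. To see for yourself, you can create the sample data below and run both queries: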
use tempdb
go
if object_id('dbo.customers') is not null drop table dbo.customers;
if object_id('dbo.employees') is not null drop table dbo.employees;
create table dbo.customers
(
customerId int identity,
country varchar(50),
region varchar(50),
city varchar(100)
);
create table dbo.employees
(
employeeId int identity,
country varchar(50),
region varchar(50),
city varchar(100)
);
insert dbo.customers(country, region, city)
values ('us', 'N/E', 'New York'), ('us', 'N/W', 'Seattle'),('us', 'Midwest', 'Chicago');
insert dbo.employees(country, region, city)
values ('us', 'S/E', 'Miami'), ('us', 'N/W', 'Portland'),('us', 'Midwest', 'Chicago');
Run these queries:
SELECT D.Country, D.Region, D.City
FROM
(
SELECT DISTINCT Country, Region, City
FROM Customers
UNION ALL
SELECT DISTINCT Country, Region, City
FROM Employees
) AS D
GROUP BY D.Country, D.Region, D.City
HAVING COUNT(*) = 2;
SELECT Country, Region, City
FROM dbo.customers
INTERSECT
SELECT Country, Region, City
FROM dbo.employees;
Results (first from the UNION ALL / GROUP BY query, then from the INTERSECT query):
Country     Region     City
----------- ---------- ----------
us          Midwest    Chicago

Country     Region     City
----------- ---------- ----------
us          Midwest    Chicago
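Since the question mentions that the real tables are too large to check by eye, one way to compare the two queries mechanically is to run EXCEPT in both directions: if neither query returns a row the other lacks, the result sets are equal as sets. The sketch below is self-contained; the table variables @customers and @employees, and the extra ('us', NULL, 'Denver') row, are hypothetical test data, so swap in your real tables to run the same check on them.

```sql
-- Sketch: check that two queries return the same set of rows.
-- @customers/@employees are hypothetical stand-ins for the real tables.
DECLARE @customers TABLE (country varchar(50), region varchar(50), city varchar(100));
DECLARE @employees TABLE (country varchar(50), region varchar(50), city varchar(100));

INSERT @customers (country, region, city)
VALUES ('us', 'N/E', 'New York'), ('us', 'Midwest', 'Chicago'), ('us', NULL, 'Denver');
INSERT @employees (country, region, city)
VALUES ('us', 'N/W', 'Portland'), ('us', 'Midwest', 'Chicago'), ('us', NULL, 'Denver');

WITH grouped AS
(
    -- The query from the question
    SELECT D.country, D.region, D.city
    FROM
    (
        SELECT DISTINCT country, region, city FROM @customers
        UNION ALL
        SELECT DISTINCT country, region, city FROM @employees
    ) AS D
    GROUP BY D.country, D.region, D.city
    HAVING COUNT(*) = 2
),
intersected AS
(
    SELECT country, region, city FROM @customers
    INTERSECT
    SELECT country, region, city FROM @employees
)
-- Rows returned by one query but not the other; an empty result means the
-- two result sets are equal. EXCEPT, like INTERSECT and GROUP BY, treats
-- NULLs as equal, so the NULL-region Denver row matches on both sides.
SELECT 'only in GROUP BY query' AS mismatch, x.country, x.region, x.city
FROM (SELECT country, region, city FROM grouped
      EXCEPT
      SELECT country, region, city FROM intersected) AS x
UNION ALL
SELECT 'only in INTERSECT query', y.country, y.region, y.city
FROM (SELECT country, region, city FROM intersected
      EXCEPT
      SELECT country, region, city FROM grouped) AS y;
```

With this data both queries return Chicago and Denver, so the comparison returns no rows.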
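If you can't use INTERSECT, or you want a faster query, there are a few ways to improve the query you posted, for example:
Option 1: Let GROUP BY handle all of the de-duplication, like this:
This is the same as what you posted, but without the DISTINCTs. Note that it only matches INTERSECT when each (Country, Region, City) combination appears at most once in each table: a combination repeated twice in Customers alone would also satisfy HAVING COUNT(*) = 2, and one that appears in both tables but more than twice in total would be missed.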
SELECT D.Country, D.Region, D.City
FROM
(
SELECT Country, Region, City
FROM Customers
UNION ALL
SELECT Country, Region, City
FROM Employees
) AS D
GROUP BY D.Country, D.Region, D.City
HAVING COUNT(*) = 2;
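If a (Country, Region, City) combination can repeat within a table, COUNT(*) = 2 no longer means "present in both tables". A sketch that still de-duplicates in a single GROUP BY is to tag each row with its source and require both tags in each group. The src column and the table variables are additions for illustration, not part of the original answer.

```sql
-- Sketch: one GROUP BY that stays correct when combinations repeat within a table.
-- src is a hypothetical tag: 1 = customers, 2 = employees.
DECLARE @customers TABLE (country varchar(50), region varchar(50), city varchar(100));
DECLARE @employees TABLE (country varchar(50), region varchar(50), city varchar(100));

INSERT @customers (country, region, city)
VALUES ('us', 'N/W', 'Seattle'), ('us', 'N/W', 'Seattle'),   -- duplicate, customers only
       ('us', 'Midwest', 'Chicago');
INSERT @employees (country, region, city)
VALUES ('us', 'Midwest', 'Chicago'), ('us', 'Midwest', 'Chicago');   -- duplicate, in both tables

SELECT D.country, D.region, D.city
FROM
(
    SELECT country, region, city, 1 AS src FROM @customers
    UNION ALL
    SELECT country, region, city, 2 AS src FROM @employees
) AS D
GROUP BY D.country, D.region, D.city
-- Keep a combination only if it came from both sources,
-- however many times it repeats in either one.
HAVING MIN(D.src) = 1 AND MAX(D.src) = 2;
-- Returns only ('us', 'Midwest', 'Chicago'), the same as INTERSECT.
-- HAVING COUNT(*) = 2 would instead return Seattle (2 customer rows)
-- and miss Chicago (3 rows in total).
```

HAVING COUNT(DISTINCT D.src) = 2 expresses the same condition; MIN/MAX is used here only because it avoids a distinct aggregate.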
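Option 2: Use ROW_NUMBER
This is my preference, and probably the most efficient option, though that is worth confirming with the execution plans on your own data. It has the same caveat as Option 1: rn = 2 only means the combination occurred at least twice across both tables, so it assumes no duplicates within either table.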
SELECT Country, Region, City
FROM
(
SELECT
rn = row_number() over (partition by D.Country, D.Region, D.City order by (SELECT null)),
D.Country, D.Region, D.City
FROM
(
SELECT Country, Region, City
FROM Customers
UNION ALL
SELECT Country, Region, City
FROM Employees
) AS D
) uniquify
WHERE rn = 2;
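Like Option 1, the ROW_NUMBER query assumes no combination repeats within a single table. A minimal sketch with hypothetical table variables, where one customer row is duplicated, shows the difference from INTERSECT. Putting DISTINCT back into each branch of the UNION ALL (as in the original query) makes rn = 2 safe again, because each combination can then occur at most once per table.

```sql
-- Sketch: Option 2 (rn = 2) versus INTERSECT when one table has a duplicate.
-- The table variables are hypothetical test data.
DECLARE @customers TABLE (country varchar(50), region varchar(50), city varchar(100));
DECLARE @employees TABLE (country varchar(50), region varchar(50), city varchar(100));

INSERT @customers (country, region, city)
VALUES ('us', 'N/W', 'Seattle'), ('us', 'N/W', 'Seattle'), ('us', 'Midwest', 'Chicago');
INSERT @employees (country, region, city)
VALUES ('us', 'N/W', 'Portland'), ('us', 'Midwest', 'Chicago');

-- Option 2: returns Chicago and also Seattle, which no employee has.
SELECT country, region, city
FROM
(
    SELECT
        rn = ROW_NUMBER() OVER (PARTITION BY D.country, D.region, D.city ORDER BY (SELECT NULL)),
        D.country, D.region, D.city
    FROM
    (
        SELECT country, region, city FROM @customers
        UNION ALL
        SELECT country, region, city FROM @employees
    ) AS D
) AS uniquify
WHERE rn = 2;

-- INTERSECT: returns only Chicago.
SELECT country, region, city FROM @customers
INTERSECT
SELECT country, region, city FROM @employees;
```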