使用 "UNION ALL" 和 "GROUP BY" 实现 "Intersect"

Using "UNION ALL" and "GROUP BY" to implement "Intersect"

我提供了以下查询来查找 2 个数据集中的公共记录,但我很难确定查询的正确性,因为我的数据库中有很多数据记录。

是否可以使用 UNION ALL 在 "Customers" 和 "Employees" 表之间实现 Intersect 并在结果上应用 GROUP BY,如下所示?

SELECT D.Country, D.Region, D.City
  FROM (SELECT DISTINCT Country, Region, City 
          FROM Customers
         UNION ALL
        SELECT DISTINCT Country, Region, City
          FROM Employees) AS D
GROUP BY D.Country, D.Region, D.City
HAVING COUNT(*) = 2;

那么我们是否可以说存在于此查询结果中的任何记录也 存在 在 "Customers & Employees" 表之间的 Intersect 集合中 AND 存在于 "Customers & Employees" 表 之间的 Intersect 中的任何记录将在 这个查询的结果也是?

So is it right to say any record in result of this query is in "Intersect" set between "Customers & Employees" "AND" any record that exist in "Intersect" set between "Customers & Employees" is in result of this query too?

是。

... 是的,但效率不高,因为您要过滤掉重复项 3 次而不是一次。在您的查询中,您

  1. 使用 DISTINCT 从员工中提取唯一记录
  2. 使用 DISTINCT 从客户那里提取唯一记录
  3. 使用 UNION ALL 合并两个查询
  4. 在外部查询中使用 GROUP BY 来过滤您在步骤 1、2 和 3 中检索到的记录。

使用 INTERSECT 将 return 得到相同的结果,但效率更高。要亲自查看,您可以创建下面的示例数据和 运行 两个查询:

use tempdb
go
if object_id('dbo.customers') is not null drop table dbo.customers;
if object_id('dbo.employees') is not null drop table dbo.employees;

create table dbo.customers
(
  customerId int identity,
  country    varchar(50),
  region     varchar(50),
  city       varchar(100)
);

create table dbo.employees
(
  employeeId int identity,
  country    varchar(50),
  region     varchar(50),
  city       varchar(100)
);

insert dbo.customers(country, region, city) 
values ('us', 'N/E', 'New York'), ('us', 'N/W', 'Seattle'),('us', 'Midwest', 'Chicago');
insert dbo.employees
values ('us', 'S/E', 'Miami'), ('us', 'N/W', 'Portland'),('us', 'Midwest', 'Chicago');

运行 这些查询:

SELECT D.Country, D.Region, D.City
FROM 
(
  SELECT DISTINCT Country, Region, City 
  FROM Customers
  UNION ALL
  SELECT DISTINCT Country, Region, City
  FROM Employees
) AS D
GROUP BY D.Country, D.Region, D.City
HAVING COUNT(*) = 2;

SELECT Country, Region, City
FROM dbo.customers
INTERSECT
SELECT Country, Region, City
FROM dbo.employees;

结果:

Country     Region     City
----------- ---------- ----------
us          Midwest    Chicago

Country     Region     City
----------- ---------- ----------
us          Midwest    Chicago

如果不能使用 INTERSECT 或者您想要更快的查询,您可以通过几种不同的方式改进您发布的查询,例如:

选项 1: 让 GROUP BY 像这样处理所有重复数据删除:

这与您发布的内容相同,但没有区别

SELECT D.Country, D.Region, D.City
FROM 
(
  SELECT Country, Region, City 
  FROM Customers
  UNION ALL
  SELECT Country, Region, City
  FROM Employees
) AS D
GROUP BY D.Country, D.Region, D.City
HAVING COUNT(*) = 2;

选项 2: 使用 ROW_NUMBER

这是我的偏好并且可能是最有效的

SELECT Country, Region, City
FROM 
(
  SELECT
    rn = row_number() over (partition by D.Country, D.Region, D.City order by (SELECT null)), 
    D.Country, D.Region, D.City
  FROM 
  (
    SELECT Country, Region, City 
    FROM Customers
    UNION ALL
    SELECT Country, Region, City
    FROM Employees
  ) AS D
) uniquify
WHERE rn = 2;