How do I prevent duplicates from a SQL join?

Question

I have the following tables:

customer:

id	customer_number	company	firstname	lastname	account_manager_email	email	comments	terms	tax_id_number	lead_source	default_catalog	credit_limit
99453	C00123456	Serenity Inc.	Malcom	Reynolds	jim.smith@example.com	mal@example.com	The cap'n	1	NULL	NULL	12345	NULL
99468	C00123456	Serenity Inc.	Zoe	Washburne	jim.smith@example.com	zoe@example.com	NULL	1	NULL	NULL	NULL	NULL
99960	C00123456	Serenity Inc.	Hoban	Washburne	jim.smith@example.com	wash@example.com	NULL	1	NULL	NULL	NULL	NULL
100088	C00123456	Serenity Inc.	Inara	Serra	jim.smith@example.com	inara@example.com	NULL	1	NULL	NULL	12345	NULL

customer_address:

id	company	street	city	state_abbreviation	postcode	telephone	firstname	lastname	created_at
133996	Serenity, Inc	123 Any St.	Anytown	AX	12345	123-456-7890	Malcom	Reynolds	2017-05-08 12:45:53.000
134452	Serenity, Inc	123 Any St.	Anytown	AX	12345	123-456-7890	Accounts	Payable	2017-05-09 10:19:59.000
134961	Serenity, Inc	123 Any St.	Anytown	AX	12345	123-456-7890	REF	987654321	2017-05-09 10:19:59.000
134962	Serenity, Inc	123 Any St.	Anytown	AX	12345	123-456-7890	REF	192837465	2017-05-09 10:19:59.000
133995	Serenity, Inc	123 Any St.	Anytown	AX	12345	123-456-7890	Accounts	Payable	2017-05-09 10:19:59.000
133669	Serenity, Inc	123 Any St.	Anytown	AX	12345	123-456-7890	REF	123456789	2017-05-18 10:29:42.000
133667	Serenity, Inc	123 Any St.	Anytown	AX	12345	123-456-7890	Accounts	Payable	2017-05-18 07:56:45.000
133666	Serenity, Inc	123 Any St.	Anytown	AX	12345	123-456-7890	Accounts	Payable	2017-05-31 07:56:46.000
133626	Serenity, Inc	123 Any St.	Anytown	AX	12345	123-456-7890	Accounts	Payable	2017-06-16 12:45:08.000
133668	Serenity, Inc	123 Any St.	Anytown	AX	12345	123-456-7890	REF	PO	2017-06-16 12:45:08.000

I'm running this query to generate a CSV that I can use to import the data into another system:

     SELECT '"' + CAST(c.customer_number AS VARCHAR) + '"' AS 'Customer Number',
            '"' + CAST(c.company AS VARCHAR) + '"' AS 'Company Name',
            '"' + CAST(a.street AS VARCHAR) + '"' AS 'Company Address Line 1',
            '"' + CAST(a.city AS VARCHAR) + '"' AS 'Company City',
            '"' + CAST(a.state_abbreviation AS VARCHAR) + '"' AS 'Company State',
            '"' + CAST(a.postcode AS VARCHAR) + '"' AS 'Company Zip Code',
            '"' + CAST(a.telephone AS VARCHAR) + '"' AS 'Company Phone',
            '"' + CAST(c.firstname AS VARCHAR) + '"' AS 'Contact First Name',
            '"' + CAST(c.lastname AS VARCHAR) + '"' AS 'Contact Last Name',
            '"' + CAST(c.account_manager_email AS VARCHAR) + '"' AS 'Account Manager Email',
            '"' + CAST(a.company AS VARCHAR) + '"' AS 'Contact Company Name',
            '"' + CAST(a.street AS VARCHAR) + '"' AS 'Contact Address Line 1',
            '"' + CAST(a.city AS VARCHAR) + '"' AS 'Contact City',
            '"' + CAST(a.state_abbreviation AS VARCHAR) + '"' AS 'Contact State',
            '"' + CAST(a.postcode AS VARCHAR) + '"' AS 'Contact Zip Code',
            '"' + CAST(a.telephone AS VARCHAR) + '"' AS 'Contact Phone',
            '"' + CAST(c.email AS VARCHAR) + '"' AS 'Contact Email',
            '"' + CAST(c.comments AS VARCHAR) + '"' AS 'Internal Notes',
            '"' + CAST(c.terms AS VARCHAR) + '"' AS 'Terms',
            '"' + CAST(c.tax_id_number AS VARCHAR) + '"' AS 'Tax ID (US)',
            '"' + CAST(c.lead_source AS VARCHAR) + '"' AS 'Lead Source',
            '"' + CAST(c.default_catalog AS VARCHAR) + '"' AS 'Catalog',
            '"' + CAST(c.credit_limit AS VARCHAR) + '"' AS 'Credit Limit'
       FROM customer c,
            customer_address a
      WHERE c.customer_number = 'C00123456'
        AND a.company = c.company
   ORDER BY c.customer_number,
            c.created_at;

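However, when I run that query I get 40 rows back, ten for each entry in customer. I've tried different join types, but the result is the same.

A lot of this is legacy data, so the only thing I seem to be able to reliably join on is the company name ("Serenity, Inc.").

I actually need two versions of this output. The first is one row per company, containing the entry with the oldest created_at value in the customer table. The second is all the other records.

Note: This is on SQL Server 2005 (I know... an upgrade is planned, but I have to get this done first).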

Answer 1

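To remove the duplicates, you need to enumerate the rows and assign each one a number based on an ordering criterion.

You can do that easily with a CTE. I believe they are available in SQL Server 2005, although I can't check that myself.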

with c as(
  select *, Row_Number() over(partition by customer_number order by id) rn
  from customer
),
ca as (
  select *, Row_Number() over(partition by company order by created_at) rn
  from customer_address
)
select <columns>
from c join ca on c.company=ca.company
where c.rn=1 and ca.rn=1 and c.customer_number='C00123456'
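
For the second output the question asks for (all the other records), one possible sketch reuses the same two CTEs and inverts the filter on the customer side. It assumes that "all the other records" means every customer row except the first one per customer_number, and that a single address row per company is still wanted. The column list is only illustrative.

```sql
-- Sketch only: same CTEs as the query above, customer filter inverted.
-- Assumption: "other records" = every customer row except rn = 1.
-- The address side stays limited to one row per company so the join
-- cannot multiply rows.
;WITH c AS (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY customer_number ORDER BY id) AS rn
    FROM customer
),
ca AS (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY company ORDER BY created_at) AS rn
    FROM customer_address
)
SELECT c.customer_number, c.firstname, c.lastname, c.email,
       ca.street, ca.city, ca.state_abbreviation, ca.postcode, ca.telephone
FROM c
JOIN ca ON c.company = ca.company
WHERE c.rn > 1                      -- everything except the first customer row
  AND ca.rn = 1                     -- still exactly one address per company
  AND c.customer_number = 'C00123456';
```

The leading semicolon is there because a WITH clause must follow a terminated statement when it is not the first statement in the batch.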

Answer 2

What other joins have you tried?

Here is one way to select the most recent customer_address record for each customer record:

SELECT c.*, a.* FROM
customer c
LEFT JOIN 
  ( SELECT x.* FROM customer_address x
    INNER JOIN 
    (SELECT company, MAX(created_at) AS created_at FROM customer_address 
       GROUP BY company) u
    ON u.company=x.company AND u.created_at=x.created_at
  ) a
ON a.company=c.company
WHERE c.customer_number = 'C00123456';

All the other customer_address records would then be given by:

  ( SELECT x.* FROM customer_address x
    INNER JOIN 
    (SELECT company, MAX(created_at) AS created_at FROM customer_address 
       GROUP BY company) u
    ON u.company=x.company AND NOT u.created_at=x.created_at
  ) a
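
One caveat with the MAX(created_at) approach: in the sample customer_address data, rows 133626 and 133668 share the latest created_at value, so the first query in this answer can still match more than one address row per company. A sketch that breaks such ties with ROW_NUMBER follows. Using id as the tie-breaker is an assumption, and any column that makes the ordering unique would work.

```sql
-- Sketch only: pick exactly one "most recent" address per company.
-- Assumption: when created_at ties, the higher id wins.
SELECT c.*, a.*
FROM customer c
LEFT JOIN (
    SELECT x.*,
           ROW_NUMBER() OVER (PARTITION BY x.company
                              ORDER BY x.created_at DESC, x.id DESC) AS rn
    FROM customer_address x
) a
  ON a.company = c.company
 AND a.rn = 1                       -- filter in ON so the LEFT JOIN keeps
                                    -- customers without any address
WHERE c.customer_number = 'C00123456';
```

Changing the condition to a.rn > 1 would return the remaining address rows instead, without dropping or repeating the tied ones.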

sql-server

sql-server-2005