How do I prevent duplicates from a SQL join?

I have the following tables:

customer:

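| id     | customer_number | company       | firstname | lastname  | account_manager_email | email             | comments  | terms | tax_id_number | lead_source | default_catalog | credit_limit |
|--------|-----------------|---------------|-----------|-----------|-----------------------|-------------------|-----------|-------|---------------|-------------|-----------------|--------------|
| 99453  | C00123456       | Serenity Inc. | Malcom    | Reynolds  | jim.smith@example.com | mal@example.com   | The cap'n | 1     | NULL          | NULL        | 12345           | NULL         |
| 99468  | C00123456       | Serenity Inc. | Zoe       | Washburne | jim.smith@example.com | zoe@example.com   | NULL      | 1     | NULL          | NULL        | NULL            | NULL         |
| 99960  | C00123456       | Serenity Inc. | Hoban     | Washburne | jim.smith@example.com | wash@example.com  | NULL      | 1     | NULL          | NULL        | NULL            | NULL         |
| 100088 | C00123456       | Serenity Inc. | Inara     | Serra     | jim.smith@example.com | inara@example.com | NULL      | 1     | NULL          | NULL        | 12345           | NULL         |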

customer_address:

| id     | company       | street      | city    | state_abbreviation | postcode | telephone    | firstname | lastname  | created_at              |
|--------|---------------|-------------|---------|--------------------|----------|--------------|-----------|-----------|-------------------------|
| 133996 | Serenity, Inc | 123 Any St. | Anytown | AX                 | 12345    | 123-456-7890 | Malcom    | Reynolds  | 2017-05-08 12:45:53.000 |
| 134452 | Serenity, Inc | 123 Any St. | Anytown | AX                 | 12345    | 123-456-7890 | Accounts  | Payable   | 2017-05-09 10:19:59.000 |
| 134961 | Serenity, Inc | 123 Any St. | Anytown | AX                 | 12345    | 123-456-7890 | REF       | 987654321 | 2017-05-09 10:19:59.000 |
| 134962 | Serenity, Inc | 123 Any St. | Anytown | AX                 | 12345    | 123-456-7890 | REF       | 192837465 | 2017-05-09 10:19:59.000 |
| 133995 | Serenity, Inc | 123 Any St. | Anytown | AX                 | 12345    | 123-456-7890 | Accounts  | Payable   | 2017-05-09 10:19:59.000 |
| 133669 | Serenity, Inc | 123 Any St. | Anytown | AX                 | 12345    | 123-456-7890 | REF       | 123456789 | 2017-05-18 10:29:42.000 |
| 133667 | Serenity, Inc | 123 Any St. | Anytown | AX                 | 12345    | 123-456-7890 | Accounts  | Payable   | 2017-05-18 07:56:45.000 |
| 133666 | Serenity, Inc | 123 Any St. | Anytown | AX                 | 12345    | 123-456-7890 | Accounts  | Payable   | 2017-05-31 07:56:46.000 |
| 133626 | Serenity, Inc | 123 Any St. | Anytown | AX                 | 12345    | 123-456-7890 | Accounts  | Payable   | 2017-06-16 12:45:08.000 |
| 133668 | Serenity, Inc | 123 Any St. | Anytown | AX                 | 12345    | 123-456-7890 | REF       | PO        | 2017-06-16 12:45:08.000 |

I am running this query to generate a CSV that I can use to import the data into another system:

     SELECT '"' + CAST(c.customer_number AS VARCHAR) + '"' AS 'Customer Number',
            '"' + CAST(c.company AS VARCHAR) + '"' AS 'Company Name',
            '"' + CAST(a.street AS VARCHAR) + '"' AS 'Company Address Line 1',
            '"' + CAST(a.city AS VARCHAR) + '"' AS 'Company City',
            '"' + CAST(a.state_abbreviation AS VARCHAR) + '"' AS 'Company State',
            '"' + CAST(a.postcode AS VARCHAR) + '"' AS 'Company Zip Code',
            '"' + CAST(a.telephone AS VARCHAR) + '"' AS 'Company Phone',
            '"' + CAST(c.firstname AS VARCHAR) + '"' AS 'Contact First Name',
            '"' + CAST(c.lastname AS VARCHAR) + '"' AS 'Contact Last Name',
            '"' + CAST(c.account_manager_email AS VARCHAR) + '"' AS 'Account Manager Email',
            '"' + CAST(a.company AS VARCHAR) + '"' AS 'Contact Company Name',
            '"' + CAST(a.street AS VARCHAR) + '"' AS 'Contact Address Line 1',
            '"' + CAST(a.city AS VARCHAR) + '"' AS 'Contact City',
            '"' + CAST(a.state_abbreviation AS VARCHAR) + '"' AS 'Contact State',
            '"' + CAST(a.postcode AS VARCHAR) + '"' AS 'Contact Zip Code',
            '"' + CAST(a.telephone AS VARCHAR) + '"' AS 'Contact Phone',
            '"' + CAST(c.email AS VARCHAR) + '"' AS 'Contact Email',
            '"' + CAST(c.comments AS VARCHAR) + '"' AS 'Internal Notes',
            '"' + CAST(c.terms AS VARCHAR) + '"' AS 'Terms',
            '"' + CAST(c.tax_id_number AS VARCHAR) + '"' AS 'Tax ID (US)',
            '"' + CAST(c.lead_source AS VARCHAR) + '"' AS 'Lead Source',
            '"' + CAST(c.default_catalog AS VARCHAR) + '"' AS 'Catalog',
            '"' + CAST(c.credit_limit AS VARCHAR) + '"' AS 'Credit Limit'
       FROM customer c,
            customer_address a
      WHERE c.customer_number = 'C00123456'
        AND a.company = c.company
   ORDER BY c.customer_number,
            c.created_at;

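However, when I run the query I get 40 rows back: ten for each entry in customer. I have tried different join types, but the result is the same.

Much of this is legacy data, so the only thing I seem to be able to join on reliably is the company name ("Serenity, Inc.").

I actually need two versions of this output. The first is one row per company, containing the entry with the oldest created_at value in the customer table. The second is all the other records.

Note: this is on SQL Server 2005 (I know... an upgrade is planned, but I have to get this done first).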

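To remove the duplicates, you need to number the rows and assign each one a value based on a sort criterion.

You can do this easily with a CTE. I believe CTEs are available in SQL Server 2005, although I can't check that myself.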

    -- Number the rows within each group; rn = 1 marks the row to keep.
    with c as (
      select *, Row_Number() over(partition by customer_number order by id) rn
      from customer
    ),
    ca as (
      select *, Row_Number() over(partition by company order by created_at) rn
      from customer_address
    )
    select <columns>
    from c join ca on c.company=ca.company
    where c.rn=1 and ca.rn=1 and c.customer_number='C00123456'
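
The question also asks for a second output holding all the other records. Here is a minimal sketch of that, reusing the same numbering. It assumes "all the other records" means every customer row except the first one per customer_number (rn > 1), and the column list is shortened purely for illustration:

```sql
-- Sketch only: same CTEs as above, but keep the customer rows that were
-- NOT picked as the first one for their customer_number.
WITH c AS (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY customer_number ORDER BY id) AS rn
    FROM customer
),
ca AS (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY company ORDER BY created_at) AS rn
    FROM customer_address
)
SELECT c.customer_number, c.firstname, c.lastname, c.email,
       ca.street, ca.city, ca.state_abbreviation, ca.postcode, ca.telephone
FROM c
JOIN ca ON c.company = ca.company
WHERE c.rn > 1          -- every customer row except the first
  AND ca.rn = 1         -- still a single address per company
  AND c.customer_number = 'C00123456';
```

If the "first" customer should be chosen by something other than id, only the ORDER BY inside the c CTE needs to change.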

What other joins have you tried?

Here is one way to select the most recent customer_address record for each customer record:

    SELECT c.*, a.* FROM
    customer c
    LEFT JOIN
      ( SELECT x.* FROM customer_address x
        INNER JOIN
        (SELECT company, MAX(created_at) AS created_at FROM customer_address
           GROUP BY company) u
        ON u.company=x.company AND u.created_at=x.created_at
      ) a
    ON a.company=c.company
    WHERE c.customer_number = 'C00123456';

All the other customer_address records would then be given by:

      ( SELECT x.* FROM customer_address x
        INNER JOIN
        (SELECT company, MAX(created_at) AS created_at FROM customer_address
           GROUP BY company) u
        ON u.company=x.company AND NOT u.created_at=x.created_at
      ) a
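
One thing to watch with the MAX(created_at) join is ties. If two addresses for a company share the latest timestamp, both come back. In the sample data, ids 133626 and 133668 both have 2017-06-16 12:45:08.000, so each customer would still get two rows. Below is a sketch that avoids this by numbering the addresses with ROW_NUMBER() and breaking ties on id. The tie-break column is my assumption; any unique column should work:

```sql
-- Sketch: pick exactly one "latest" address per company, even when
-- several rows share the same created_at.
WITH ranked AS (
    SELECT x.*,
           ROW_NUMBER() OVER (PARTITION BY x.company
                              ORDER BY x.created_at DESC, x.id DESC) AS rn
    FROM customer_address x
)
SELECT c.*, a.*
FROM customer c
LEFT JOIN ranked a
       ON a.company = c.company
      AND a.rn = 1      -- use a.rn > 1 here to get the other addresses instead
WHERE c.customer_number = 'C00123456';
```

Keeping the rn filter in the ON clause rather than the WHERE clause preserves the LEFT JOIN behaviour, so customers with no matching address are still returned.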