How do I prevent duplicates from a SQL join?
I have the following tables:
`customer`:

id | customer_number | company | firstname | lastname | account_manager_email | email | comments | terms | tax_id_number | lead_source | default_catalog | credit_limit |
---|---|---|---|---|---|---|---|---|---|---|---|---|
99453 | C00123456 | Serenity Inc. | Malcom | Reynolds | jim.smith@example.com | mal@example.com | The cap'n | 1 | NULL | NULL | 12345 | NULL |
99468 | C00123456 | Serenity Inc. | Zoe | Washburne | jim.smith@example.com | zoe@example.com | NULL | 1 | NULL | NULL | NULL | NULL |
99960 | C00123456 | Serenity Inc. | Hoban | Washburne | jim.smith@example.com | wash@example.com | NULL | 1 | NULL | NULL | NULL | NULL |
100088 | C00123456 | Serenity Inc. | Inara | Serra | jim.smith@example.com | inara@example.com | NULL | 1 | NULL | NULL | 12345 | NULL |
`customer_address`:

id | company | street | city | state_abbreviation | postcode | telephone | firstname | lastname | created_at |
---|---|---|---|---|---|---|---|---|---|
133996 | Serenity, Inc | 123 Any St. | Anytown | AX | 12345 | 123-456-7890 | Malcom | Reynolds | 2017-05-08 12:45:53.000 |
134452 | Serenity, Inc | 123 Any St. | Anytown | AX | 12345 | 123-456-7890 | Accounts | Payable | 2017-05-09 10:19:59.000 |
134961 | Serenity, Inc | 123 Any St. | Anytown | AX | 12345 | 123-456-7890 | REF | 987654321 | 2017-05-09 10:19:59.000 |
134962 | Serenity, Inc | 123 Any St. | Anytown | AX | 12345 | 123-456-7890 | REF | 192837465 | 2017-05-09 10:19:59.000 |
133995 | Serenity, Inc | 123 Any St. | Anytown | AX | 12345 | 123-456-7890 | Accounts | Payable | 2017-05-09 10:19:59.000 |
133669 | Serenity, Inc | 123 Any St. | Anytown | AX | 12345 | 123-456-7890 | REF | 123456789 | 2017-05-18 10:29:42.000 |
133667 | Serenity, Inc | 123 Any St. | Anytown | AX | 12345 | 123-456-7890 | Accounts | Payable | 2017-05-18 07:56:45.000 |
133666 | Serenity, Inc | 123 Any St. | Anytown | AX | 12345 | 123-456-7890 | Accounts | Payable | 2017-05-31 07:56:46.000 |
133626 | Serenity, Inc | 123 Any St. | Anytown | AX | 12345 | 123-456-7890 | Accounts | Payable | 2017-06-16 12:45:08.000 |
133668 | Serenity, Inc | 123 Any St. | Anytown | AX | 12345 | 123-456-7890 | REF | PO | 2017-06-16 12:45:08.000 |
I'm running this query to generate a CSV that I can use to import the data into another system:
```sql
SELECT '"' + CAST(c.customer_number AS VARCHAR) + '"' AS 'Customer Number',
       '"' + CAST(c.company AS VARCHAR) + '"' AS 'Company Name',
       '"' + CAST(a.street AS VARCHAR) + '"' AS 'Company Address Line 1',
       '"' + CAST(a.city AS VARCHAR) + '"' AS 'Company City',
       '"' + CAST(a.state_abbreviation AS VARCHAR) + '"' AS 'Company State',
       '"' + CAST(a.postcode AS VARCHAR) + '"' AS 'Company Zip Code',
       '"' + CAST(a.telephone AS VARCHAR) + '"' AS 'Company Phone',
       '"' + CAST(c.firstname AS VARCHAR) + '"' AS 'Contact First Name',
       '"' + CAST(c.lastname AS VARCHAR) + '"' AS 'Contact Last Name',
       '"' + CAST(c.account_manager_email AS VARCHAR) + '"' AS 'Account Manager Email',
       '"' + CAST(a.company AS VARCHAR) + '"' AS 'Contact Company Name',
       '"' + CAST(a.street AS VARCHAR) + '"' AS 'Contact Address Line 1',
       '"' + CAST(a.city AS VARCHAR) + '"' AS 'Contact City',
       '"' + CAST(a.state_abbreviation AS VARCHAR) + '"' AS 'Contact State',
       '"' + CAST(a.postcode AS VARCHAR) + '"' AS 'Contact Zip Code',
       '"' + CAST(a.telephone AS VARCHAR) + '"' AS 'Contact Phone',
       '"' + CAST(c.email AS VARCHAR) + '"' AS 'Contact Email',
       '"' + CAST(c.comments AS VARCHAR) + '"' AS 'Internal Notes',
       '"' + CAST(c.terms AS VARCHAR) + '"' AS 'Terms',
       '"' + CAST(c.tax_id_number AS VARCHAR) + '"' AS 'Tax ID (US)',
       '"' + CAST(c.lead_source AS VARCHAR) + '"' AS 'Lead Source',
       '"' + CAST(c.default_catalog AS VARCHAR) + '"' AS 'Catalog',
       '"' + CAST(c.credit_limit AS VARCHAR) + '"' AS 'Credit Limit'
FROM customer c,
     customer_address a
WHERE c.customer_number = 'C00123456'
  AND a.company = c.company
ORDER BY c.customer_number,
         c.created_at;
```
However, when I run that query I get 40 rows back: ten for each entry in `customer`. I've tried different join types, but the result is the same.
Much of this is legacy data, so the only thing I seem to be able to join on reliably is the company name ("Serenity, Inc.").
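The 40 rows are what a join on a column that is unique in neither table produces: each of the 4 `customer` rows pairs with each of the 10 `customer_address` rows that share its company value, giving 4 × 10 = 40. A minimal sketch using Python's built-in `sqlite3` module, keeping only the `id` and `company` columns and assuming both tables spell the company name identically (the sample data above actually shows `Serenity Inc.` in one table and `Serenity, Inc` in the other, which an exact `=` would not match):

```python
import sqlite3

# Trimmed-down, hypothetical copies of the two tables: only id and company.
# Assumption: both tables store the company name with identical spelling.
COMPANY = "Serenity, Inc"
customer_ids = [99453, 99468, 99960, 100088]
address_ids = [133996, 134452, 134961, 134962, 133995,
               133669, 133667, 133666, 133626, 133668]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER, company TEXT)")
conn.execute("CREATE TABLE customer_address (id INTEGER, company TEXT)")
conn.executemany("INSERT INTO customer VALUES (?, ?)",
                 [(i, COMPANY) for i in customer_ids])
conn.executemany("INSERT INTO customer_address VALUES (?, ?)",
                 [(i, COMPANY) for i in address_ids])

# company is not unique on either side, so every customer row matches
# every address row of the same company: 4 x 10 combinations.
rows = conn.execute(
    "SELECT c.id, a.id FROM customer c "
    "JOIN customer_address a ON a.company = c.company"
).fetchall()

print(len(rows))  # 40
```

Switching to a LEFT, RIGHT or FULL join does not change this, because outer joins differ only in how they treat rows without a match, and here every row has one.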
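I actually need two versions of this output. The first is one row per company, containing the entry with the oldest `created_at` value in the `customer` table. The second is all the other records.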
Note: this is on SQL Server 2005 (I know... an upgrade is planned, but I have to finish this first).
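To remove the duplicates, you need to number the rows in each group according to your ordering criteria and keep only one of them. You can do this easily with a CTE and `ROW_NUMBER()`, both of which were introduced in SQL Server 2005: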
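```sql
WITH c AS (
    -- Number each customer_number's rows; rn = 1 is the lowest id.
    SELECT *, ROW_NUMBER() OVER (PARTITION BY customer_number ORDER BY id) AS rn
    FROM customer
),
ca AS (
    -- Number each company's addresses; rn = 1 is the oldest created_at.
    SELECT *, ROW_NUMBER() OVER (PARTITION BY company ORDER BY created_at) AS rn
    FROM customer_address
)
SELECT <columns>
FROM c
JOIN ca ON c.company = ca.company
WHERE c.rn = 1
  AND ca.rn = 1
  AND c.customer_number = 'C00123456';
```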
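Conceptually, `ROW_NUMBER() OVER (PARTITION BY company ORDER BY created_at)` splits the rows into one group per company, sorts each group, and numbers it from 1. The same numbering also yields both outputs the question asks for: rows with `rn = 1` form the first version and rows with `rn > 1` are "all the other records". The plain-Python model below is only an illustration of those semantics, not how SQL Server evaluates the query; the `row_number` helper is hypothetical, and adding `id` as a tiebreaker is an assumption so that rows with equal timestamps are numbered deterministically (in T-SQL that would be `ORDER BY created_at, id`).

```python
from collections import defaultdict

# (id, company, created_at) values from the customer_address sample above.
# Fixed-width ISO timestamps sort chronologically as plain strings.
ADDRESSES = [
    (133996, "Serenity, Inc", "2017-05-08 12:45:53.000"),
    (134452, "Serenity, Inc", "2017-05-09 10:19:59.000"),
    (134961, "Serenity, Inc", "2017-05-09 10:19:59.000"),
    (134962, "Serenity, Inc", "2017-05-09 10:19:59.000"),
    (133995, "Serenity, Inc", "2017-05-09 10:19:59.000"),
    (133669, "Serenity, Inc", "2017-05-18 10:29:42.000"),
    (133667, "Serenity, Inc", "2017-05-18 07:56:45.000"),
    (133666, "Serenity, Inc", "2017-05-31 07:56:46.000"),
    (133626, "Serenity, Inc", "2017-06-16 12:45:08.000"),
    (133668, "Serenity, Inc", "2017-06-16 12:45:08.000"),
]


def row_number(rows, partition_key, order_key):
    """Hypothetical model of ROW_NUMBER() OVER (PARTITION BY ... ORDER BY ...).

    Returns (row, rn) pairs; rn restarts at 1 in every partition.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[partition_key(row)].append(row)
    numbered = []
    for group in groups.values():
        for rn, row in enumerate(sorted(group, key=order_key), start=1):
            numbered.append((row, rn))
    return numbered


numbered = row_number(
    ADDRESSES,
    partition_key=lambda r: r[1],        # PARTITION BY company
    order_key=lambda r: (r[2], r[0]),    # ORDER BY created_at, id (assumed tiebreaker)
)

first_per_company = [row for row, rn in numbered if rn == 1]  # first version
all_others = [row for row, rn in numbered if rn > 1]          # second version

print([r[0] for r in first_per_company])  # [133996]
print(len(all_others))                    # 9
```

Because every row gets exactly one number, the `rn = 1` and `rn > 1` sets never overlap and together cover the whole table, even when timestamps tie.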
What other joins have you tried?
Here is one way to select the most recent `customer_address` record for each customer record:
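```sql
SELECT c.*, a.*
FROM customer c
LEFT JOIN (
    -- Address row(s) whose created_at is the newest for their company.
    -- If several rows tie on MAX(created_at), all of them are returned.
    SELECT x.*
    FROM customer_address x
    INNER JOIN (SELECT company, MAX(created_at) AS created_at
                FROM customer_address
                GROUP BY company) u
        ON u.company = x.company AND u.created_at = x.created_at
) a
    ON a.company = c.company
WHERE c.customer_number = 'C00123456';
```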
All the other `customer_address` records would then be given by:
```sql
(SELECT x.*
 FROM customer_address x
 INNER JOIN (SELECT company, MAX(created_at) AS created_at
             FROM customer_address
             GROUP BY company) u
     ON u.company = x.company AND u.created_at <> x.created_at
) a
```
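Running both derived tables against the `customer_address` sample reveals a caveat: two rows (ids 133626 and 133668) share the newest `created_at` (2017-06-16 12:45:08.000), so the "most recent" query returns both of them, and joining that to `customer` would still produce two rows per customer. A minimal sketch using Python's built-in `sqlite3` module, restricted to the `id`, `company` and `created_at` columns, with timestamps stored as ISO-8601 text (which SQLite's `MAX` compares lexicographically, i.e. chronologically here):

```python
import sqlite3

# (id, created_at) pairs from the customer_address sample; all rows belong
# to the same company, so the table forms a single group.
SAMPLE = [
    (133996, "2017-05-08 12:45:53"),
    (134452, "2017-05-09 10:19:59"),
    (134961, "2017-05-09 10:19:59"),
    (134962, "2017-05-09 10:19:59"),
    (133995, "2017-05-09 10:19:59"),
    (133669, "2017-05-18 10:29:42"),
    (133667, "2017-05-18 07:56:45"),
    (133666, "2017-05-31 07:56:46"),
    (133626, "2017-06-16 12:45:08"),
    (133668, "2017-06-16 12:45:08"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer_address (id INTEGER, company TEXT, created_at TEXT)")
conn.executemany("INSERT INTO customer_address VALUES (?, 'Serenity, Inc', ?)", SAMPLE)

# Derived table u: one (company, newest created_at) pair per company.
NEWEST = """SELECT company, MAX(created_at) AS created_at
            FROM customer_address GROUP BY company"""

# Rows whose created_at equals their company's maximum.
latest = [r[0] for r in conn.execute(f"""
    SELECT x.id FROM customer_address x
    INNER JOIN ({NEWEST}) u
        ON u.company = x.company AND u.created_at = x.created_at
    ORDER BY x.id""")]

# Every remaining row (the answer's complement query).
others = [r[0] for r in conn.execute(f"""
    SELECT x.id FROM customer_address x
    INNER JOIN ({NEWEST}) u
        ON u.company = x.company AND u.created_at <> x.created_at
    ORDER BY x.id""")]

print(latest)       # [133626, 133668]: two rows tie on the newest timestamp
print(len(others))  # 8
```

Swapping `MAX` for `MIN` selects the oldest row instead, which is what the question's first output asks for (in this sample the oldest timestamp is unique). Whenever timestamps can repeat, though, a tiebreaker such as the `ROW_NUMBER()` approach from the other answer is needed to guarantee exactly one row per company.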