SQL count duplicates in another column based on one field per row
I'm building a customer retention report. We identify customers by email. Here is some sample data from our table:
+----------------------------+------------------+-------------------+---------------------+------------+-------------+--------------+---------------+------------------+---------------+----------------+--------------+------------------+
| Email                      | BrandNewCustomer | RecurringCustomer | ReactivatedCustomer | OrderCount | TotalOrders | Date_Created | Customer_Name | Customer_Address | Customer_City | Customer_State | Customer_Zip | Customer_Country |
+----------------------------+------------------+-------------------+---------------------+------------+-------------+--------------+---------------+------------------+---------------+----------------+--------------+------------------+
| zyw@marketplace.amazon.com | 1                | 0                 | 0                   | 1          | 1           | 41:50.0      | Sha           | 990              | BRO           | NY             | 112          | US               |
| zyu@gmail.com              | 1                | 0                 | 0                   | 1          | 1           | 57:25.0      | Zyu           | 181              | Mia           | FL             | 330          | US               |
| ZyR@aol.com                | 1                | 0                 | 0                   | 1          | 1           | 10:19.0      | Day           | 581              | Myr           | SC             | 295          | US               |
| zyr@gmail.com              | 1                | 0                 | 0                   | 1          | 1           | 25:19.0      | Nic           | 173              | Was           | DC             | 200          | US               |
| zy@gmail.com               | 1                | 0                 | 0                   | 1          | 1           | 19:18.0      | Kim           | 675              | MIA           | FL             | 331          | US               |
| zyou@gmail.com             | 1                | 0                 | 0                   | 1          | 1           | 40:29.0      | zoe           | 160              | Mob           | AL             | 366          | US               |
| zyon@yahoo.com             | 1                | 0                 | 0                   | 1          | 1           | 17:21.0      | Zyo           | 84               | Sta           | CT             | 690          | US               |
| zyo@gmail.com              | 1                | 0                 | 0                   | 2          | 2           | 02:03.0      | Zyo           | 432              | Ell           | GA             | 302          | US               |
| zyo@gmail.com              | 1                | 0                 | 0                   | 1          | 2           | 12:54.0      | Zyo           | 432              | Ell           | GA             | 302          | US               |
| zyn@icloud.com             | 1                | 0                 | 0                   | 1          | 1           | 54:56.0      | Zyn           | 916              | Nor           | CA             | 913          | US               |
| zyl@gmail.com              | 0                | 1                 | 0                   | 3          | 3           | 31:27.0      | Ser           | 123              | Mia           | FL             | 331          | US               |
| zyk@marketplace.amazon.com | 1                | 0                 | 0                   | 1          | 1           | 44:00.0      | Myr           | 101              | MIA           | FL             | 331          | US               |
+----------------------------+------------------+-------------------+---------------------+------------+-------------+--------------+---------------+------------------+---------------+----------------+--------------+------------------+
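We define our customers by email, so all orders with the same email are attributed to one customer, and our calculations are based on that.
Now I'm trying to find customers whose email has changed. To do that, we want to group customers by address.
So for each row (i.e., with rows separated by email), I'd like another column called Orders_With_Same_Address_Different_Email. How can I do this?
I've tried something with DENSE_RANK, but it doesn't seem to work: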
SELECT DISTINCT
Email
,BrandNewCustomer
,RecurringCustomer
,ReactivatedCustomer
,OrderCount
,TotalOrders
,Date_Created
,Customer_Name
,Customer_Address
,Customer_City
,Customer_State
,Customer_Zip
,Customer_Country
,(DENSE_RANK() over (partition by Email order by (case when email <> email then Customer_Address end) asc)
+DENSE_RANK() over ( partition by Email order by (case when email <> email then Customer_Address end) desc)
- 1) as Orders_With_Same_Name_Different_Email
--*
FROM Customers
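To see why this attempt returns 1 on every row, here is a minimal sketch that runs the same expression through Python's standard `sqlite3` module. It assumes SQLite 3.25 or later (for window functions); the question's query is T-SQL, but `DENSE_RANK` behaves the same way here. Because `email <> email` is never true, the CASE yields NULL on every row, each `DENSE_RANK` sees a single value, and 1 + 1 - 1 = 1. The same ascending-plus-descending trick does count distinct emails once it is partitioned by `Customer_Address` and ordered by `Email`. The table is reduced to two columns, and the row `zyo.new@gmail.com` is hypothetical (not in the posted data), added so that one address has two different emails.

```python
import sqlite3

# Subset of the sample data (Email, Customer_Address only).
ROWS = [
    ("zyu@gmail.com", "181"),
    ("zyo@gmail.com", "432"),
    ("zyo@gmail.com", "432"),
    ("zyo.new@gmail.com", "432"),  # HYPOTHETICAL row: same address, new email
    ("zyl@gmail.com", "123"),
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Customers (Email TEXT, Customer_Address TEXT)")
con.executemany("INSERT INTO Customers VALUES (?, ?)", ROWS)

# The original attempt: the CASE is NULL on every row, so each DENSE_RANK is 1.
attempt = con.execute("""
    SELECT Email,
           DENSE_RANK() OVER (PARTITION BY Email
               ORDER BY CASE WHEN Email <> Email THEN Customer_Address END ASC)
         + DENSE_RANK() OVER (PARTITION BY Email
               ORDER BY CASE WHEN Email <> Email THEN Customer_Address END DESC)
         - 1 AS n
    FROM Customers
""").fetchall()

# Same trick, partitioned by address and ordered by email: ascending rank
# plus descending rank minus 1 equals the number of distinct emails at that
# address (a NULL email would also be counted as one value).
fixed = con.execute("""
    SELECT Email, Customer_Address,
           DENSE_RANK() OVER (PARTITION BY Customer_Address ORDER BY Email ASC)
         + DENSE_RANK() OVER (PARTITION BY Customer_Address ORDER BY Email DESC)
         - 1 AS distinct_emails_at_address
    FROM Customers
""").fetchall()

for row in fixed:
    print(row)
```

The rank-sum form is a common workaround because `COUNT(DISTINCT ...) OVER (...)` is not accepted as a window aggregate in SQL Server (nor in SQLite).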
Try counting emails partitioned by address instead of by email:
select Email,
-- ...
-- 1 if any other row shares this address (counts rows, i.e. orders, not distinct emails)
Orders_With_Same_Name_Different_Email = iif(
count(email) over (partition by Customer_Address) > 1,
1, 0)
from Customers;
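A minimal sketch of the same windowed count, run in SQLite via Python's `sqlite3` (assuming SQLite 3.25+; a portable `CASE` stands in for T-SQL's `IIF`). It uses only rows from the posted sample and shows one caveat: `COUNT(Email)` counts rows, so `zyo@gmail.com`, a single customer with two orders at address 432, is flagged even though no second email exists. A distinct-email count, such as a `GROUP BY Customer_Address HAVING COUNT(DISTINCT Email) > 1`, avoids that.

```python
import sqlite3

# Rows taken from the sample data (Email, Customer_Address only).
ROWS = [
    ("zyu@gmail.com", "181"),
    ("zyo@gmail.com", "432"),  # one customer, two orders
    ("zyo@gmail.com", "432"),
    ("zyl@gmail.com", "123"),
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Customers (Email TEXT, Customer_Address TEXT)")
con.executemany("INSERT INTO Customers VALUES (?, ?)", ROWS)

# COUNT(Email) OVER counts rows per address, not distinct emails.
flags = con.execute("""
    SELECT Email, Customer_Address,
           CASE WHEN COUNT(Email) OVER (PARTITION BY Customer_Address) > 1
                THEN 1 ELSE 0 END AS Orders_With_Same_Name_Different_Email
    FROM Customers
    ORDER BY Customer_Address, Email
""").fetchall()

for row in flags:
    print(row)
```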
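But this is a lesson in why you shouldn't use email as a customer identifier. Address is a bad idea too. Use something that doesn't change. That usually means creating an internal identifier, such as an auto-incrementing one: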
alter table Customers
add customerId int identity(1,1) primary key not null;
Now customerId = 1 will always refer to that particular customer.
You can group by customer_address and look at the counts. This assumes each customer has one address.
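A small sketch of the idea, using SQLite through Python's `sqlite3`. In SQLite an `INTEGER PRIMARY KEY` column is filled in automatically when no value is supplied, which is used here as a stand-in for SQL Server's `IDENTITY(1,1)`. The two-column table and the new address `zyo.new@gmail.com` are hypothetical. The point it shows: when a customer's email changes, the internal id stays the same, so orders keyed on `customerId` remain attached to the same customer.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# INTEGER PRIMARY KEY is auto-assigned (1, 2, ...) on insert in an empty table.
con.execute("""
    CREATE TABLE Customers (
        customerId INTEGER PRIMARY KEY,
        Email      TEXT NOT NULL
    )
""")
con.executemany("INSERT INTO Customers (Email) VALUES (?)",
                [("zyu@gmail.com",), ("zyo@gmail.com",)])

# The customer changes email (hypothetical new address); the id does not change.
con.execute("UPDATE Customers SET Email = ? WHERE customerId = ?",
            ("zyo.new@gmail.com", 2))

rows = con.execute(
    "SELECT customerId, Email FROM Customers ORDER BY customerId"
).fetchall()
print(rows)
```

Without `AUTOINCREMENT`, SQLite may reuse the id of a deleted highest row; SQL Server's `IDENTITY` does not reuse values but can leave gaps.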
Select * from Customers where
customer_address IN (
Select customer_address
From Customers group by customer_address
having count(distinct Email)
>1)
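A runnable sketch of this GROUP BY / HAVING filter in SQLite via Python's `sqlite3`. Note that the sample data as posted has no address with two different emails (address 432 appears twice, both times for `zyo@gmail.com`), so on that data the query returns no rows. The row `zyo.new@gmail.com` below is hypothetical and only there to give the filter something to find.

```python
import sqlite3

ROWS = [
    ("zyu@gmail.com", "181"),
    ("zyo@gmail.com", "432"),
    ("zyo@gmail.com", "432"),
    ("zyo.new@gmail.com", "432"),  # HYPOTHETICAL: same address, new email
    ("zyl@gmail.com", "123"),
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Customers (Email TEXT, Customer_Address TEXT)")
con.executemany("INSERT INTO Customers VALUES (?, ?)", ROWS)

# Keep every row whose address is used by more than one distinct email.
shared = con.execute("""
    SELECT * FROM Customers
    WHERE Customer_Address IN (
        SELECT Customer_Address
        FROM Customers
        GROUP BY Customer_Address
        HAVING COUNT(DISTINCT Email) > 1
    )
    ORDER BY Email
""").fetchall()
print(shared)
```

Because the count is `DISTINCT`, a repeat customer with several orders at one address is not flagged on its own.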
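If I understand what you're trying to do, I would solve it like this:
Note that you don't need the HAVING clause in the CTE, but depending on your data it can make the query faster (that is, if you have a large data set).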
WITH addr2email AS
(
select customer_address, count(distinct email) as email_cnt
from customers
group by customer_address
having count(distinct email) > 1
)
SELECT
c.Email
,c.BrandNewCustomer
,c.RecurringCustomer
,c.ReactivatedCustomer
,c.OrderCount
,c.TotalOrders
,c.Date_Created
,c.Customer_Name
,c.Customer_Address
,c.Customer_City
,c.Customer_State
,c.Customer_Zip
,c.Customer_Country
,CASE when coalesce(a.email_cnt,1) > 1 then 'Y' ELSE 'N' END as has_more_than_1_email
from customers c
left join addr2email a on c.customer_address = a.customer_address
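A minimal sketch of the CTE plus LEFT JOIN pattern, run in SQLite via Python's `sqlite3`. It groups by address and counts distinct emails, which is the direction the `has_more_than_1_email` alias implies; rows whose address is not in the CTE get NULL from the LEFT JOIN, and `COALESCE(..., 1)` turns that into 'N'. The two-column table is a simplification and `zyo.new@gmail.com` is a hypothetical row, not part of the posted data.

```python
import sqlite3

ROWS = [
    ("zyu@gmail.com", "181"),
    ("zyo@gmail.com", "432"),
    ("zyo@gmail.com", "432"),
    ("zyo.new@gmail.com", "432"),  # HYPOTHETICAL: same address, new email
    ("zyl@gmail.com", "123"),
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Customers (Email TEXT, Customer_Address TEXT)")
con.executemany("INSERT INTO Customers VALUES (?, ?)", ROWS)

result = con.execute("""
    WITH addr2email AS (
        -- one row per address used by more than one distinct email
        SELECT Customer_Address, COUNT(DISTINCT Email) AS email_cnt
        FROM Customers
        GROUP BY Customer_Address
        HAVING COUNT(DISTINCT Email) > 1
    )
    SELECT c.Email, c.Customer_Address,
           CASE WHEN COALESCE(a.email_cnt, 1) > 1 THEN 'Y' ELSE 'N' END
               AS has_more_than_1_email
    FROM Customers AS c
    LEFT JOIN addr2email AS a ON c.Customer_Address = a.Customer_Address
    ORDER BY c.Customer_Address, c.Email
""").fetchall()

for row in result:
    print(row)
```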