SQL -- How do I write a select statement that finds customers who placed orders on 5 consecutive days
orderId, orderDate and customerId are the relevant fields here.
A customer may have more than one run of 5 consecutive days.
I would like the output to look like this:
customerID startDate endDate numDays
1 2020/01/01 2020/01/05 5
1 2020/10/1 2020/10/10 10
101 2020/04/10 2020/04/15 6
Here is what I have so far:
;
with t1 as (
select distinct o.idcustomer,orderdate, dateadd(dd,1,orderdate) nextOrderDate, 1 as tday, orderstatus
from orders o
join customers c on c.idcustomer=o.idcustomer
where orderstatus in (3,4) and c.customertype=0
), t2 as (
select * from t1
union all
select o2.idcustomer, o2.orderdate, dateadd(dd,1,o2.orderdate), o.tday+1, o2.orderstatus
from t1 o2
join t2 o on o2.idcustomer=o.idcustomer and o2.orderdate=o.nextOrderDate and o2.orderstatus in (3,4)
)
--select idcustomer, max(tday) DaysInARow, min(orderDate) StartDate, max(orderdate) endDate
select idcustomer, dateadd(dd,-5,min(orderdate)) firstOrderDate, max(orderdate) lastOrderDate
from t2
where tday>=5
group by idcustomer, tday
order by idcustomer
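This is a gaps-and-islands problem, where you want to group together the consecutive days on which a customer has orders.
If a customer has at most one order per day, you can build the groups with date arithmetic against an incrementing sequence. Assuming you are running SQL Server, as the syntax of your current query suggests: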
select idcustomer, min(orderdate) startdate, max(orderdate) enddate, count(*) cnt
from (
    select o.idcustomer, o.orderdate,
        -- rn grows by 1 per row, so orderdate minus rn days is constant within a streak
        row_number() over(partition by o.idcustomer order by o.orderdate) rn
    from orders o
    inner join customers c on c.idcustomer = o.idcustomer
    where o.orderstatus in (3, 4) and c.customertype = 0
) t
-- only the columns exposed by the derived table t are visible here
group by idcustomer, dateadd(day, -rn, orderdate)
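To see why subtracting the row number from the date yields one group per streak, here is a minimal Python sketch of the same idea. The function name find_streaks and the sample dates are made up for illustration; it only mirrors the logic of the query and is not a replacement for it.

```python
from datetime import date, timedelta
from itertools import groupby


def find_streaks(order_dates):
    """Return (start, end, num_days) for each run of consecutive dates."""
    # Deduplicate and sort, mirroring "at most one order per day".
    days = sorted(set(order_dates))
    # Pair each date with its 1-based position, like row_number().
    numbered = list(enumerate(days, start=1))
    streaks = []
    # date minus rn days stays constant while the dates are consecutive,
    # and jumps forward as soon as a day is skipped.
    for _, group in groupby(numbered, key=lambda p: p[1] - timedelta(days=p[0])):
        run = [d for _, d in group]
        streaks.append((run[0], run[-1], len(run)))
    return streaks


# Hypothetical sample: five consecutive days, a gap, then two more days.
orders = [date(2020, 1, d) for d in (1, 2, 3, 4, 5, 9, 10)]
for streak in find_streaks(orders):
    print(streak)
```

For the sample above, the first five dates all map to the same key (2019-12-31) and the last two map to a later one, which is exactly what the group by on dateadd(day, -rn, orderdate) relies on.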
If you only want to show streaks of 5 days or more, just add a having clause:
having count(*) >= 5
And if you only want the longest streak for each customer (with a minimum length of 5):
select *
from (
    select idcustomer, min(orderdate) startdate, max(orderdate) enddate, count(*) cnt,
        -- rank 1 = longest streak of the customer; ties are all kept
        rank() over(partition by idcustomer order by count(*) desc) rn2
    from (
        select o.idcustomer, o.orderdate,
            row_number() over(partition by o.idcustomer order by o.orderdate) rn
        from orders o
        inner join customers c on c.idcustomer = o.idcustomer
        where o.orderstatus in (3, 4) and c.customertype = 0
    ) t
    group by idcustomer, dateadd(day, -rn, orderdate)
    having count(*) >= 5
) t
where rn2 = 1
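As a rough illustration of what the rank() filter keeps, the following Python sketch picks the longest qualifying streak per customer. The names streaks and longest_streaks and the sample data are assumptions made for this example. The streaks are split with a simple linear scan rather than the row_number trick, purely to keep the sketch short.

```python
from collections import defaultdict
from datetime import date, timedelta


def streaks(days):
    """Split dates into runs of consecutive days as (start, end, length)."""
    runs = []
    for d in sorted(set(days)):
        if runs and d - runs[-1][1] == timedelta(days=1):
            # d directly follows the end of the current run: extend it.
            runs[-1] = (runs[-1][0], d, runs[-1][2] + 1)
        else:
            runs.append((d, d, 1))
    return runs


def longest_streaks(orders, min_len=5):
    """orders: iterable of (customer_id, order_date) pairs."""
    by_customer = defaultdict(list)
    for cust, d in orders:
        by_customer[cust].append(d)
    result = {}
    for cust, days in by_customer.items():
        # Same role as "having count(*) >= 5".
        qualifying = [s for s in streaks(days) if s[2] >= min_len]
        if qualifying:
            best = max(s[2] for s in qualifying)
            # Like rank() = 1: every streak tied for the longest is kept.
            result[cust] = [s for s in qualifying if s[2] == best]
    return result


# Hypothetical data: customer 1 has a 5-day and a 7-day streak,
# customer 2 only has a 3-day streak and is therefore dropped.
orders = (
    [(1, date(2020, 1, d)) for d in range(1, 6)]
    + [(1, date(2020, 3, d)) for d in range(1, 8)]
    + [(2, date(2020, 4, d)) for d in (10, 11, 12)]
)
print(longest_streaks(orders))
```

Because rank() assigns the same rank to ties, a customer with two equally long streaks gets both rows back; row_number() would be the choice if exactly one row per customer is wanted.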
If there are duplicate (customerid, orderdate) pairs, use dense_rank() instead of row_number(), and count(distinct orderdate) instead of count(*).
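The reason dense_rank() helps is that duplicate dates share one rank, so they still land in the same group, whereas row_number() would push the duplicate into a different group. A small Python sketch of that behavior follows; the helper names dense_rank and streaks_with_duplicates are invented for illustration.

```python
from datetime import date, timedelta


def dense_rank(values):
    """1-based dense rank of each value; equal values share a rank."""
    order = {v: i for i, v in enumerate(sorted(set(values)), start=1)}
    return [order[v] for v in values]


def streaks_with_duplicates(order_dates):
    """Group dates into streaks even when a date appears more than once."""
    days = sorted(order_dates)  # duplicates are kept on purpose
    groups = {}
    for d, rk in zip(days, dense_rank(days)):
        # Duplicates share a rank, so date minus rank gives the same key.
        groups.setdefault(d - timedelta(days=rk), set()).add(d)
    # The size of each set plays the role of count(distinct orderdate).
    return [(min(g), max(g), len(g)) for g in groups.values()]


# Hypothetical sample: two orders on Jan 1, then Jan 2, Jan 3, and Jan 7.
sample = [date(2020, 1, 1), date(2020, 1, 1), date(2020, 1, 2),
          date(2020, 1, 3), date(2020, 1, 7)]
print(streaks_with_duplicates(sample))
```

With row_number() instead, the two Jan 1 rows would get numbers 1 and 2 and end up with different keys, splitting what is really a single 3-day streak.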
I would take the following approach and start with a query like this:
WITH truncatedQueries as (
    -- one row per customer and calendar day
    select distinct idcustomer, TRUNC(orderDate) as orderDate from orders
),
rawData as (
    select a.idcustomer, a.orderDate as order1, b.orderDate as order2,
        b.orderDate - a.orderDate as difference
    from truncatedQueries a
    inner join truncatedQueries b
        on a.idcustomer = b.idcustomer
        and a.orderDate < b.orderDate
        and b.orderDate - a.orderDate = 1
),
intervals as (
    select idCustomer as CustomerId, min(order1) as StartDate, max(order2) as EndDate
    from rawData
    group by idCustomer
)
-- + 1 so that both the start day and the end day are counted
select CustomerId, StartDate, EndDate, EndDate - StartDate + 1 as numDays
from intervals
where EndDate - StartDate + 1 >= 5
order by CustomerId;
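The algorithm is as follows:
- I truncate the time from the date and then run a distinct query. This gives me one entry per customer per date, no matter how many orders were placed that day.
- Then I self-join these results and look for consecutive days. I find them by keeping only the rows where two orders (of the same customer) are exactly 1 day apart.
- Then I get a result like this: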
ID ORDERDATE ORDERDATE DIFFERENCE
1 22-APR-11 23-APR-11 1
1 23-APR-11 24-APR-11 1
2 22-APR-11 23-APR-11 1
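Now we have the base data, but we are still missing three things:
A) the start of the interval
B) the end of the interval
C) the difference between them (number of days)
- For this I use the min and max functions. Taking min of the first orderDate column and max of the second orderDate column gives us the result. The last step is to do the subtraction and, if necessary, check for >= 5.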
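The steps above can be sketched in Python as follows. Note one deliberate difference: a single min/max per customer returns one interval per customer, so two separate streaks would be merged into one. The sketch instead starts a new run whenever the chain of pairs breaks. That is an assumption about the intended behavior, not something the query above does, and the function names are made up for illustration.

```python
from datetime import date, timedelta


def consecutive_pairs(days):
    """Self-join step: keep (a, b) where b is exactly one day after a."""
    unique = sorted(set(days))  # distinct days, time part already dropped
    present = set(unique)
    one_day = timedelta(days=1)
    return [(a, a + one_day) for a in unique if a + one_day in present]


def intervals_from_pairs(pairs):
    """Chain pairs into runs, then report min(first) and max(second) per run."""
    runs = []
    for a, b in pairs:  # pairs arrive sorted by their first date
        if runs and runs[-1][1] == a:
            # This pair continues the current run: move its end forward.
            runs[-1] = (runs[-1][0], b)
        else:
            runs.append((a, b))
    # + 1 counts both endpoints, so Jan 1 to Jan 5 is reported as 5 days.
    return [(start, end, (end - start).days + 1) for start, end in runs]


# Hypothetical sample: Jan 1 to Jan 5, then Jan 10 and Jan 11.
days = [date(2020, 1, d) for d in (1, 2, 3, 4, 5, 10, 11)]
pairs = consecutive_pairs(days)
print(intervals_from_pairs(pairs))
```

A day with no neighbor on either side produces no pair at all, so single-day "streaks" never appear in the result, which is fine when only runs of 5 or more days matter.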
Thank you all!
Here is the code that does what I need:
;
with t1 as (
select distinct o.idcustomer,orderdate
from orders o
join customers c on c.idcustomer=o.idcustomer
where orderstatus in (3,4) and c.customertype=0
)
select idCustomer, min(orderdate) startdate, max(orderdate) enddate, count(*) cnt
from (
select idcustomer, orderdate,
row_number() over(partition by idcustomer order by orderdate) rn
from t1
) t
group by idcustomer, dateadd(day, -rn, orderdate)
having count(*) >= 5
order by idcustomer, cnt
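The CTE gives me unique idCustomer and orderdate values (eliminating multiple orders on the same day), and from there I used the first example from @GMB (https://whosebug.com/users/10676716/gmb) to build the output.
The answer that uses row_number to compute the date groups is really cool.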