SQL: Find the maximum number of consecutive months over the last 12 months

I'm trying to write a SQL query that finds the maximum number of consecutive months with a payment within the last 12 months, not counting June and July.

For example, I have an initial table like this:

+---------+--------------+-----------+------------+
|      id | Payment      |  amount   |    Date    |
+---------+--------------+-----------+------------+
|       1 | CJ1          |     70000 | 11/3/2020  |
|       1 | 1B4          |  36314000 | 12/1/2020  |
|       1 | I21          | 119439000 | 1/12/2021  |
|       1 | 0QO          |   9362100 | 2/2/2021   |
|       1 | 1G0          | 140431000 | 2/23/2021  |
|       1 | 1G           |   9362100 | 3/2/2021   |
|       1 | g5d          |   9362100 | 4/6/2021   |
|       1 | rt5s         |  13182500 | 4/13/2021  |
|       1 | fgs5         |     48598 | 5/18/2021  |
|       1 | sd8          |     42155 | 5/25/2021  |
|       1 | wqe8         |  47822355 | 7/20/2021  |
|       1 | cbg8         |   4589721 | 7/27/2021  |
|       1 | jlk8         |   4589721 | 8/3/2021   |
|       1 | cxn9         |   4589721 | 10/5/2021  |
|       1 | qwe          |  45897210 | 11/9/2021  |
|       1 | mmm          |  45897210 | 12/16/2021 |
+---------+--------------+-----------+------------+

I wrote the following query:

SELECT customer_number, year, month,
month - lag(month) OVER(partition by customer_number ORDER BY year, month) as previous_month_indicator
FROM 
(
    SELECT DISTINCT Month(date) as month, Year(date) as year, CUSTOMER_NUMBER  
    FROM Table1
    WHERE Month(date) not in (6,7)
    and TO_DATE(date,'yyyy-MM-dd') >= DATE_SUB('2021-12-31', 425)
    and customer_number = 1
) As C

This is the output I get:

+-----------------+------+-------+--------------------------+
| customer_number | year | month | previous_month_indicator |
+-----------------+------+-------+--------------------------+
|               1 | 2020 |    11 | null                     |
|               1 | 2020 |    12 | 1                        |
|               1 | 2021 |     1 | -11                      |
|               1 | 2021 |     2 | 1                        |
|               1 | 2021 |     3 | 1                        |
|               1 | 2021 |     4 | 1                        |
|               1 | 2021 |     5 | 1                        |
|               1 | 2021 |     8 | 3                        |
|               1 | 2021 |    10 | 2                        |
|               1 | 2021 |    11 | 1                        |
+-----------------+------+-------+--------------------------+

What I want is a view like this (expected output):


+-----------------+------+-------+--------------------------+
| customer_number | year | month | previous_month_indicator |
+-----------------+------+-------+--------------------------+
|               1 | 2020 |    11 |                        1 |
|               1 | 2020 |    12 |                        1 |
|               1 | 2021 |     1 |                        1 |
|               1 | 2021 |     2 |                        1 |
|               1 | 2021 |     3 |                        1 |
|               1 | 2021 |     4 |                        1 |
|               1 | 2021 |     5 |                        1 |
|               1 | 2021 |     8 |                        1 |
|               1 | 2021 |     9 |                        0 |
|               1 | 2021 |    10 |                        1 |
|               1 | 2021 |    11 |                        1 |
+-----------------+------+-------+--------------------------+

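Since June and July don't matter, August should count as the month directly after May. September has no record, so it shows 0 and breaks the chain of consecutive months.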

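The final output I want is the maximum number of consecutive months in which a transaction was made. In the case above that is 8, from November 2020 through August 2021.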

Final expected output:

+-----------------+-------------------------+
| customer_number | Max_consecutive_months  |
+-----------------+-------------------------+
|               1 |                       8 |
+-----------------+-------------------------+

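CTEs make this easier to break down. In the code below, the payment_streak CTE is the key piece: the start_of_streak field first flags the rows that count as the start of a streak, then takes the maximum of those flagged dates over all rows up to and including the current one (to find the start of the current streak).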

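The final SELECT just compares those two dates, counts how many months lie between them (not counting June and July, so each year contributes 10 months), and then finds the best streak for each customer.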

WITH payments_in_context AS (
  SELECT customer_number,
    date,
    lag(date) OVER (PARTITION BY customer_number ORDER BY date) AS prev_date
  FROM Table1
  WHERE EXTRACT(month FROM date) NOT IN (6,7)
),
payment_streak AS (
  SELECT 
    customer_number,
    date,
    max(
    CASE WHEN (prev_date IS NULL)
           OR (EXTRACT(month FROM date) <> 8
                 AND (date - prev_date >= 62 
                    OR MOD(12 + EXTRACT(month FROM date) - EXTRACT(month FROM prev_date), 12) > 1))
           OR (EXTRACT(month FROM date) = 8
                 AND (date - prev_date >= 123
                    OR EXTRACT(month FROM prev_date) NOT IN (5,8)))
         THEN date END
    ) OVER (PARTITION BY customer_number ORDER BY date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
    as start_of_streak
    FROM payments_in_context
)
SELECT customer_number,
  max( 1 +
    10*(EXTRACT(year FROM date) - EXTRACT(year FROM start_of_streak))
      + (EXTRACT(month FROM date) - EXTRACT(month FROM start_of_streak))
      + CASE WHEN (EXTRACT(month FROM date) > 7 AND EXTRACT(month FROM start_of_streak) < 6)
             THEN -2
             WHEN (EXTRACT(month FROM date) < 6 AND EXTRACT(month FROM start_of_streak) > 7)
             THEN 2
             ELSE 0 END
     ) AS max_consecutive_months
FROM payment_streak
GROUP BY 1;
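
The month arithmetic in the final SELECT is easy to get wrong, so it may help to check it outside the database. The following is a minimal Python sketch of the same expression, not part of the query itself; `streak_length` is a hypothetical helper name, and it assumes both endpoints are (year, month) pairs that never fall in June or July, as the WHERE clause guarantees.

```python
def streak_length(start, end):
    """Count months from start to end inclusive, skipping June and July.

    start and end are (year, month) tuples outside June/July.
    Mirrors the expression inside max(...) in the final SELECT.
    """
    (start_year, start_month), (end_year, end_month) = start, end
    if end_month > 7 and start_month < 6:
        # The raw month difference spans June and July, so drop them.
        adjust = -2
    elif end_month < 6 and start_month > 7:
        # The 10-per-year term removed a June/July pair that this span
        # never crosses, so add it back.
        adjust = 2
    else:
        adjust = 0
    # Each full year contributes 10 countable months (12 minus June and July).
    return 1 + 10 * (end_year - start_year) + (end_month - start_month) + adjust


print(streak_length((2020, 11), (2021, 8)))  # 8
print(streak_length((2021, 1), (2021, 8)))   # 6
```

The first call corresponds to the streak in the question (November 2020 through August 2021); the second counts January through May plus August.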

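You can use a recursive CTE to generate every month in a twelve-month span for each customer id, and then find the maximum number of consecutive months in that interval, excluding June and July (the query below assumes a payments table with cust_id and date columns):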

with recursive cte(id, m, c) as (
   select cust_id, min(date), 1 from payments group by cust_id
   union all
   select c.id, c.m + interval 1 month, c.c+1 from cte c where c.c <= 12
),
dts(id, m, f) as (
   select c.id, c.m, c.c = 1 or exists 
       (select 1 from payments p where p.cust_id = c.id and extract(month from p.date) = extract(month from (c.m - interval 1 month)) and extract(year from p.date) = extract(year from (c.m - interval 1 month))) 
   from cte c where extract(month from c.m) not in (6,7)
),
result(id, f, c) as (
  select d.id, d.f, (select sum(d.id = d1.id and d1.m < d.m and d1.f = 0)+1 from dts d1) 
  from dts d where d.f != 0
)
select r1.id, max(r1.s)-1 from (select r.id, r.c, sum(r.f) s from result r group by r.id, r.c) r1 group by r1.id
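
As a cross-check on the expected result, the same idea can be sketched in plain Python: map each month to a running index that skips June and July, then look for the longest run of adjacent indices. This is only an illustration under stated assumptions, not a replacement for the SQL. `month_index` and `max_consecutive_months` are hypothetical names, the dates are copied from the question's sample table, and the 12-month window is assumed to have been applied before the dates are passed in.

```python
from datetime import date


def month_index(year, month):
    """Map a month to a running index that skips June and July.

    May and August get adjacent indices, as do December and January.
    """
    # Position within a 10-month year: Jan..May -> 0..4, Aug..Dec -> 5..9.
    pos = month - 1 if month < 6 else month - 3
    return 10 * year + pos


def max_consecutive_months(dates):
    """Longest run of consecutive paid months, ignoring June and July."""
    indices = sorted({month_index(d.year, d.month)
                      for d in dates if d.month not in (6, 7)})
    best = run = 0
    prev = None
    for idx in indices:
        # Extend the run only when this month directly follows the previous one.
        run = run + 1 if prev is not None and idx == prev + 1 else 1
        best = max(best, run)
        prev = idx
    return best


# Payment dates for customer 1, copied from the question's sample table.
payments = [
    date(2020, 11, 3), date(2020, 12, 1), date(2021, 1, 12), date(2021, 2, 2),
    date(2021, 2, 23), date(2021, 3, 2), date(2021, 4, 6), date(2021, 4, 13),
    date(2021, 5, 18), date(2021, 5, 25), date(2021, 7, 20), date(2021, 7, 27),
    date(2021, 8, 3), date(2021, 10, 5), date(2021, 11, 9), date(2021, 12, 16),
]

print(max_consecutive_months(payments))  # 8
```

The July payments are dropped, August follows May directly, and the missing September ends the run, which gives the 8 months from November 2020 through August 2021.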