Teradata SQL 如何将 "by date" 传输到 "date range"?

Teradata SQL how to transfer "by date" to by "date range"?

我有 6 亿行如下 table 1。在 Teradata SQL 中如何将 "by date" 传输到 "date range"?

+-----------+-------+------------+----------+
| ProductID | Store | Trans_Date | Cost_Amt |
+-----------+-------+------------+----------+
|     20202 |  2320 | 2018-01-02 |  .23   |
|     20202 |  2320 | 2018-01-03 |  .23   |
|     20202 |  2320 | 2018-01-04 |  .23   |
|     20202 |  2320 | 2018-01-05 |  .38   |
|     20202 |  2320 | 2018-01-06 |  .38   |
|     20202 |  2320 | 2018-01-07 |  .38   |
|     20202 |  2320 | 2018-01-08 |  .23   |
|     20202 |  2320 | 2018-01-09 |  .23   |
|     20202 |  2320 | 2018-01-10 |  .23   |
+-----------+-------+------------+----------+

期望的输出:

+-----------+-------+------------+------------+----------+
| ProductID | Store | Start Date |  End Date  | Cost_Amt |
+-----------+-------+------------+------------+----------+
|     20202 |  2320 | 2018-01-02 | 2018-01-04 |  .23   |
|     20202 |  2320 | 2018-01-05 | 2018-01-07 |  .38   |
|     20202 |  2320 | 2018-01-08 | 2018-01-10 |  .23   |
+-----------+-------+------------+------------+----------+

我将示例扩展为:

CREATE TABLE bigtable(
   ProductID  INTEGER 
  ,Store      INTEGER 
  ,Trans_Date DATE 
  ,Cost_Amt   VARCHAR(10)
);
INSERT INTO bigtable(ProductID,Store,Trans_Date,Cost_Amt) VALUES (20202,2320,'2018-01-02','.23');
INSERT INTO bigtable(ProductID,Store,Trans_Date,Cost_Amt) VALUES (20202,2320,'2018-01-03','.23');
INSERT INTO bigtable(ProductID,Store,Trans_Date,Cost_Amt) VALUES (20202,2320,'2018-01-04','.23');
INSERT INTO bigtable(ProductID,Store,Trans_Date,Cost_Amt) VALUES (20202,2320,'2018-01-05','.38');
INSERT INTO bigtable(ProductID,Store,Trans_Date,Cost_Amt) VALUES (20202,2320,'2018-01-06','.38');
INSERT INTO bigtable(ProductID,Store,Trans_Date,Cost_Amt) VALUES (20202,2320,'2018-01-07','.38');
INSERT INTO bigtable(ProductID,Store,Trans_Date,Cost_Amt) VALUES (20202,2320,'2018-01-08','.23');
INSERT INTO bigtable(ProductID,Store,Trans_Date,Cost_Amt) VALUES (20202,2320,'2018-01-09','.23');
INSERT INTO bigtable(ProductID,Store,Trans_Date,Cost_Amt) VALUES (20202,2320,'2018-01-10','.23');
INSERT INTO bigtable(ProductID,Store,Trans_Date,Cost_Amt) VALUES (20202,2320,'2018-01-11','.38');

并且此查询用于显示派生的 table:

select
    *
    , row_number() over(partition by ProductID,Store           order by Trans_Date) rn1
    , row_number() over(partition by ProductID,Store,Cost_Amt  order by Trans_Date) rn2
    , row_number() over(partition by ProductID,Store           order by Trans_Date)
    - row_number() over(partition by ProductID,Store,Cost_Amt  order by Trans_Date) grp
from bigtable
order by ProductID,Store,Trans_Date
;

计算我们稍后需要的 "grp" 值:

|    | ProductID | Store |     Trans_Date      | Cost_Amt | rn1 | rn2 | grp |
|----|-----------|-------|---------------------|----------|-----|-----|-----|
|  1 |     20202 |  2320 | 02.01.2018 00:00:00 | .23    |   1 |   1 |   0 |
|  2 |     20202 |  2320 | 03.01.2018 00:00:00 | .23    |   2 |   2 |   0 |
|  3 |     20202 |  2320 | 04.01.2018 00:00:00 | .23    |   3 |   3 |   0 |
|  4 |     20202 |  2320 | 05.01.2018 00:00:00 | .38    |   4 |   1 |   3 |
|  5 |     20202 |  2320 | 06.01.2018 00:00:00 | .38    |   5 |   2 |   3 |
|  6 |     20202 |  2320 | 07.01.2018 00:00:00 | .38    |   6 |   3 |   3 |
|  7 |     20202 |  2320 | 08.01.2018 00:00:00 | .23    |   7 |   4 |   3 |
|  8 |     20202 |  2320 | 09.01.2018 00:00:00 | .23    |   8 |   5 |   3 |
|  9 |     20202 |  2320 | 10.01.2018 00:00:00 | .23    |   9 |   6 |   3 |
| 10 |     20202 |  2320 | 11.01.2018 00:00:00 | .38    |  10 |   4 |   6 |

现在计算日期范围:

select
      ProductID
    , Store
    , Cost_Amt
    , grp
    , min(Trans_Date) start_date
    , max(Trans_Date) end_date
from (
    select
        *
        , row_number() over(partition by ProductID,Store           order by Trans_Date)
        - row_number() over(partition by ProductID,Store,Cost_Amt  order by Trans_Date) grp
    from bigtable
    ) d
group by
      ProductID
    , Store
    , Cost_Amt
    , grp
;

结果是:

|    | ProductID | Store | Cost_Amt | grp |  (No column name)   |  (No column name)   |
|----|-----------|-------|----------|-----|---------------------|---------------------|
|  1 |     20202 |  2320 | .23    |   0 | 02.01.2018 00:00:00 | 04.01.2018 00:00:00 |
|  2 |     20202 |  2320 | .23    |   3 | 08.01.2018 00:00:00 | 10.01.2018 00:00:00 |
|  3 |     20202 |  2320 | .38    |   3 | 05.01.2018 00:00:00 | 07.01.2018 00:00:00 |
|  4 |     20202 |  2320 | .38    |   6 | 11.01.2018 00:00:00 | 11.01.2018 00:00:00 |

另见:http://rextester.com/PJRU91378

这不是很优雅,但似乎是正确的:

select A.ProductID , A.Store , A.Trans_Date as Start_Date ,  
(select max(Trans_Date) from Table1 as C
           where A.ProductID = C.ProductID
           and   A.Store = C.Store
           and   A.Cost_Amt = C.Cost_Amt
           and   C.Trans_Date < (
             select coalesce(min(Trans_Date),'9999-01-01') 
             from Table1 as D
             where A.ProductID = D.ProductID
             and   A.Store = D.Store
             and   A.Cost_Amt <> D.Cost_Amt
             and   A.Trans_Date < D.Trans_Date
             )
           ) as End_Date  , A.Cost_Amt
from Table1 as A 
where A.Trans_Date = (select min(Trans_Date) 
             from Table1 as B
             where A.ProductID = B.ProductID
             and   A.Store = B.Store
             and   A.Cost_Amt = B.Cost_Amt
             and   B.Trans_Date > (
             select coalesce(max(Trans_Date),'0001-01-01') 
             from Table1 as E
             where A.ProductID = E.ProductID
             and   A.Store = E.Store
             and   A.Cost_Amt <> E.Cost_Amt
             and   A.Trans_Date > E.Trans_Date
             )
             )

实例在这里:http://sqlfiddle.com/#!15/7f8c4/12

如果每个 ProductID,Store,Trans_Date 有一行(没有遗漏日期),您可以应用专有的 Teradata 语法来标准化重叠日期范围:

SELECT 
   ProductID
  ,Store
  -- split the period back into seperate dates
  ,Begin(pd) AS StartDate
  ,Last(pd) AS EndDate -- must be LAST not END to match your logic
  ,Cost_Amt
FROM
 (
   SELECT NORMALIZE 
      ProductID
     ,Store
     -- NORMALIZE only works with periods, so create it on the fly
     ,PERIOD(Trans_Date,Trans_Date+1) AS pd
     ,Cost_Amt
   FROM bigtable
 ) AS dt

create table 还有一个 normalize 选项。