How to generate a point-in-time snapshot table from a transaction fact table?

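I have a transaction table that records changes in a customer's status (A, B, C, D). Each change closes the previous record by setting its end date to the current system time, and opens a new record whose end date is set to a high open date.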

FactID  Cust_ID  Status  EffectiveDate          EndDate
1       1        A       20/05/2021 8:52:29 PM  21/05/2021 3:08:22 PM
2       1        B       21/05/2021 3:08:22 PM  24/05/2021 2:47:28 PM
3       1        C       24/05/2021 2:47:28 PM  24/05/2021 4:15:45 PM
4       1        A       24/05/2021 4:15:45 PM  24/05/2021 8:05:09 PM
5       1        D       24/05/2021 8:05:09 PM  31/12/9000
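
For readers unfamiliar with this close-and-open pattern, a single status change might be recorded roughly as follows. This is only a sketch: it assumes a physical table named transactions with the columns above, and the FactID value 6, the target status 'B', and the names v_now and c_high_date are made up for illustration.

```sql
-- Hypothetical sketch: assumes a real table named transactions with the columns
-- shown above; the new FactID (6) is hard-coded purely for illustration.
DECLARE
    v_now        DATE := SYSDATE;                     -- one timestamp shared by both statements
    c_high_date  CONSTANT DATE := DATE '9000-12-31';  -- the "high open date"
BEGIN
    -- Close the currently open record for customer 1.
    UPDATE transactions
       SET enddate = v_now
     WHERE cust_id = 1
       AND enddate = c_high_date;

    -- Open the new record, here moving the customer to status 'B'.
    INSERT INTO transactions (factid, cust_id, status, effectivedate, enddate)
    VALUES (6, 1, 'B', v_now, c_high_date);
END;
/
```

Capturing SYSDATE once keeps the closed record's EndDate identical to the new record's EffectiveDate, which is the shape the sample rows have.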

Based on the transaction table above, I am trying to build the following point-in-time snapshot (end-of-day report):
ReportDate              Cust_ID  EODStatus  A_SDate                A_EDate                B_SDate                B_EDate                C_SDate                C_EDate                D_SDate                D_EDate
20/05/2021 11:59:59 PM  1        A          20/05/2021 8:52:29 PM  31/12/9000             31/12/9000             31/12/9000             31/12/9000             31/12/9000             31/12/9000             31/12/9000
21/05/2021 11:59:59 PM  1        B          20/05/2021 8:52:29 PM  21/05/2021 3:08:22 PM  21/05/2021 3:08:22 PM  31/12/9000             31/12/9000             31/12/9000             31/12/9000             31/12/9000
22/05/2021 11:59:59 PM  1        B          20/05/2021 8:52:29 PM  21/05/2021 3:08:22 PM  21/05/2021 3:08:22 PM  31/12/9000             31/12/9000             31/12/9000             31/12/9000             31/12/9000
23/05/2021 11:59:59 PM  1        B          20/05/2021 8:52:29 PM  21/05/2021 3:08:22 PM  21/05/2021 3:08:22 PM  31/12/9000             31/12/9000             31/12/9000             31/12/9000             31/12/9000
24/05/2021 11:59:59 PM  1        D          20/05/2021 8:52:29 PM  24/05/2021 8:05:09 PM  21/05/2021 3:08:22 PM  24/05/2021 2:47:28 PM  24/05/2021 2:47:28 PM  24/05/2021 4:15:45 PM  24/05/2021 8:05:09 PM  31/12/9000
25/05/2021 11:59:59 PM  1        D          20/05/2021 8:52:29 PM  24/05/2021 8:05:09 PM  21/05/2021 3:08:22 PM  24/05/2021 2:47:28 PM  24/05/2021 2:47:28 PM  24/05/2021 4:15:45 PM  24/05/2021 8:05:09 PM  31/12/9000

I am currently stuck on expanding the transaction table before building the snapshot. Any pointers would be greatly appreciated.

WITH
    date_ranges
    AS
        (SELECT ROWNUM, TO_DATE ('21-05-2021', 'dd-mm-yyyy') + ROWNUM - 1.00001 reportdate
           FROM all_objects
          WHERE ROWNUM <= 6),
    transactions (factid, cust_id, status, effectivedate, enddate)
    AS
        (SELECT 1, 1, 'A', TO_DATE ('20/05/2021 8:52:29 PM', 'DD/MM/YYYY HH12:MI:SS AM'), TO_DATE ('21/05/2021 3:08:22 PM', 'DD/MM/YYYY HH12:MI:SS AM') FROM DUAL
         UNION ALL
         SELECT 2, 1, 'B', TO_DATE ('21/05/2021 3:08:22 PM', 'DD/MM/YYYY HH12:MI:SS AM'), TO_DATE ('24/05/2021 2:47:28 PM', 'DD/MM/YYYY HH12:MI:SS AM') FROM DUAL
         UNION ALL
         SELECT 3, 1, 'C', TO_DATE ('24/05/2021 2:47:28 PM', 'DD/MM/YYYY HH12:MI:SS AM'), TO_DATE ('24/05/2021 4:15:45 PM', 'DD/MM/YYYY HH12:MI:SS AM') FROM DUAL
         UNION ALL
         SELECT 4, 1, 'A', TO_DATE ('24/05/2021 4:15:45 PM', 'DD/MM/YYYY HH12:MI:SS AM'), TO_DATE ('24/05/2021 8:05:09 PM', 'DD/MM/YYYY HH12:MI:SS AM') FROM DUAL
         UNION ALL
         SELECT 5, 1, 'D', TO_DATE ('24/05/2021 8:05:09 PM', 'DD/MM/YYYY HH12:MI:SS AM'), TO_DATE ('31/12/9000', 'DD/MM/YYYY') FROM DUAL),
    dataset
    AS
        (SELECT DISTINCT reportdate,
                         cust_id,
                         status     AS eodstatus,
                         effectivedate,
                         enddate
           FROM transactions CROSS JOIN date_ranges)
  SELECT reportdate,
         cust_id,
         eodstatus,
         effectivedate,
         enddate,
         CASE
             WHEN eodstatus = 'A' THEN MIN (effectivedate)
             ELSE TO_DATE ('31/12/9000', 'DD/MM/YYYY')
         END             AS a_sdate,
         CASE WHEN eodstatus = 'A' THEN MAX (enddate) ELSE TO_DATE ('31/12/9000', 'DD/MM/YYYY') 
         END             AS a_edate,
         CASE
             WHEN eodstatus = 'B' THEN MIN (effectivedate)
             ELSE TO_DATE ('31/12/9000', 'DD/MM/YYYY')
         END             AS b_sdate,
         CASE WHEN eodstatus = 'B' THEN MAX (enddate) ELSE TO_DATE ('31/12/9000', 'DD/MM/YYYY') 
         END             AS b_edate,
         CASE
             WHEN eodstatus = 'C' THEN MIN (effectivedate)
             ELSE TO_DATE ('31/12/9000', 'DD/MM/YYYY')
         END             AS c_sdate,
         CASE WHEN eodstatus = 'C' THEN MAX (enddate) ELSE TO_DATE ('31/12/9000', 'DD/MM/YYYY') 
         END             AS c_edate,
         CASE
             WHEN eodstatus = 'D' THEN MIN (effectivedate)
             ELSE TO_DATE ('31/12/9000', 'DD/MM/YYYY')
         END             AS d_sdate,
         CASE WHEN eodstatus = 'D' THEN MAX (enddate) ELSE TO_DATE ('31/12/9000', 'DD/MM/YYYY') 
          END             AS d_edate
    FROM dataset t
   WHERE reportdate BETWEEN effectivedate AND enddate
GROUP BY reportdate, cust_id, eodstatus, effectivedate, enddate
ORDER BY reportdate, cust_id, eodstatus;

REPORTDATE              CUST_ID  EODSTATUS  EFFECTIVEDATE          ENDDATE                A_SDATE                A_EDATE                B_SDATE                B_EDATE                C_SDATE                C_EDATE                D_SDATE                D_EDATE
20/05/2021 11:59:59 PM  1        "A"        20/05/2021 8:52:29 PM  21/05/2021 3:08:22 PM  20/05/2021 8:52:29 PM  21/05/2021 3:08:22 PM  31/12/9000             31/12/9000             31/12/9000             31/12/9000             31/12/9000             31/12/9000
21/05/2021 11:59:59 PM  1        "B"        21/05/2021 3:08:22 PM  24/05/2021 2:47:28 PM  31/12/9000             31/12/9000             21/05/2021 3:08:22 PM  24/05/2021 2:47:28 PM  31/12/9000             31/12/9000             31/12/9000             31/12/9000
22/05/2021 11:59:59 PM  1        "B"        21/05/2021 3:08:22 PM  24/05/2021 2:47:28 PM  31/12/9000             31/12/9000             21/05/2021 3:08:22 PM  24/05/2021 2:47:28 PM  31/12/9000             31/12/9000             31/12/9000             31/12/9000
23/05/2021 11:59:59 PM  1        "B"        21/05/2021 3:08:22 PM  24/05/2021 2:47:28 PM  31/12/9000             31/12/9000             21/05/2021 3:08:22 PM  24/05/2021 2:47:28 PM  31/12/9000             31/12/9000             31/12/9000             31/12/9000
24/05/2021 11:59:59 PM  1        "D"        24/05/2021 8:05:09 PM  31/12/9000             31/12/9000             31/12/9000             31/12/9000             31/12/9000             31/12/9000             31/12/9000             24/05/2021 8:05:09 PM  31/12/9000
25/05/2021 11:59:59 PM  1        "D"        24/05/2021 8:05:09 PM  31/12/9000             31/12/9000             31/12/9000             31/12/9000             31/12/9000             31/12/9000             31/12/9000             24/05/2021 8:05:09 PM  31/12/9000

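SQL Fiddle here

PS: I looked at another thread on SO with an almost identical title, but it did not help much.

Update 1:

I am now able to get the daily status for every report date, but the start and end dates are still not being calculated and carried forward to subsequent rows (I have not figured that part out yet).

Update 2: The calculated start and end dates must not be later than the report date. See the SQL output above, which demonstrates the current problem.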

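It has been a while since I used Oracle, but you are looking for two components:

  1. The current snapshot
  2. A static historical snapshot

The query below generates a snapshot for a given hard-coded date. I don't have Oracle at hand to check how variables work, so you will have to do the date-variable part yourself.

Notes:

  • I assume a Cust_ID can only have one status at a time
  • Real-world data is more complex than this, and there are always edge cases
  • If a customer has no current status, there will be no row
  • I just noticed that you have overlapping dates. This is a problem because the customer is in two states at the same instant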
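
To illustrate the overlap mentioned in the last note: each record's EndDate equals the next record's EffectiveDate, so a BETWEEN test (which includes both endpoints) matches two rows at that exact instant. The sketch below inlines the first two sample rows from the question; the CTE name probe and the column aliases are made up for the example, and switching to a half-open interval is one possible remedy, not necessarily what the original poster intended.

```sql
-- Sketch: probe the exact instant at which status A ends and status B begins.
WITH t (FactID, Status, EffectiveDate, EndDate) AS (
    SELECT 1, 'A', TO_DATE('20/05/2021 20:52:29', 'DD/MM/YYYY HH24:MI:SS'),
                   TO_DATE('21/05/2021 15:08:22', 'DD/MM/YYYY HH24:MI:SS') FROM dual
    UNION ALL
    SELECT 2, 'B', TO_DATE('21/05/2021 15:08:22', 'DD/MM/YYYY HH24:MI:SS'),
                   TO_DATE('24/05/2021 14:47:28', 'DD/MM/YYYY HH24:MI:SS') FROM dual
),
probe AS (
    SELECT TO_DATE('21/05/2021 15:08:22', 'DD/MM/YYYY HH24:MI:SS') AS ts FROM dual
)
SELECT -- closed interval [EffectiveDate, EndDate]: both rows match (2)
       COUNT(CASE WHEN p.ts BETWEEN t.EffectiveDate AND t.EndDate THEN 1 END) AS closed_matches,
       -- half-open interval [EffectiveDate, EndDate): only row B matches (1)
       COUNT(CASE WHEN p.ts >= t.EffectiveDate AND p.ts < t.EndDate THEN 1 END) AS half_open_matches
  FROM t CROSS JOIN probe p;
```

With end-of-day report times such as 11:59:59 PM the boundary is unlikely to be hit in the sample data, but a half-open comparison removes the ambiguity altogether.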

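You could join to a calendar table to run this for all dates, but that can be very expensive, and you usually only want to generate one day at a time and append it to an existing table.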
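
If a calendar of report dates is needed and no calendar table exists, one common Oracle idiom is a CONNECT BY LEVEL row generator. The sketch below is an assumption about how such a calendar could be built, not part of the original fiddle; the start date and the row count of 6 simply mirror the question's sample range.

```sql
-- Six end-of-day report timestamps, 20/05/2021 through 25/05/2021, each at 11:59:59 PM.
-- (start date + LEVEL) is midnight of the following day; subtracting one second
-- steps back to the last second of the report day.
SELECT DATE '2021-05-20' + LEVEL - INTERVAL '1' SECOND AS reportdate
  FROM dual
CONNECT BY LEVEL <= 6;
```

This avoids depending on the row count of all_objects, which the question's date_ranges CTE relies on.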

Below is the code copied from the fiddle.

Setup code

CREATE TABLE t
    (FactID int, Cust_ID int, Status varchar2(1), EffectiveDate DATE, EndDate DATE)
;

-- Times are written in 24-hour form so that they match the PM values in the question
INSERT ALL 
    INTO t (FactID, Cust_ID, Status, EffectiveDate, EndDate)
         VALUES (1, 1, 'A', TIMESTAMP'2021-05-20 20:52:29.000', TIMESTAMP'2021-05-21 15:08:22.000')
    INTO t (FactID, Cust_ID, Status, EffectiveDate, EndDate)
         VALUES (2, 1, 'B', TIMESTAMP'2021-05-21 15:08:22.000', TIMESTAMP'2021-05-24 14:47:28.000')
    INTO t (FactID, Cust_ID, Status, EffectiveDate, EndDate)
         VALUES (3, 1, 'C', TIMESTAMP'2021-05-24 14:47:28.000', TIMESTAMP'2021-05-24 16:15:45.000')
    INTO t (FactID, Cust_ID, Status, EffectiveDate, EndDate)
         VALUES (4, 1, 'A', TIMESTAMP'2021-05-24 16:15:45.000', TIMESTAMP'2021-05-24 20:05:09.000')
    INTO t (FactID, Cust_ID, Status, EffectiveDate, EndDate)
         VALUES (5, 1, 'D', TIMESTAMP'2021-05-24 20:05:09.000', TIMESTAMP'9000-12-31 00:00:00.000')
SELECT * FROM dual
;

Query

SELECT
T.Cust_ID, DATE '2021-05-25' ReportDate, T.Status, T.EffectiveDate,T.EndDate,
H.A_SDATE, H.A_EDATE, H.B_SDATE, H.B_EDATE, H.C_SDATE, H.C_EDATE
FROM
(
    -- Today's snapshot
    SELECT Cust_ID,Status, EffectiveDate,EndDate
    FROM t 
    WHERE DATE '2021-05-25' BETWEEN EffectiveDate AND EndDate 
) T
LEFT OUTER JOIN
(
-- Static capture of all states
    SELECT Cust_ID, 
    MIN(CASE WHEN Status = 'A' THEN EffectiveDate ELSE NULL END) A_SDATE, 
    MAX(CASE WHEN Status = 'A' THEN LEAST(DATE '2021-05-25',EndDate) ELSE NULL END) A_EDATE,
    MIN(CASE WHEN Status = 'B' THEN EffectiveDate ELSE NULL END) B_SDATE, 
    MAX(CASE WHEN Status = 'B' THEN LEAST(DATE '2021-05-25',EndDate) ELSE NULL END) B_EDATE,
    MIN(CASE WHEN Status = 'C' THEN EffectiveDate ELSE NULL END) C_SDATE, 
    MAX(CASE WHEN Status = 'C' THEN LEAST(DATE '2021-05-25',EndDate) ELSE NULL END) C_EDATE

    FROM t 
    -- Exclude state changes after the process date
    WHERE EffectiveDate < DATE '2021-05-25'
    GROUP BY Cust_ID
) H
ON T.Cust_ID = H.Cust_ID
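
Since the date-variable part is left open above, here is one possible way to avoid repeating the literal: keep the report date in a one-row CTE and join to it. This is a sketch against the table t from the setup code, not the answerer's own query. The names params, report_date, cur, hp and h are illustrative, the D columns are added to match the question's layout, and the half-open comparison for the current row is a deliberate deviation from BETWEEN.

```sql
-- Sketch only: "params", "report_date", "cur", "hp" and "h" are illustrative names.
WITH params AS (
    -- Single place to change the date; a bind variable such as :report_date
    -- could replace the literal.
    SELECT DATE '2021-05-25' AS report_date FROM dual
)
SELECT cur.Cust_ID, p.report_date AS ReportDate, cur.Status,
       h.A_SDATE, h.A_EDATE, h.B_SDATE, h.B_EDATE,
       h.C_SDATE, h.C_EDATE, h.D_SDATE, h.D_EDATE
  FROM params p
  JOIN t cur
    ON cur.EffectiveDate <= p.report_date
   AND p.report_date < cur.EndDate      -- half-open: a boundary instant matches one row only
  LEFT JOIN (
        -- Per-status first start and last end, with ends capped at the report date
        SELECT t.Cust_ID,
               MIN(CASE WHEN t.Status = 'A' THEN t.EffectiveDate END) AS A_SDATE,
               MAX(CASE WHEN t.Status = 'A' THEN LEAST(hp.report_date, t.EndDate) END) AS A_EDATE,
               MIN(CASE WHEN t.Status = 'B' THEN t.EffectiveDate END) AS B_SDATE,
               MAX(CASE WHEN t.Status = 'B' THEN LEAST(hp.report_date, t.EndDate) END) AS B_EDATE,
               MIN(CASE WHEN t.Status = 'C' THEN t.EffectiveDate END) AS C_SDATE,
               MAX(CASE WHEN t.Status = 'C' THEN LEAST(hp.report_date, t.EndDate) END) AS C_EDATE,
               MIN(CASE WHEN t.Status = 'D' THEN t.EffectiveDate END) AS D_SDATE,
               MAX(CASE WHEN t.Status = 'D' THEN LEAST(hp.report_date, t.EndDate) END) AS D_EDATE
          FROM t
         CROSS JOIN params hp
         -- Exclude state changes after the report date
         WHERE t.EffectiveDate <= hp.report_date
         GROUP BY t.Cust_ID
       ) h
    ON h.Cust_ID = cur.Cust_ID;
```

Note that LEAST caps an open end date at the report date, whereas the question's desired output shows the high date 31/12/9000 for statuses that have not yet ended; which convention is right depends on the report's consumers.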

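First of all, heartfelt thanks to everyone who tried to help me. I somehow managed to pull off this nearly impossible task with some convoluted logic (but it does work). I have tried to provide inline comments to explain the derivation. Special mention goes to @Wernfried Domscheit, who wrote the PIVOT logic and then deleted the answer; it helped me to a great extent.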

WITH
    date_ranges
-- Generate dates 
    AS
        (SELECT ROWNUM, TO_DATE ('21-05-2021', 'dd-mm-yyyy') + ROWNUM - 1.00001 reportdate
           FROM all_objects
          WHERE ROWNUM <= 6),
-- Mock up source records
    transactions (factid, cust_id,status,effectivedate,enddate)
    AS
        (SELECT 1,1,'A',
                TO_DATE ('20/05/2021 8:52:29 PM', 'DD/MM/YYYY HH12:MI:SS AM'),
                TO_DATE ('21/05/2021 3:08:22 PM', 'DD/MM/YYYY HH12:MI:SS AM') FROM DUAL
         UNION ALL
         SELECT 2,1,'B',
                TO_DATE ('21/05/2021 3:08:22 PM', 'DD/MM/YYYY HH12:MI:SS AM'),
                TO_DATE ('24/05/2021 2:47:28 PM', 'DD/MM/YYYY HH12:MI:SS AM') FROM DUAL
         UNION ALL
         SELECT 3,1,'C',
                TO_DATE ('24/05/2021 2:47:28 PM', 'DD/MM/YYYY HH12:MI:SS AM'),
                TO_DATE ('24/05/2021 4:15:45 PM', 'DD/MM/YYYY HH12:MI:SS AM') FROM DUAL
         UNION ALL
         SELECT 4,1,'A',
                TO_DATE ('24/05/2021 4:15:45 PM', 'DD/MM/YYYY HH12:MI:SS AM'),
                TO_DATE ('24/05/2021 8:05:09 PM', 'DD/MM/YYYY HH12:MI:SS AM') FROM DUAL
         UNION ALL
         SELECT 5,1,'D',
                TO_DATE ('24/05/2021 8:05:09 PM', 'DD/MM/YYYY HH12:MI:SS AM'),
                TO_DATE ('31/12/9000', 'DD/MM/YYYY') FROM DUAL),
    dataset
-- Apply cross join to get report date into transactions
-- Could've been much better; time crunched
    AS
        (SELECT DISTINCT reportdate,cust_id,status     AS eodstatus,effectivedate,enddate
           FROM transactions CROSS JOIN date_ranges),
    dataset1
-- Ignore start and end dates that fall after the reporting date (replace them with the high date)
    AS
        (  SELECT reportdate,
                  cust_id,
                  eodstatus,
                  CASE
                      WHEN reportdate > effectivedate THEN effectivedate
                      ELSE TO_DATE ('31/12/9000', 'DD/MM/YYYY')
                  END    AS effectivedate,
                  CASE
                      WHEN reportdate > enddate THEN enddate
                      ELSE TO_DATE ('31/12/9000', 'DD/MM/YYYY')
                  END    AS enddate
             FROM dataset
            WHERE reportdate > effectivedate),
    dataset2
-- Grab the min of start and max of end for all reporting days
    AS
        (  SELECT reportdate,
                  cust_id,
                  eodstatus,
                  eodstatus               AS status,
                  MIN (effectivedate)     effectivedate,
                  MAX (enddate)           enddate
             FROM dataset1
         GROUP BY reportdate, cust_id, eodstatus),
    dataset_new
-- Apply PIVOT to capture the start and end date per known status, replacing NULLs with the high open end date
    AS
        (  SELECT reportdate,
                  cust_id,
                  eodstatus,
                  COALESCE ('A','B','C','D')                           AS status,
                  NVL (a_sdate, TO_DATE ('31/12/9000', 'DD/MM/YYYY'))    a_sdate,
                  NVL (a_edate, TO_DATE ('31/12/9000', 'DD/MM/YYYY'))    a_edate,
                  NVL (b_sdate, TO_DATE ('31/12/9000', 'DD/MM/YYYY'))    b_sdate,
                  NVL (b_edate, TO_DATE ('31/12/9000', 'DD/MM/YYYY'))    b_edate,
                  NVL (c_sdate, TO_DATE ('31/12/9000', 'DD/MM/YYYY'))    c_sdate,
                  NVL (c_edate, TO_DATE ('31/12/9000', 'DD/MM/YYYY'))    c_edate,
                  NVL (d_sdate, TO_DATE ('31/12/9000', 'DD/MM/YYYY'))    d_sdate,
                  NVL (d_edate, TO_DATE ('31/12/9000', 'DD/MM/YYYY'))    d_edate
             FROM dataset2
                  PIVOT (MIN (effectivedate) AS "SDATE", MAX (enddate) AS "EDATE"
                        FOR status
                        IN ('A' AS "A", 'B' AS "B", 'C' AS "C", 'D' AS "D"))
         ORDER BY reportdate),
    date_manipulations
-- Merging multiple entries into one record a day
    AS
        (  SELECT reportdate,
                  cust_id,
                  MIN (a_sdate)     a_sdate,
                  MIN (a_edate)     a_edate,
                  MIN (b_sdate)     b_sdate,
                  MIN (b_edate)     b_edate,
                  MIN (c_sdate)     c_sdate,
                  MIN (c_edate)     c_edate,
                  MIN (d_sdate)     d_sdate,
                  MIN (d_edate)     d_edate
             FROM dataset_new
         GROUP BY reportdate, cust_id
         ORDER BY 1)
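-- JOIN with transactions to report the original status
-- (matching on cust_id as well, so that several customers do not cross-multiply)
SELECT a.*, b.status
  FROM date_manipulations a
       JOIN transactions b
           ON     a.cust_id = b.cust_id
              AND a.reportdate BETWEEN b.effectivedate AND b.enddate;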

REPORTDATE              CUST_ID  A_SDATE                A_EDATE                B_SDATE                B_EDATE                C_SDATE                C_EDATE                D_SDATE                D_EDATE                STATUS
20/05/2021 11:59:59 PM  1        20/05/2021 8:52:29 PM  31/12/9000             31/12/9000             31/12/9000             31/12/9000             31/12/9000             31/12/9000             31/12/9000             "A"
21/05/2021 11:59:59 PM  1        20/05/2021 8:52:29 PM  21/05/2021 3:08:22 PM  21/05/2021 3:08:22 PM  31/12/9000             31/12/9000             31/12/9000             31/12/9000             31/12/9000             "B"
22/05/2021 11:59:59 PM  1        20/05/2021 8:52:29 PM  21/05/2021 3:08:22 PM  21/05/2021 3:08:22 PM  31/12/9000             31/12/9000             31/12/9000             31/12/9000             31/12/9000             "B"
23/05/2021 11:59:59 PM  1        20/05/2021 8:52:29 PM  21/05/2021 3:08:22 PM  21/05/2021 3:08:22 PM  31/12/9000             31/12/9000             31/12/9000             31/12/9000             31/12/9000             "B"
24/05/2021 11:59:59 PM  1        20/05/2021 8:52:29 PM  24/05/2021 8:05:09 PM  21/05/2021 3:08:22 PM  24/05/2021 2:47:28 PM  24/05/2021 2:47:28 PM  24/05/2021 4:15:45 PM  24/05/2021 8:05:09 PM  31/12/9000             "D"
25/05/2021 11:59:59 PM  1        20/05/2021 8:52:29 PM  24/05/2021 8:05:09 PM  21/05/2021 3:08:22 PM  24/05/2021 2:47:28 PM  24/05/2021 2:47:28 PM  24/05/2021 4:15:45 PM  24/05/2021 8:05:09 PM  31/12/9000             "D"
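
For comparison, the same layout can probably be reached with a single conditional aggregation instead of the PIVOT and the repeated regrouping. The following is a sketch under stated assumptions rather than a drop-in replacement: the CTE name visible and the column current_status are invented for the example, the date generator uses CONNECT BY LEVEL instead of all_objects, and the comparisons are half-open (EffectiveDate <= reportdate < EndDate). On the sample data it should yield the six rows of the desired snapshot.

```sql
-- Sketch only: "visible" and "current_status" are illustrative names.
WITH
    date_ranges AS (
        SELECT DATE '2021-05-20' + LEVEL - INTERVAL '1' SECOND AS reportdate
          FROM dual
       CONNECT BY LEVEL <= 6),
    transactions (factid, cust_id, status, effectivedate, enddate) AS (
        SELECT 1, 1, 'A', TO_DATE('20/05/2021 20:52:29', 'DD/MM/YYYY HH24:MI:SS'),
                          TO_DATE('21/05/2021 15:08:22', 'DD/MM/YYYY HH24:MI:SS') FROM dual
        UNION ALL
        SELECT 2, 1, 'B', TO_DATE('21/05/2021 15:08:22', 'DD/MM/YYYY HH24:MI:SS'),
                          TO_DATE('24/05/2021 14:47:28', 'DD/MM/YYYY HH24:MI:SS') FROM dual
        UNION ALL
        SELECT 3, 1, 'C', TO_DATE('24/05/2021 14:47:28', 'DD/MM/YYYY HH24:MI:SS'),
                          TO_DATE('24/05/2021 16:15:45', 'DD/MM/YYYY HH24:MI:SS') FROM dual
        UNION ALL
        SELECT 4, 1, 'A', TO_DATE('24/05/2021 16:15:45', 'DD/MM/YYYY HH24:MI:SS'),
                          TO_DATE('24/05/2021 20:05:09', 'DD/MM/YYYY HH24:MI:SS') FROM dual
        UNION ALL
        SELECT 5, 1, 'D', TO_DATE('24/05/2021 20:05:09', 'DD/MM/YYYY HH24:MI:SS'),
                          DATE '9000-12-31' FROM dual),
    visible AS (
        -- One row per report date and per transaction that has started by then.
        -- End dates still in the future are replaced by the high date, and
        -- current_status is set only on the row that is open at the report time.
        SELECT d.reportdate, t.cust_id, t.status, t.effectivedate,
               CASE WHEN t.enddate <= d.reportdate THEN t.enddate
                    ELSE DATE '9000-12-31' END AS enddate,
               CASE WHEN d.reportdate < t.enddate THEN t.status END AS current_status
          FROM date_ranges d
          JOIN transactions t ON t.effectivedate <= d.reportdate)
  SELECT reportdate, cust_id,
         MAX(current_status) AS eodstatus,
         NVL(MIN(CASE WHEN status = 'A' THEN effectivedate END), DATE '9000-12-31') AS a_sdate,
         NVL(MAX(CASE WHEN status = 'A' THEN enddate END),       DATE '9000-12-31') AS a_edate,
         NVL(MIN(CASE WHEN status = 'B' THEN effectivedate END), DATE '9000-12-31') AS b_sdate,
         NVL(MAX(CASE WHEN status = 'B' THEN enddate END),       DATE '9000-12-31') AS b_edate,
         NVL(MIN(CASE WHEN status = 'C' THEN effectivedate END), DATE '9000-12-31') AS c_sdate,
         NVL(MAX(CASE WHEN status = 'C' THEN enddate END),       DATE '9000-12-31') AS c_edate,
         NVL(MIN(CASE WHEN status = 'D' THEN effectivedate END), DATE '9000-12-31') AS d_sdate,
         NVL(MAX(CASE WHEN status = 'D' THEN enddate END),       DATE '9000-12-31') AS d_edate
    FROM visible
GROUP BY reportdate, cust_id
ORDER BY reportdate, cust_id;
```

MAX(current_status) relies on the earlier assumption that a customer has at most one open status at any instant; with genuinely overlapping records it would silently pick the alphabetically last status.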