Historical aggregation of a column up until a specified time in each row in another column

I have two tables in Amazon Redshift, login_attempts and checkouts. A user can have multiple (un)successful login attempts and multiple (un)successful checkouts, as in this example:

login_attempts

login_id | user_id  |       login           |   success
-------------------------------------------------------
1        |  1       |   2021-07-01 14:00:00 |   0
2        |  1       |   2021-07-01 16:00:00 |   1
3        |  2       |   2021-07-02 05:01:01 |   1
4        |  1       |   2021-07-04 03:25:34 |   0
5        |  2       |   2021-07-05 11:20:50 |   0
6        |  2       |   2021-07-07 12:34:56 |   1

checkouts

checkout_id |   checkout_time       | user_id   |   success
------------------------------------------------------------
1           |   2021-07-01 18:00:00 |   1       |   0
2           |   2021-07-02 06:54:32 |   2       |   1
3           |   2021-07-04 13:00:01 |   1       |   1
4           |   2021-07-08 09:05:00 |   2       |   1

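Given this information, how can I get the following table, which shows for each checkout the user's historical performance up to that point in time?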

checkout_id | checkout              | user_id | lastGoodLogin       | lastFailedLogin     |  lastGoodCheckout   |  lastFailedCheckout |
---------------------------------------------------------------------------------------------------------------------------------------
1           | 2021-07-01 18:00:00   | 1       | 2021-07-01 16:00:00 | 2021-07-01 14:00:00 |       NULL          |     NULL
2           | 2021-07-02 06:54:32   | 2       | 2021-07-02 05:01:01 |       NULL          |       NULL          |     NULL
3           | 2021-07-04 13:00:01   | 1       | 2021-07-01 16:00:00 | 2021-07-04 03:25:34 |       NULL          | 2021-07-01 18:00:00
4           | 2021-07-08 09:05:00   | 2       | 2021-07-07 12:34:56 | 2021-07-05 11:20:50 | 2021-07-02 06:54:32 |     NULL

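Update: I was able to get lastFailedCheckout and lastGoodCheckout, since that is a window operation on a single table (checkouts), but I can't work out how best to combine it with the login_attempts table to get the last[Good|Failed]Login fields. (sqlfiddle)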

P.S.: I am also open to PostgreSQL suggestions.

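Good start! A few things about the SQL: 1) You should really try to avoid inequality joins, since they can cause the row count to explode, and one isn't needed here. Just put a CASE expression inside your window function so that it only considers the type of checkout (or login) you want. 2) You can use a frame clause so that a row does not select itself when looking for previous checkouts.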
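To illustrate both points on a single table, here is a minimal sketch that computes only the two checkout columns. It assumes the checkouts table as created in the setup script further down (columns checkoutid, userid, checkout_time, success); the values in the trailing comments are what the sample data should produce.

```sql
-- Sketch: previous failed / successful checkout per user, without a self-join.
-- Assumes the checkouts table from the setup script in this answer.
select
  checkoutid,
  userid,
  checkout_time,
  -- CASE yields NULL for rows of the other kind, and max() ignores NULLs.
  max(case when success = 0 then checkout_time end) over (
    partition by userid
    order by checkout_time
    -- The frame stops one row before the current row, so a checkout
    -- never counts as its own "last" checkout.
    rows between unbounded preceding and 1 preceding
  ) as lastFailedCheckout,
  max(case when success = 1 then checkout_time end) over (
    partition by userid
    order by checkout_time
    rows between unbounded preceding and 1 preceding
  ) as lastGoodCheckout
from checkouts
order by checkoutid;

-- With the sample data:
-- checkoutid | lastFailedCheckout  | lastGoodCheckout
-- 1          | NULL                | NULL
-- 2          | NULL                | NULL
-- 3          | 2021-07-01 18:00:00 | NULL
-- 4          | NULL                | 2021-07-02 06:54:32
```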

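Once you have this pattern, you can use it to find the other 2 columns you are looking for. The first step is to UNION the tables together rather than JOIN them. This means adding a few extra columns so that both kinds of rows fit in one result, but that is easy. Now the user id and the time the "thing" happened are all in the same data set. You just need to apply the window 2 more times to pull the information you want. Finally, use an outer select with a where clause to strip out the non-checkout rows.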
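Before adding the window functions, it can help to look at the stacked rows on their own. The sketch below is for inspection only. The explicit casts on the NULL padding are my addition (an assumption that keeps the column types unambiguous), and the alias e is arbitrary.

```sql
-- Sketch: the combined event stream, one row per login or checkout.
select *
from (
  select checkout_time as event_time, checkoutid, userid,
         checkout_time, success,
         cast(null as timestamp) as login,
         cast(null as smallint)  as lsuccess
  from checkouts
  union all
  select login as event_time, cast(null as smallint) as checkoutid, userid,
         cast(null as timestamp) as checkout_time,
         cast(null as smallint)  as success,
         login, success as lsuccess
  from login_attempts
) e
order by userid, event_time;

-- For userid = 1 the stream reads, in order:
--   2021-07-01 14:00:00  failed login
--   2021-07-01 16:00:00  good login
--   2021-07-01 18:00:00  failed checkout (checkoutid 1)
--   2021-07-04 03:25:34  failed login
--   2021-07-04 13:00:01  good checkout (checkoutid 3)
```

Each checkout row now has every earlier login and checkout of the same user ahead of it in the ordering, which is what lets a single window per column do the work.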

Like this:

create table login_attempts(
  loginid smallint,
  userid smallint,
  login timestamp,
  success smallint
);

create table checkouts(
  checkoutid smallint,
  userid smallint,
  checkout_time timestamp,
  success smallint
);


insert into login_attempts values
(1, 1, '2021-07-01 14:00:00', 0),
(2, 1, '2021-07-01 16:00:00', 1),
(3, 2, '2021-07-02 05:01:01', 1),
(4, 1, '2021-07-04 03:25:34', 0),
(5, 2, '2021-07-05 11:20:50', 0),
(6, 2, '2021-07-07 12:34:56', 1)
;

insert into checkouts values
(1, 1, '2021-07-01 18:00:00', 0),
(2, 2, '2021-07-02 06:54:32', 1),
(3, 1, '2021-07-04 13:00:01', 1),
(4, 2, '2021-07-08 09:05:00', 1)
;

SQL:

select *
from (
  select
    c.checkoutid,
    c.userid,
    c.checkout_time,

    -- Latest failed checkout strictly before the current row.
    -- The frame ends at "1 preceding", so a row never sees itself.
    max(case success when 0 then checkout_time end) over (
      partition by userid
      order by event_time
      rows between unbounded preceding and 1 preceding
    ) as lastFailedCheckout,

    -- Latest successful checkout before the current row.
    max(case success when 1 then checkout_time end) over (
      partition by userid
      order by event_time
      rows between unbounded preceding and 1 preceding
    ) as lastGoodCheckout,

    -- Latest failed login before the current row.
    max(case lsuccess when 0 then login end) over (
      partition by userid
      order by event_time
      rows between unbounded preceding and 1 preceding
    ) as lastFailedLogin,

    -- Latest successful login before the current row.
    max(case lsuccess when 1 then login end) over (
      partition by userid
      order by event_time
      rows between unbounded preceding and 1 preceding
    ) as lastGoodLogin

  from (
    -- Stack both tables into one event stream; columns that do not
    -- apply to a source are padded with NULL.
    select checkout_time as event_time, checkoutid, userid,
      checkout_time, success,
      NULL as login, NULL as lsuccess
    from checkouts
    UNION ALL
    select login as event_time, NULL as checkoutid, userid,
      NULL as checkout_time, NULL as success,
      login, success as lsuccess
    from login_attempts
  ) c
) o
-- Keep only the checkout rows; the login rows were needed just for the windows.
where o.checkoutid is not null
order by o.checkoutid;
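
Since the question also allows PostgreSQL: there the same idea can be written more compactly with an aggregate FILTER clause and a named WINDOW. This is only a sketch, assuming PostgreSQL 9.4 or later and the tables from the setup script above. As far as I know, Redshift supports neither FILTER nor the WINDOW clause, so keep the CASE form there. The explicit casts on the NULL padding and the aliases e and w are my own choices.

```sql
-- PostgreSQL-only sketch: same logic, one shared window definition.
select *
from (
  select
    checkoutid,
    userid,
    checkout_time,
    max(checkout_time) filter (where success = 0)  over w as lastFailedCheckout,
    max(checkout_time) filter (where success = 1)  over w as lastGoodCheckout,
    max(login)         filter (where lsuccess = 0) over w as lastFailedLogin,
    max(login)         filter (where lsuccess = 1) over w as lastGoodLogin
  from (
    select checkout_time as event_time, checkoutid, userid,
           checkout_time, success,
           cast(null as timestamp) as login,
           cast(null as smallint)  as lsuccess
    from checkouts
    union all
    select login, cast(null as smallint), userid,
           cast(null as timestamp), cast(null as smallint),
           login, success
    from login_attempts
  ) e
  -- Every earlier event of the same user, excluding the current row.
  window w as (
    partition by userid
    order by event_time
    rows between unbounded preceding and 1 preceding
  )
) o
where checkoutid is not null
order by checkoutid;
```

On the sample data this should return the same four rows as the desired table in the question.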