Get successive row values without a loop in SQL

I have the following table:

AppId Id Direction Text Date
aaa 11 in hello 11/2/2021 3:03:00 PM
aaa 22 out yes? 11/2/2021 3:04:00 PM
aaa 33 in need help! 11/3/2021 3:06:00 PM
aaa 44 in you there? 11/4/2021 3:10:00 PM
aaa 55 out yes! 11/5/2021 4:00:00 PM
bb 111 out welcome! 11/6/2021 6:09:00 PM
bb 222 in can i call? 11/6/2021 6:39:00 PM
bb 333 out sure. 11/6/2021 8:22:00 PM
cc 1111 out hello? 11/8/2021 2:22:00 PM
cc 2222 in Whatsup! 11/8/2021 3:22:00 PM

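Id is the primary key, and Direction indicates whether a message is incoming or outgoing. AppId identifies the Ids that belong to a single conversation. I want to identify the time of the first response after each incoming message, like this: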

AppId Id Direction Text Date ReplyDate
aaa 11 in hello 11/2/2021 3:03:00 PM 11/2/2021 3:04:00 PM
aaa 22 out yes? 11/2/2021 3:04:00 PM null
aaa 33 in need help! 11/3/2021 3:06:00 PM 11/5/2021 4:00:00 PM
aaa 44 in you there? 11/4/2021 3:10:00 PM 11/5/2021 4:00:00 PM
aaa 55 out yes! 11/5/2021 4:00:00 PM null
bb 111 out welcome! 11/6/2021 6:09:00 PM null
bb 222 in can i call? 11/6/2021 6:39:00 PM 11/6/2021 8:22:00 PM
bb 333 out sure. 11/6/2021 8:22:00 PM null
cc 1111 out hello? 11/8/2021 2:22:00 PM null
cc 2222 in Whatsup! 11/8/2021 3:22:00 PM null

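For any 'out' text the reply column is null, but for each 'in' text it carries the timestamp of the next 'out'. If there is no outgoing text after an incoming text, ReplyDate is also null for that incoming row, as in the case of 'cc'.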

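Can this be done in SQL? I am using Vertica, which does not allow recursion or loop statements, so I have to achieve this without them.

I have been able to use lead() to get the time of the next out text, but I cannot fill it in for all the preceding in texts.

This is what I have tried so far, but it does not give me the desired result: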

with cte as (
select 
row_number() over(partition by AppId order by date asc) as rn,
Id,
AppId,
Direction,
Text,
Date,
lead(Direction, 1) over(order by Date asc) as lead_direction,
lead(Date, 1) over (order by Date asc) as lead_date
from table
order by Date desc)
select 
Id,
AppId,
Direction,
Text,
Date,
case when Direction = 'Out' then null
     when lead_direction = null then null
     when rn <> 1 and Direction = 'In' and Direction = lead_direction then null
     when rn <> 1 and Direction = 'In' and Direction <> lead_direction then lead_date
     end as ReplyDate
from cte

Any help would be appreciated.

This is not the cleanest solution, but it gets the job done.

WITH
main_data AS (
                  SELECT 'aaa' as AppId,11 as Id,'in' as Direction ,'hello' AS Text,'11/2/2021 3:03:00 PM' AS DATE_TEXT
        UNION ALL SELECT 'aaa' as AppId,22 as Id,'out'as Direction ,'yes?' AS Text,'11/2/2021 3:04:00 PM' AS DATE_TEXT
        UNION ALL SELECT 'aaa' as AppId,33 as Id,'in' as Direction ,'need help!' AS Text,'11/3/2021 3:06:00 PM' AS DATE_TEXT
        UNION ALL SELECT 'aaa' as AppId,44 as Id,'in' as Direction ,'you there?' AS Text,'11/4/2021 3:10:00 PM' AS DATE_TEXT
        UNION ALL SELECT 'aaa' as AppId,55 as Id,'out' as Direction ,'yes!' AS Text,'11/5/2021 4:00:00 PM' AS DATE_TEXT
        UNION ALL SELECT 'bb'  as AppId,111 AS ID,'out' as Direction ,'welcome!' AS Text,'11/6/2021 6:09:00 PM' AS DATE_TEXT
        UNION ALL SELECT 'bb'  as AppId,222 AS ID,'in' as Direction ,'can i call?' AS Text,'11/6/2021 6:39:00 PM' AS DATE_TEXT
        UNION ALL SELECT 'bb'  as AppId,333 AS ID,'out' as Direction ,'sure.' AS Text,'11/6/2021 8:22:00 PM' AS DATE_TEXT
        UNION ALL SELECT 'cc'  as AppId,1111 AS ID,'out' as Direction ,'hello?' AS Text,'11/8/2021 2:22:00 PM' AS DATE_TEXT
        UNION ALL SELECT 'cc'  as AppId,2222 AS ID,'in' as Direction ,'Whatsup!' AS Text,'11/8/2021 3:22:00 PM' AS DATE_TEXT
    )
, temp_data AS (SELECT AppId
        , Id
        , Direction
        , TEXT
        , DATE_TEXT
        , CONDITIONAL_TRUE_EVENT(Direction = 'out') OVER (PARTITION BY AppId ORDER BY Id) as rank_logic
    FROM main_data
    )
SELECT t1.AppId
    , t1.Id
    , t1.Direction
    , t1.TEXT
    , t1.DATE_TEXT
    , t2.DATE_TEXT
FROM temp_data t1
LEFT JOIN temp_data t2
    ON t1.AppId = t2.AppId
        AND t1.rank_logic + 1 = t2.rank_logic
        AND t2.Direction = 'out'
        AND t1.Direction <> 'out';

Output

 AppId |  Id  | Direction |    TEXT     |      DATE_TEXT       |      DATE_TEXT       
-------+------+-----------+-------------+----------------------+----------------------
 aaa   |   11 | in        | hello       | 11/2/2021 3:03:00 PM | 11/2/2021 3:04:00 PM
 aaa   |   22 | out       | yes?        | 11/2/2021 3:04:00 PM | 
 aaa   |   33 | in        | need help!  | 11/3/2021 3:06:00 PM | 11/5/2021 4:00:00 PM
 aaa   |   44 | in        | you there?  | 11/4/2021 3:10:00 PM | 11/5/2021 4:00:00 PM
 aaa   |   55 | out       | yes!        | 11/5/2021 4:00:00 PM | 
 bb    |  111 | out       | welcome!    | 11/6/2021 6:09:00 PM | 
 bb    |  222 | in        | can i call? | 11/6/2021 6:39:00 PM | 11/6/2021 8:22:00 PM
 bb    |  333 | out       | sure.       | 11/6/2021 8:22:00 PM | 
 cc    | 1111 | out       | hello?      | 11/8/2021 2:22:00 PM | 
 cc    | 2222 | in        | Whatsup!    | 11/8/2021 3:22:00 PM | 
(10 rows)
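
To see why the join on rank_logic + 1 works, it may help to look at the counter by itself. The following is a minimal sketch, assuming Vertica's documented behavior that CONDITIONAL_TRUE_EVENT starts at 0 and goes up by one on every row where the condition is true. The CTE name sample_rows is made up for illustration and holds only the 'aaa' conversation from the question.

```sql
-- Illustration only: sample_rows is a hypothetical name holding one conversation.
WITH sample_rows(AppId, Id, Direction) AS (
          SELECT 'aaa', 11, 'in'
UNION ALL SELECT 'aaa', 22, 'out'
UNION ALL SELECT 'aaa', 33, 'in'
UNION ALL SELECT 'aaa', 44, 'in'
UNION ALL SELECT 'aaa', 55, 'out'
)
SELECT AppId
    , Id
    , Direction
    -- the counter goes up by one on each 'out' row
    , CONDITIONAL_TRUE_EVENT(Direction = 'out')
        OVER (PARTITION BY AppId ORDER BY Id) AS rank_logic
FROM sample_rows
ORDER BY Id;
-- Expected rank_logic per Id, under the assumption stated above:
--   11 -> 0, 22 -> 1, 33 -> 1, 44 -> 1, 55 -> 2
```

Each 'in' row carries the counter value of the most recent 'out' before it, so the 'out' row whose counter is exactly one higher is the first reply that follows.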

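It took a few attempts, but I think I have got it now. The first CTE in the WITH clause is not part of the final query; it just puts your original input into a self-contained demo query.

The real query, with the real WITH clause, starts after that.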

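Since the closing 'out' row for each 'in' row can come one, two, or several rows later, this can be solved as a behavioral pattern: one or more 'in' rows followed by one 'out' row. That is what the MATCH() clause is for. A query containing this clause returns only the rows that satisfy the pattern.

The dependent function PATTERN_ID() returns the ordinal number of the pattern found within the PARTITION BY ... ORDER BY expression.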

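Grouping by the PARTITION BY column and PATTERN_ID then gives me the last timestamp of each pattern, which is the replyts I need (I renamed the columns to txt and ts to avoid reserved words such as DATE and TEXT).

Finally, I just need to left join the indata CTE with the query containing the MATCH() clause, on the id columns being equal and direction being equal to 'in', and then left join that second query with the grouping query.

I am leaving the intermediate results as comments inside the CTE expressions to illustrate the mechanism...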

WITH
-- input from you ...
indata(AppId,Id,Direction,txt,ts) AS (
          SELECT 'aaa',11,'in','hello',TIMESTAMP '11/2/2021 3:03:00 PM'
UNION ALL SELECT 'aaa',22,'out','yes?',TIMESTAMP '11/2/2021 3:04:00 PM'
UNION ALL SELECT 'aaa',33,'in','need help!',TIMESTAMP '11/3/2021 3:06:00 PM'
UNION ALL SELECT 'aaa',44,'in','you there?',TIMESTAMP '11/4/2021 3:10:00 PM'
UNION ALL SELECT 'aaa',55,'out','yes!',TIMESTAMP '11/5/2021 4:00:00 PM'
UNION ALL SELECT 'bb',111,'out','welcome!',TIMESTAMP '11/6/2021 6:09:00 PM'
UNION ALL SELECT 'bb',222,'in','can i call?',TIMESTAMP '11/6/2021 6:39:00 PM'
UNION ALL SELECT 'bb',333,'out','sure.',TIMESTAMP '11/6/2021 8:22:00 PM'
UNION ALL SELECT 'cc',1111,'out','hello?',TIMESTAMP '11/8/2021 2:22:00 PM'
UNION ALL SELECT 'cc',2222,'in','Whatsup!',TIMESTAMP '11/8/2021 3:22:00 PM'
)
-- real query starts here, replace following comma with "WITH" ...
,
-- the MATCH() clause in action - note the depending functions
-- PATTERN_ID(), MATCH_ID() and EVENT_NAME()
pattern_q AS (
  SELECT 
    appid
  , id
  , direction
  , txt
  , ts
  , PATTERN_ID()
  , MATCH_ID()
  , EVENT_NAME()
  FROM indata
  MATCH(
    PARTITION BY appid ORDER BY id
    DEFINE
      inbound  AS direction='in'
    , outbound AS direction='out'
    PATTERN
      p AS (inbound+ outbound)
  )
  -- out  appid   | id  | direction |     txt     |         ts          | PATTERN_ID | MATCH_ID | EVENT_NAME 
  -- out ---------+-----+-----------+-------------+---------------------+------------+----------+------------
  -- out  aaa     |  11 | in        | hello       | 2021-11-02 15:03:00 |          1 |        1 | inbound
  -- out  aaa     |  22 | out       | yes?        | 2021-11-02 15:04:00 |          1 |        2 | outbound
  -- out  aaa     |  33 | in        | need help!  | 2021-11-03 15:06:00 |          2 |        1 | inbound
  -- out  aaa     |  44 | in        | you there?  | 2021-11-04 15:10:00 |          2 |        2 | inbound
  -- out  aaa     |  55 | out       | yes!        | 2021-11-05 16:00:00 |          2 |        3 | outbound
  -- out  bb      | 222 | in        | can i call? | 2021-11-06 18:39:00 |          1 |        1 | inbound
  -- out  bb      | 333 | out       | sure.       | 2021-11-06 20:22:00 |          1 |        2 | outbound
)
,
-- need the last timestamp per PATTERN_ID() ... so grouping
pattern_grp AS (
  SELECT
    appid
  , pattern_id
  , MIN(ts) AS g_ts
  , MAX(ts) AS replyts
  FROM pattern_q
  GROUP BY
    appid
  , pattern_id
  -- out  appid | pattern_id |        g_ts         |       replyts       
  -- out -------+------------+---------------------+---------------------
  -- out  aaa   |          1 | 2021-11-02 15:03:00 | 2021-11-02 15:04:00
  -- out  aaa   |          2 | 2021-11-03 15:06:00 | 2021-11-05 16:00:00
  -- out  bb    |          1 | 2021-11-06 18:39:00 | 2021-11-06 20:22:00
)
SELECT
  i.*
, g.replyts
FROM indata           i
LEFT JOIN pattern_q   p ON i.id = p.id       AND i.direction='in'
LEFT JOIN pattern_grp g ON p.appid = g.appid AND p.pattern_id = g.pattern_id 
-- out Null display is "(null)".
-- out  AppId |  Id  | Direction |     txt     |         ts          |       replyts       
-- out -------+------+-----------+-------------+---------------------+---------------------
-- out  aaa   |   11 | in        | hello       | 2021-11-02 15:03:00 | 2021-11-02 15:04:00
-- out  aaa   |   22 | out       | yes?        | 2021-11-02 15:04:00 | (null)
-- out  aaa   |   33 | in        | need help!  | 2021-11-03 15:06:00 | 2021-11-05 16:00:00
-- out  aaa   |   44 | in        | you there?  | 2021-11-04 15:10:00 | 2021-11-05 16:00:00
-- out  aaa   |   55 | out       | yes!        | 2021-11-05 16:00:00 | (null)
-- out  bb    |  111 | out       | welcome!    | 2021-11-06 18:09:00 | (null)
-- out  bb    |  222 | in        | can i call? | 2021-11-06 18:39:00 | 2021-11-06 20:22:00
-- out  bb    |  333 | out       | sure.       | 2021-11-06 20:22:00 | (null)
-- out  cc    | 1111 | out       | hello?      | 2021-11-08 14:22:00 | (null)
-- out  cc    | 2222 | in        | Whatsup!    | 2021-11-08 15:22:00 | (null)
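
For comparison, the same reply column can probably also be derived without MATCH(), using a plain window aggregate whose frame covers only the rows after the current one. This is a different technique from the one above, not the answer's method. It is a sketch that assumes timestamps increase with id within each appid; the CTE name demo and its trimmed sample rows are made up for illustration.

```sql
-- Illustration only: demo is a hypothetical CTE with a subset of the question's rows.
WITH demo(appid, id, direction, ts) AS (
          SELECT 'aaa',   11, 'in',  TIMESTAMP '2021-11-02 15:03:00'
UNION ALL SELECT 'aaa',   22, 'out', TIMESTAMP '2021-11-02 15:04:00'
UNION ALL SELECT 'aaa',   33, 'in',  TIMESTAMP '2021-11-03 15:06:00'
UNION ALL SELECT 'aaa',   44, 'in',  TIMESTAMP '2021-11-04 15:10:00'
UNION ALL SELECT 'aaa',   55, 'out', TIMESTAMP '2021-11-05 16:00:00'
UNION ALL SELECT 'cc',  1111, 'out', TIMESTAMP '2021-11-08 14:22:00'
UNION ALL SELECT 'cc',  2222, 'in',  TIMESTAMP '2021-11-08 15:22:00'
)
SELECT
  appid
, id
, direction
, ts
, CASE
    WHEN direction = 'in' THEN
      -- earliest 'out' timestamp among the later rows of the same conversation;
      -- the frame is empty for the last row, so the aggregate yields NULL there
      MIN(CASE WHEN direction = 'out' THEN ts END) OVER (
        PARTITION BY appid
        ORDER BY id
        ROWS BETWEEN 1 FOLLOWING AND UNBOUNDED FOLLOWING
      )
  END AS replyts
FROM demo
ORDER BY appid, id;
```

The outer CASE keeps replyts null for 'out' rows, and an 'in' row with no later 'out' (such as id 2222) also ends up null because MIN ignores the nulls produced by the inner CASE.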