SQL time difference between two rows in a partition window

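I have a table of analytics events, and I'm trying to calculate the time difference between two rows, i.e. how long a user spent between attempting to start and actually starting.

My data looks like this: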

# session type recordedAt
1 D4E77C feedbackProvided 2021-08-17T09:13:00.768+03:00
2 D4E77C feedbackProvided 2021-08-17T12:06:03.301+03:00
3 D4E77C feedbackProvided 2021-08-17T14:28:15.083+03:00
4 D4E77C feedbackProvided 2021-08-17T14:28:17.12+03:00
5 D4E77C buttonClicked 2021-08-17T14:28:18.383+03:00
6 D4E77C measurementStarted 2021-08-17T14:28:22.437+03:00
7 D4E77C buttonClicked 2021-08-17T14:28:23.572+03:00
8 D4E77C measurementCancelled 2021-08-17T14:28:23.573+03:00

These are just the rows for one session; assume there are many sessions.

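I'm trying to calculate the difference in recordedAt between the first feedbackProvided and the first measurementStarted. However, I only want to consider the first feedbackProvided that falls within the 3 minutes before the measurementStarted. In this case we would look at rows 1 and 6, but the gap is > 3 minutes. Rows 2 and 6: the gap is > 3 minutes. Rows 3 and 6: the gap is ~7 seconds, so that is the pair I want.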

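This is my first time working with partitions. I'm close, but I can't figure out how to apply the 3-minute maximum time difference.

Am I on the right track?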

WITH firstFeedbackProvided AS (
  SELECT 
    session, type, recordedAt,
    ROW_NUMBER() over(partition by session order by recordedAt) rn
  FROM events
  WHERE type='feedbackProvided'
),
firstMeasurementStarted AS (
  SELECT 
    session, type, recordedAt,
    ROW_NUMBER() over(partition by session order by recordedAt) rn
  FROM events
  WHERE type='measurementStarted'
)
SELECT 
  *,
  date_diff('millisecond', t1.recordedAt, t2.recordedAt) as diff
FROM firstFeedbackProvided as t1
JOIN firstMeasurementStarted as t2 ON t1.session = t2.session
WHERE t1.rn = 1
AND t2.rn = 1

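I would suggest treating this as a gaps-and-islands problem: filter out everything that is not measurementStarted or feedbackProvided, create groups based on whether the previous row was measurementStarted, find the maximum time in each group (which should be the measurementStarted row), and use it to filter out the feedbackProvided records in the group that fall outside the window.

Data: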

WITH dataset AS (
  SELECT * 
  FROM 
    (
      VALUES 
('D4E77C',  'feedbackProvided',  from_iso8601_timestamp('2021-08-17T09:13:00.768+03:00')),
('D4E77C',  'feedbackProvided', from_iso8601_timestamp('2021-08-17T12:06:03.301+03:00')),
('D4E77C',  'feedbackProvided', from_iso8601_timestamp('2021-08-17T14:28:15.083+03:00')),
('D4E77C',  'feedbackProvided', from_iso8601_timestamp('2021-08-17T14:28:17.12+03:00')),
('D4E77C',  'buttonClicked',    from_iso8601_timestamp('2021-08-17T14:28:18.383+03:00')),
('D4E77C',  'measurementStarted',   from_iso8601_timestamp('2021-08-17T14:28:22.437+03:00')),
('D4E77C',  'buttonClicked',    from_iso8601_timestamp('2021-08-17T14:28:23.572+03:00')),      
('D4E77C',  'measurementCancelled', from_iso8601_timestamp('2021-08-17T14:28:23.573+03:00')),
      
('D4E77C1', 'feedbackProvided',  from_iso8601_timestamp('2021-08-17T09:13:00.768+03:00')),
('D4E77C1', 'feedbackProvided', from_iso8601_timestamp('2021-08-17T12:06:03.301+03:00')),
('D4E77C1', 'feedbackProvided', from_iso8601_timestamp('2021-08-17T14:28:15.083+03:00')),
('D4E77C1', 'feedbackProvided', from_iso8601_timestamp('2021-08-17T14:28:17.12+03:00')),
('D4E77C1', 'buttonClicked',    from_iso8601_timestamp('2021-08-17T14:28:18.383+03:00')),
('D4E77C1', 'measurementStarted',   from_iso8601_timestamp('2021-08-17T14:28:22.437+03:00')),
('D4E77C1', 'buttonClicked',    from_iso8601_timestamp('2021-08-17T14:28:23.572+03:00')),
('D4E77C1', 'measurementCancelled', from_iso8601_timestamp('2021-08-17T14:28:23.573+03:00')),
      
('D4E77C',  'feedbackProvided',  from_iso8601_timestamp('2021-08-18T09:13:00.768+03:00')),
('D4E77C',  'feedbackProvided', from_iso8601_timestamp('2021-08-18T12:06:03.301+03:00')),
('D4E77C',  'feedbackProvided', from_iso8601_timestamp('2021-08-18T14:28:15.083+03:00')),
('D4E77C',  'feedbackProvided', from_iso8601_timestamp('2021-08-18T14:28:17.12+03:00')),
('D4E77C',  'buttonClicked',    from_iso8601_timestamp('2021-08-18T14:28:18.383+03:00')),
('D4E77C',  'measurementStarted',   from_iso8601_timestamp('2021-08-18T14:28:22.437+03:00')),
('D4E77C',  'buttonClicked',    from_iso8601_timestamp('2021-08-18T14:28:23.572+03:00')),
('D4E77C',  'measurementCancelled', from_iso8601_timestamp('2021-08-18T14:28:23.573+03:00'))
    ) AS t (session,    type,   recordedAt)
) 
select session, max(recordedAt) - min(recordedAt)
from (
         select *, max(recordedAt) over (partition by session, grp) as m_started_date
         from (
                  select *,
                         sum(case when prev_type = 'measurementStarted' then 1 else 0 end)
                             over (partition by session order by recordedAt) as grp
                  from (
                           select session,
                                  type,
                                  recordedAt,
                                  lag(type) over (partition by session order by recordedAt) as prev_type
                           from dataset
                           where type in ('measurementStarted', 'feedbackProvided')
                       )
              )
     )
where m_started_date - recordedAt < interval '3' minute
group by session, grp

Output:

session _col1
D4E77C1 0 00:00:07.354
D4E77C 0 00:00:07.354
D4E77C 0 00:00:07.354
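
To make the grouping step easier to follow, here is a minimal Python sketch of the same idea. It is not part of the answer above: the function name `diffs_per_group` and the `EVENTS` list are hypothetical, and the sample rows are the two days of session D4E77C from the dataset (with `.12` written as `.120` so that `datetime.fromisoformat` accepts it on older Python versions).

```python
from datetime import datetime, timedelta
from itertools import groupby

# Hypothetical sample: session D4E77C on two consecutive days (from the dataset above).
EVENTS = [
    ("D4E77C", "feedbackProvided", f"2021-08-{day}T{time}+03:00")
    for day in ("17", "18")
    for time in ("09:13:00.768", "12:06:03.301", "14:28:15.083", "14:28:17.120")
] + [
    ("D4E77C", "measurementStarted", f"2021-08-{day}T14:28:22.437+03:00")
    for day in ("17", "18")
]


def diffs_per_group(events, window=timedelta(minutes=3)):
    """Return (session, grp, diff) for each group that ends in a measurementStarted."""
    # Keep only the two relevant event types, ordered by session and time.
    rows = sorted(
        ((s, t, datetime.fromisoformat(ts)) for s, t, ts in events
         if t in ("measurementStarted", "feedbackProvided")),
        key=lambda r: (r[0], r[2]),
    )
    out = []
    for session, session_rows in groupby(rows, key=lambda r: r[0]):
        grp, prev_type, groups = 0, None, {}
        for _, etype, ts in session_rows:
            # Mirrors lag(type) plus the running sum: a new group starts
            # on the row that follows a measurementStarted.
            if prev_type == "measurementStarted":
                grp += 1
            groups.setdefault(grp, []).append(ts)
            prev_type = etype
        for grp_id, times in groups.items():
            m_started = max(times)  # assumed to be the measurementStarted row
            kept = [ts for ts in times if m_started - ts < window]
            out.append((session, grp_id, max(kept) - min(kept)))
    return out


result = diffs_per_group(EVENTS)
for session, grp_id, diff in result:
    print(session, grp_id, diff)
# D4E77C 0 0:00:07.354000
# D4E77C 1 0:00:07.354000
```

Like the SQL, the sketch assumes that the latest row in each group is the measurementStarted event; a trailing group with only feedbackProvided rows would need separate handling.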

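I think you are overcomplicating the problem. Do the following:

  1. Calculate when the first measurementStarted happened for each session.
  2. Filter the rows to keep only the feedback events that occur before it and within your time window.
  3. Aggregate.

In SQL, this looks like: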

select session,
       first_measurementStarted - min(recordedat) as diff
from (select e.*,
             -- time of the first measurementStarted in each session
             min(case when type = 'measurementStarted' then recordedat end) over (partition by session) as first_measurementStarted
      from events e 
     ) e
where recordedat > first_measurementStarted - interval '3' minute and
      -- keep only feedback that precedes the measurement (step 2)
      recordedat < first_measurementStarted and
      type = 'feedbackProvided'
group by session, first_measurementStarted;