How can I use the LAG function in Snowflake with timestamps?

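Here is how my base table is built:

I have users with action dates stored as timestamps (date, hour, minute, second). A user's actions are spaced either a few days or a few hours apart. I'm trying to use the LAG function to find the interval between consecutive actions for each user. When I cast the timestamps to DATEs, my query works fine in Snowflake, but when a user has two actions on the same day, the interval comes out as 0. I'd like to see this interval in minutes (or seconds, it doesn't matter). Here is the query I'm currently using in Snowflake: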

 SELECT 
    USERS,
  RANK() OVER(PARTITION BY USERS ORDER BY ACTION_DATE ASC) RowNumber,
  CAST(ACTION_DATE AS DATE),
  (CAST(ACTION_DATE AS DATE) - LAG(CAST(ACTION_DATE AS DATE)) OVER (PARTITION BY users ORDER BY ACTION_DATE)) AS TIME_INTERVAL
from TABLE1
ORDER BY 1,2,3;

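As written, this query runs fine in Snowflake, but I need to compute these intervals from the timestamps themselves rather than from the timestamps cast to dates.

The error I get in Snowflake is: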

SQL compilation error: error line 6 at position 21 Invalid argument types for function '-': (TIMESTAMP_NTZ(9), TIMESTAMP_NTZ(9))

Does anyone know how I can use the LAG function with timestamps, or should I be using a different function?

If you want the difference, use datediff() or timestampdiff(). For seconds:

DATEDIFF(second,
         LAG(ACTION_DATE) OVER (PARTITION BY users ORDER BY ACTION_DATE),
         ACTION_DATE
        ) AS DIFF_SECONDS
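
Dropped into the query from the question, that expression might look like the sketch below. It assumes the table and column names from the question (TABLE1, USERS, ACTION_DATE) and that ACTION_DATE is a TIMESTAMP_NTZ column, as the error message suggests. The alias action_rank is just an illustrative name, and I have not run this against the asker's data.

```sql
SELECT
    users,
    action_date,
    -- position of each action within that user's history
    RANK() OVER (PARTITION BY users ORDER BY action_date) AS action_rank,
    -- seconds since the same user's previous action; NULL for the first action
    DATEDIFF(second,
             LAG(action_date) OVER (PARTITION BY users ORDER BY action_date),
             action_date) AS diff_seconds
FROM table1
ORDER BY users, action_date;
```

Putting the LAG value first and the current row's timestamp second keeps the result positive, since DATEDIFF subtracts the second argument's predecessor from it (later minus earlier).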

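You need to use timestampdiff or datediff, because you cannot subtract two timestamps with the - operator. Here is a reproducible example showing how to do this in seconds, minutes, and hours.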

create or replace transient table users
(
    users       varchar,
    action_date timestamp_ntz
);

insert overwrite into users
values ('simon', '2020-01-01T01:00:00'),
       ('simon', '2020-01-01T02:00:00'),
       ('simon', '2020-01-02T01:00:00'),
       ('simon', '2020-01-02T02:00:00'),
       ('simon', '2020-01-03T01:00:00'),
       ('simon', '2020-01-04T01:00:00'),
       ('jen', '2020-01-01T01:00:00'),
       ('jen', '2020-01-02T01:00:00'),
       ('jen', '2020-01-03T01:00:00'),
       ('jen', '2020-01-04T01:00:00')
;

-- The previous timestamp (from LAG) goes first and the current one second,
-- so the intervals come out positive.
SELECT
    USERS                                                                                 as users,
    action_date                                                                           as action_date,
    RANK() OVER (PARTITION BY USERS ORDER BY ACTION_DATE ASC)                             as row_number,
    timestampdiff('minutes', LAG(action_date) OVER (PARTITION BY users ORDER BY action_date), action_date) AS minutes_interval,
    timestampdiff('seconds', LAG(action_date) OVER (PARTITION BY users ORDER BY action_date), action_date) AS seconds_interval,
    timestampdiff('hours', LAG(action_date) OVER (PARTITION BY users ORDER BY action_date), action_date) AS hours_interval
from USERS
ORDER BY 1, 2, 3;

The above produces:

+-----+-----------------------------+----------+----------------+----------------+--------------+
|USERS|ACTION_DATE                  |ROW_NUMBER|MINUTES_INTERVAL|SECONDS_INTERVAL|HOURS_INTERVAL|
+-----+-----------------------------+----------+----------------+----------------+--------------+
|jen  |2020-01-01 01:00:00.000000000|1         |NULL            |NULL            |NULL          |
|jen  |2020-01-02 01:00:00.000000000|2         |1440            |86400           |24            |
|jen  |2020-01-03 01:00:00.000000000|3         |1440            |86400           |24            |
|jen  |2020-01-04 01:00:00.000000000|4         |1440            |86400           |24            |
|simon|2020-01-01 01:00:00.000000000|1         |NULL            |NULL            |NULL          |
|simon|2020-01-01 02:00:00.000000000|2         |60              |3600            |1             |
|simon|2020-01-02 01:00:00.000000000|3         |1380            |82800           |23            |
|simon|2020-01-02 02:00:00.000000000|4         |60              |3600            |1             |
|simon|2020-01-03 01:00:00.000000000|5         |1380            |82800           |23            |
|simon|2020-01-04 01:00:00.000000000|6         |1440            |86400           |24            |
+-----+-----------------------------+----------+----------------+----------------+--------------+
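
One caveat on the choice of unit: as far as I know, datediff/timestampdiff ignore everything smaller than the requested unit rather than rounding elapsed time, so two actions at 01:59 and 02:01 would count as 1 hour apart. If partial units matter, one option is to take the difference in seconds and divide. A minimal sketch against the users table created above (minutes_elapsed is an illustrative alias, not something from the question):

```sql
SELECT
    users,
    action_date,
    -- difference in whole seconds, then scaled to (possibly fractional) minutes;
    -- NULL for each user's first action because LAG has no previous row
    DATEDIFF(second,
             LAG(action_date) OVER (PARTITION BY users ORDER BY action_date),
             action_date) / 60 AS minutes_elapsed
FROM users
ORDER BY users, action_date;
```

With the sample data every gap is a whole number of hours, so this only makes a visible difference once the timestamps stop lining up on unit boundaries.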