How can I use the LAG FUNCTION in Snowflake with TIMESTAMPS?
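Here is how my base table is set up:
I have users whose action dates are stored as timestamps (date, hour, minute, second). A user's actions are either days apart or hours apart. I am trying to use the LAG function to find the interval between consecutive actions for each user. When I cast the timestamps to DATEs, my query works fine in Snowflake, but when a user has two actions on the same day the interval comes out as 0. I would like to see that interval in minutes (or seconds, it doesn't matter). This is the query I am currently using in Snowflake: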
SELECT
USERS,
RANK() OVER(PARTITION BY USERS ORDER BY ACTION_DATE ASC) RowNumber,
CAST(ACTION_DATE AS DATE),
(CAST(ACTION_DATE AS DATE) - LAG(CAST(ACTION_DATE AS DATE)) OVER (PARTITION BY users ORDER BY ACTION_DATE)) AS TIME_INTERVAL
from TABLE1
ORDER BY 1,2,3;
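As it stands, this query runs fine in Snowflake, but I need to get these intervals from the timestamps themselves rather than from the timestamps cast to dates.
The error I get in Snowflake when I use the timestamps without casting is: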
SQL compilation error: error line 6 at position 21 Invalid argument types for function '-': (TIMESTAMP_NTZ(9), TIMESTAMP_NTZ(9))
Does anyone know how I can use the LAG function with timestamps, or should I be using a different function?
If you want the difference, use datediff() or timestampdiff(). For seconds:
DATEDIFF(second,
LAG(ACTION_DATE) OVER (PARTITION BY users ORDER BY ACTION_DATE),
ACTION_DATE
) AS DIFF_SECONDS
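Dropped into the query from the question, this might look like the sketch below. It assumes the TABLE1, USERS and ACTION_DATE names from the question; the DIFF_MINUTES column is a hypothetical addition for illustration only.

```sql
SELECT
    USERS,
    ACTION_DATE,
    RANK() OVER (PARTITION BY USERS ORDER BY ACTION_DATE ASC) AS RowNumber,
    -- Seconds since the same user's previous action; NULL for the first action
    DATEDIFF(second,
             LAG(ACTION_DATE) OVER (PARTITION BY USERS ORDER BY ACTION_DATE),
             ACTION_DATE) AS DIFF_SECONDS,
    -- Hypothetical extra column: the same gap expressed in minutes
    DATEDIFF(minute,
             LAG(ACTION_DATE) OVER (PARTITION BY USERS ORDER BY ACTION_DATE),
             ACTION_DATE) AS DIFF_MINUTES
FROM TABLE1
ORDER BY 1, 2;
```

Because the lagged (earlier) timestamp is the second argument and the current one is the third, the intervals come out positive.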
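You need to use timestampdiff or datediff, because you cannot subtract two timestamps with the - operator. Here is a reproducible example that shows how to do this in seconds, minutes, and hours.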
create or replace transient table users
(
users varchar,
action_date timestamp_ntz
);
insert overwrite into users
values ('simon', '2020-01-01T01:00:00'),
('simon', '2020-01-01T02:00:00'),
('simon', '2020-01-02T01:00:00'),
('simon', '2020-01-02T02:00:00'),
('simon', '2020-01-03T01:00:00'),
('simon', '2020-01-04T01:00:00'),
('jen', '2020-01-01T01:00:00'),
('jen', '2020-01-02T01:00:00'),
('jen', '2020-01-03T01:00:00'),
('jen', '2020-01-04T01:00:00')
;
SELECT
USERS as users,
action_date as action_date,
RANK() OVER (PARTITION BY USERS ORDER BY ACTION_DATE ASC) as row_number,
timestampdiff('minutes', action_date, LAG(action_date) OVER (PARTITION BY users ORDER BY action_date)) AS minutes_interval,
timestampdiff('seconds', action_date, LAG(action_date) OVER (PARTITION BY users ORDER BY action_date)) AS seconds_interval,
timestampdiff('hours', action_date, LAG(action_date) OVER (PARTITION BY users ORDER BY action_date)) AS hours_interval
from USERS
ORDER BY 1, 2, 3;
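The above produces the following (the intervals are negative because action_date is passed before the lagged value):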
+-----+-----------------------------+----------+----------------+----------------+--------------+
|USERS|ACTION_DATE                  |ROW_NUMBER|MINUTES_INTERVAL|SECONDS_INTERVAL|HOURS_INTERVAL|
+-----+-----------------------------+----------+----------------+----------------+--------------+
|jen  |2020-01-01 01:00:00.000000000|1         |NULL            |NULL            |NULL          |
|jen  |2020-01-02 01:00:00.000000000|2         |-1440           |-86400          |-24           |
|jen  |2020-01-03 01:00:00.000000000|3         |-1440           |-86400          |-24           |
|jen  |2020-01-04 01:00:00.000000000|4         |-1440           |-86400          |-24           |
|simon|2020-01-01 01:00:00.000000000|1         |NULL            |NULL            |NULL          |
|simon|2020-01-01 02:00:00.000000000|2         |-60             |-3600           |-1            |
|simon|2020-01-02 01:00:00.000000000|3         |-1380           |-82800          |-23           |
|simon|2020-01-02 02:00:00.000000000|4         |-60             |-3600           |-1            |
|simon|2020-01-03 01:00:00.000000000|5         |-1380           |-82800          |-23           |
|simon|2020-01-04 01:00:00.000000000|6         |-1440           |-86400          |-24           |
+-----+-----------------------------+----------+----------------+----------------+--------------+
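If positive intervals are preferred, one option is to pass the lagged timestamp first. The sketch below reuses the users table from the example above. The fractional_minutes column is an illustrative addition: DATEDIFF counts the unit boundaries crossed rather than fully elapsed units, so dividing the seconds difference may be closer to what is wanted for short gaps.

```sql
SELECT
    users,
    action_date,
    -- Earlier timestamp first, so each later action yields a positive interval
    DATEDIFF(second,
             LAG(action_date) OVER (PARTITION BY users ORDER BY action_date),
             action_date) AS seconds_interval,
    -- Illustrative column: elapsed minutes including a fractional part
    DATEDIFF(second,
             LAG(action_date) OVER (PARTITION BY users ORDER BY action_date),
             action_date) / 60 AS fractional_minutes
FROM users
ORDER BY users, action_date;
```

The first row for each user still returns NULL, since LAG has no previous row to read within that partition.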