Trouble with Timestamps when using `Lead` window function in Big Query
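I'm trying to get a customer's first order, their next order, and the number of days between the two. Seems simple enough. The steps I followed are:
- Pull the customer's first and second orders with the MIN() and LEAD() functions
- Run DATEDIFF on those two fields to get the difference in days.
The short version of the code looks like this: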
SELECT cust, MIN(ord_time) first_ord, LEAD(ord_time, 1)
OVER
(PARTITION BY cust
ORDER BY ord_time) next_ord
FROM
(SELECT cust, ord_time
FROM df.orders
GROUP EACH BY cust, ord_time)
There are some other filters, joins and groupings in there, but this is the basic block.
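Outside of BigQuery, the two steps above come down to four operations: deduplicate each customer's order times, sort them, take the first and the next one, and count the days between them. The plain-Python sketch below illustrates that logic under stated assumptions; it is not BigQuery code. The sample rows mirror the ones used in the answer, the `FMT` text format is assumed, and "days between" is taken to mean the difference between calendar dates with the time of day ignored. That reading is consistent with the 0 and 216 results shown in the answer.

```python
from datetime import datetime
from itertools import groupby
from operator import itemgetter

# Assumed text format of the order timestamps, as displayed in the results in this thread.
FMT = "%Y-%m-%d %H:%M:%S UTC"

# Hypothetical sample data (customer id, order time), mirroring the rows used in the answer.
ORDERS = [
    (1, "2014-04-08 09:51:24 UTC"),
    (1, "2014-05-08 09:53:31 UTC"),
    (1, "2014-04-08 09:53:31 UTC"),
    (2, "2015-04-16 21:44:18 UTC"),
    (2, "2014-09-12 17:20:43 UTC"),
]


def first_next_gap(orders):
    """Return {customer: (first_ord, next_ord, diff_days)}.

    next_ord and diff_days are None for customers with a single distinct order.
    """
    # The set removes duplicate (customer, time) pairs, like GROUP BY cust, ord_time;
    # sorting orders rows by customer, then by time.
    parsed = sorted({(cust, datetime.strptime(t, FMT)) for cust, t in orders})
    result = {}
    for cust, group in groupby(parsed, key=itemgetter(0)):
        times = [t for _, t in group]                 # this customer's times, ascending
        first = times[0]                              # step 1: MIN(ord_time)
        nxt = times[1] if len(times) > 1 else None    # step 1: LEAD(ord_time, 1) on the first row
        # Step 2: difference between calendar dates, ignoring the time of day.
        diff = (nxt.date() - first.date()).days if nxt is not None else None
        result[cust] = (first, nxt, diff)
    return result


if __name__ == "__main__":
    for cust, row in first_next_gap(ORDERS).items():
        print(cust, row)
```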
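The output should be one field with the customer ID and two timestamp fields. Both timestamp fields came back populated as expected.
So everything looks great. However, when I try to run the DATEDIFF() function on the two fields, everything comes back null.
Also, when I hover over either timestamp field, it tells me the data type is TIMESTAMP, but when I try to run any kind of timestamp conversion (to seconds, for example) on the next_ord field, it fails with a "type unknown" error.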
I'm just looking for what I'm doing wrong, or any way to work around this.
Thanks for your help.
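I think this has to do with how window functions handle timestamps.
Here is what I'm seeing so far:
1. When the source values are strings, everything works as expected: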
SELECT
customer_id,
first_ord,
next_ord,
DATEDIFF(next_ord, first_ord) AS diff
FROM (
SELECT
customer_id,
LEAD(ord_time, 0) OVER (PARTITION BY customer_id ORDER BY ord_time) first_ord,
LEAD(ord_time, 1) OVER (PARTITION BY customer_id ORDER BY ord_time) next_ord,
ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY ord_time) num
FROM
(SELECT 1 AS customer_id, '2014-04-08 09:51:24 UTC' AS ord_time),
(SELECT 1 AS customer_id, '2014-04-08 09:53:31 UTC' AS ord_time),
(SELECT 1 AS customer_id, '2014-05-08 09:53:31 UTC' AS ord_time),
(SELECT 2 AS customer_id, '2014-09-12 17:20:43 UTC' AS ord_time),
(SELECT 2 AS customer_id, '2015-04-16 21:44:18 UTC' AS ord_time)
)
WHERE num = 1
Result:
customer_id  first_ord                next_ord                 diff
1            2014-04-08 09:51:24 UTC  2014-04-08 09:53:31 UTC  0
2            2014-09-12 17:20:43 UTC  2015-04-16 21:44:18 UTC  216
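For comparison, the same query shape can be run in SQLite through Python's built-in `sqlite3` module: LEAD fetches the next order and ROW_NUMBER keeps one row per customer. This sketch rests on several assumptions:

- It needs SQLite 3.25 or newer, the first release with window functions.
- SQLite has no TIMESTAMP type, so the values are stored as text, as in case 1.
- `ord_time AS first_ord` stands in for `LEAD(ord_time, 0)`, since an offset of 0 returns the current row's value.
- The day difference is taken on the `YYYY-MM-DD` prefix, which is an assumption about how DATEDIFF counts days.

```python
import sqlite3

# Sample rows copied from the query above, stored as text.
ROWS = [
    (1, "2014-04-08 09:51:24 UTC"),
    (1, "2014-04-08 09:53:31 UTC"),
    (1, "2014-05-08 09:53:31 UTC"),
    (2, "2014-09-12 17:20:43 UTC"),
    (2, "2015-04-16 21:44:18 UTC"),
]

QUERY = """
SELECT customer_id,
       first_ord,
       next_ord,
       -- Calendar-day difference: compare only the 'YYYY-MM-DD' prefix of each value.
       CAST(julianday(substr(next_ord, 1, 10)) - julianday(substr(first_ord, 1, 10)) AS INTEGER) AS diff
FROM (
    SELECT customer_id,
           ord_time AS first_ord,
           LEAD(ord_time, 1) OVER (PARTITION BY customer_id ORDER BY ord_time) AS next_ord,
           ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY ord_time) AS num
    FROM orders
)
WHERE num = 1
ORDER BY customer_id
"""


def first_and_next_orders(rows):
    """Load the rows into an in-memory SQLite table and run the window-function query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE orders (customer_id INTEGER, ord_time TEXT)")
        conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in first_and_next_orders(ROWS):
        print(row)
```

Because the fixed-width `YYYY-MM-DD HH:MM:SS` text sorts in chronological order, `ORDER BY ord_time` on the strings gives the same order as it would on real timestamps.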
2. When the source values are TIMESTAMPs, the diff comes back null, just as you described in the question:
SELECT
customer_id,
first_ord,
next_ord,
DATEDIFF(next_ord, first_ord) AS diff
FROM (
SELECT
customer_id,
LEAD(ord_time, 0) OVER (PARTITION BY customer_id ORDER BY ord_time) first_ord,
LEAD(ord_time, 1) OVER (PARTITION BY customer_id ORDER BY ord_time) next_ord,
ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY ord_time) num
FROM
(SELECT 1 AS customer_id, TIMESTAMP('2014-04-08 09:51:24 UTC') AS ord_time),
(SELECT 1 AS customer_id, TIMESTAMP('2014-04-08 09:53:31 UTC') AS ord_time),
(SELECT 1 AS customer_id, TIMESTAMP('2014-05-08 09:53:31 UTC') AS ord_time),
(SELECT 2 AS customer_id, TIMESTAMP('2014-09-12 17:20:43 UTC') AS ord_time),
(SELECT 2 AS customer_id, TIMESTAMP('2015-04-16 21:44:18 UTC') AS ord_time)
)
WHERE num = 1
Result:
customer_id  first_ord                next_ord                 diff
1            2014-04-08 09:51:24 UTC  2014-04-08 09:53:31 UTC  null
2            2014-09-12 17:20:43 UTC  2015-04-16 21:44:18 UTC  null
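3. To "fix" it, I had to convert the timestamps to STRING inside the window functions and back to TIMESTAMP in the outer query, like this: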
SELECT
customer_id,
TIMESTAMP(first_ord) as first_ord,
TIMESTAMP(next_ord) as next_ord,
DATEDIFF(next_ord, first_ord) AS diff
FROM (
SELECT
customer_id,
LEAD(STRING(ord_time), 0) OVER (PARTITION BY customer_id ORDER BY ord_time) first_ord,
LEAD(STRING(ord_time), 1) OVER (PARTITION BY customer_id ORDER BY ord_time) next_ord,
ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY ord_time) num
FROM
(SELECT 1 AS customer_id, TIMESTAMP('2014-04-08 09:51:24 UTC') AS ord_time),
(SELECT 1 AS customer_id, TIMESTAMP('2014-04-08 09:53:31 UTC') AS ord_time),
(SELECT 1 AS customer_id, TIMESTAMP('2014-05-08 09:53:31 UTC') AS ord_time),
(SELECT 2 AS customer_id, TIMESTAMP('2014-09-12 17:20:43 UTC') AS ord_time),
(SELECT 2 AS customer_id, TIMESTAMP('2015-04-16 21:44:18 UTC') AS ord_time)
)
WHERE num = 1
Result:
customer_id  first_ord                next_ord                 diff
1            2014-04-08 09:51:24 UTC  2014-04-08 09:53:31 UTC  0
2            2014-09-12 17:20:43 UTC  2015-04-16 21:44:18 UTC  216
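This workaround is only safe if the TIMESTAMP to STRING to TIMESTAMP round trip is lossless. A rough Python analogue illustrates the point. It is not BigQuery's STRING() or TIMESTAMP() implementation, and the text format is assumed from how values are displayed above. A format with whole seconds preserves second-precision values but silently drops any fractional seconds:

```python
from datetime import datetime, timezone

# Assumed text layout, copied from how the timestamps are displayed in the results above.
FMT = "%Y-%m-%d %H:%M:%S UTC"


def to_string(ts: datetime) -> str:
    """Rough analogue of STRING(ord_time): render an aware datetime as UTC text."""
    return ts.astimezone(timezone.utc).strftime(FMT)


def to_timestamp(text: str) -> datetime:
    """Rough analogue of TIMESTAMP(first_ord): parse the text back into an aware UTC datetime."""
    return datetime.strptime(text, FMT).replace(tzinfo=timezone.utc)


def round_trips(ts: datetime) -> bool:
    """True if converting to text and back reproduces exactly the same instant."""
    return to_timestamp(to_string(ts)) == ts


if __name__ == "__main__":
    ts = datetime(2014, 4, 8, 9, 51, 24, tzinfo=timezone.utc)
    print(to_string(ts), round_trips(ts))
    print(round_trips(ts.replace(microsecond=250000)))  # fractional seconds are lost
```

If the real ord_time values carry sub-second precision, it would be worth checking whether STRING(ord_time) keeps the fractional part before relying on this workaround.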