How to use date_diff for two adjacent sessions using BigQuery?
I am trying to calculate the average number of hours between two adjacent sessions using data from the following table:
user_id | event_timestamp | session_num |
---|---|---|
A | 2021-04-16 10:00:00.000 UTC | 1 |
A | 2021-04-16 11:00:00.000 UTC | 2 |
A | 2021-04-16 13:00:00.000 UTC | 3 |
A | 2021-04-16 16:00:00.000 UTC | 4 |
B | 2021-04-16 12:00:00.000 UTC | 1 |
B | 2021-04-16 14:00:00.000 UTC | 2 |
B | 2021-04-16 19:00:00.000 UTC | 3 |
C | 2021-04-16 10:00:00.000 UTC | 1 |
C | 2021-04-16 17:00:00.000 UTC | 2 |
C | 2021-04-16 18:00:00.000 UTC | 3 |
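So, for user A we have
1 hour between session_num = 2 and session_num = 1,
2 hours between session_num = 3 and session_num = 2,
3 hours between session_num = 4 and session_num = 3.
The same goes for the other users: 2 and 5 hours for user B; 7 and 1 hours for user C.
The result I expect is the arithmetic mean of these date_diff(HOUR) values.
So avg(1, 2, 3, 2, 5, 7, 1) = 3 hours is the average time between two adjacent sessions.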
Does anyone know what query I could use so that the date_diff function is applied only to adjacent sessions?
Try this:
with mytable as (
  select 'A' as user_id, timestamp '2021-04-16 10:00:00.000' as event_timestamp, 1 as session_num union all
  select 'A', '2021-04-16 11:00:00.000', 2 as session_num union all
  select 'A', '2021-04-16 13:00:00.000', 3 as session_num union all
  select 'A', '2021-04-16 16:00:00.000', 4 as session_num union all
  select 'B', '2021-04-16 12:00:00.000', 1 as session_num union all
  select 'B', '2021-04-16 14:00:00.000', 2 as session_num union all
  select 'B', '2021-04-16 19:00:00.000', 3 as session_num union all
  select 'C', '2021-04-16 10:00:00.000', 1 as session_num union all
  select 'C', '2021-04-16 17:00:00.000', 2 as session_num union all
  select 'C', '2021-04-16 18:00:00.000', 3 as session_num
)
select avg(diff) as average
from (
  select
    user_id,
    -- lag() is NULL for each user's first session, and avg() ignores NULLs
    timestamp_diff(
      event_timestamp,
      lag(event_timestamp) over (partition by user_id order by event_timestamp),
      hour
    ) as diff
  from mytable
)
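If the real data has gaps that are not whole hours, a variant like the following may be closer to what you want. This is only a sketch: it assumes session_num increases with time for each user (so it can be used for ordering), and the names gaps, gap_hours and avg_gap_hours are made up for illustration. It measures each gap in minutes and divides by 60, since timestamp_diff returns a whole number of the requested unit.

```sql
-- Sample data from the question, built with unnest() to keep it short.
with mytable as (
  select * from unnest([
    struct('A' as user_id, timestamp '2021-04-16 10:00:00' as event_timestamp, 1 as session_num),
    ('A', timestamp '2021-04-16 11:00:00', 2),
    ('A', timestamp '2021-04-16 13:00:00', 3),
    ('A', timestamp '2021-04-16 16:00:00', 4),
    ('B', timestamp '2021-04-16 12:00:00', 1),
    ('B', timestamp '2021-04-16 14:00:00', 2),
    ('B', timestamp '2021-04-16 19:00:00', 3),
    ('C', timestamp '2021-04-16 10:00:00', 1),
    ('C', timestamp '2021-04-16 17:00:00', 2),
    ('C', timestamp '2021-04-16 18:00:00', 3)
  ])
),
gaps as (
  select
    user_id,
    session_num,
    -- Gap to the previous session of the same user, in fractional hours.
    -- Assumption: session_num reflects chronological order per user.
    timestamp_diff(
      event_timestamp,
      lag(event_timestamp) over (partition by user_id order by session_num),
      minute
    ) / 60.0 as gap_hours
  from mytable
)
select avg(gap_hours) as avg_gap_hours
from gaps
-- Each user's first session has no previous session, so its gap is NULL.
where gap_hours is not null;
```

On the sample data this should return 3.0, the same as the query above. Selecting from gaps directly instead of averaging is a convenient way to check the individual differences (1, 2, 3, 2, 5, 7, 1).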
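The average number of hours between sessions for a given user is most simply calculated as:
select user_id,
       timestamp_diff(max(event_timestamp), min(event_timestamp), hour) * 1.0 / nullif(count(*) - 1, 0)
from t
group by user_id;
That is, a user's average time between sessions is the maximum timestamp minus the minimum timestamp, divided by the number of sessions minus one. This works because the consecutive gaps add up to the distance between the first and the last session.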
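Note that the query above returns one average per user (2, 3.5 and 4 hours for A, B and C on the sample data), and averaging those three numbers gives about 3.17, not 3. To get the single overall figure from the question with the same max/min idea, one option is to add up the spans and the gap counts across users before dividing. A sketch, with per_user, span_hours, gap_count and avg_gap_hours as illustrative names, and the question's sample data inlined as mytable:

```sql
-- Sample data from the question, built with unnest() to keep it short.
with mytable as (
  select * from unnest([
    struct('A' as user_id, timestamp '2021-04-16 10:00:00' as event_timestamp, 1 as session_num),
    ('A', timestamp '2021-04-16 11:00:00', 2),
    ('A', timestamp '2021-04-16 13:00:00', 3),
    ('A', timestamp '2021-04-16 16:00:00', 4),
    ('B', timestamp '2021-04-16 12:00:00', 1),
    ('B', timestamp '2021-04-16 14:00:00', 2),
    ('B', timestamp '2021-04-16 19:00:00', 3),
    ('C', timestamp '2021-04-16 10:00:00', 1),
    ('C', timestamp '2021-04-16 17:00:00', 2),
    ('C', timestamp '2021-04-16 18:00:00', 3)
  ])
),
per_user as (
  select
    user_id,
    -- Time from the user's first session to the last one, in fractional hours.
    timestamp_diff(max(event_timestamp), min(event_timestamp), minute) / 60.0 as span_hours,
    -- n sessions have n - 1 gaps between them.
    count(*) - 1 as gap_count
  from mytable
  group by user_id
)
select
  -- Total hours covered by all gaps divided by the total number of gaps.
  -- nullif() avoids a division by zero when no user has more than one session.
  sum(span_hours) / nullif(sum(gap_count), 0) as avg_gap_hours
from per_user;
```

On the sample data the spans are 6, 7 and 8 hours and the gap counts are 3, 2 and 2, so this should give 21 / 7 = 3.0, matching the expected result in the question.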