Calculate number of times a user visited a page in SQL
I have a table as shown below. It has 3 columns: user, page, and timestamp.
+------+------------------+-----------+
| user | page | timestamp |
+------+------------------+-----------+
| 1 | homepage?c=1234 | 1234 |
| 1 | homepage?c=1234 | 1245 |
| 1 | homepage?c=1234 | 1260 |
| 1 | homepage?c=1234 | 1280 |
| 1 | Signup?=1233 | 1293 |
| 1 | Signup?=121asd | 1303 |
| 1 | Signup?=212 | 1317 |
| 1 | Signup?123213 | 1337 |
| 1 | homepage?c=1234 | 1357 |
| 1 | Hotels | 1370 |
| 1 | Hotels | 1384 |
| 1 | Hotels | 1398 |
| 1 | Signup?=121asd | 1413 |
| 1 | Signup?=121as123 | 1433 |
| 1 | homepage?c=1234 | 1447 |
| 1 | homepage?c=1234 | 1463 |
| 1 | homepage?c=1234 | 1482 |
| 1 | homepage?c=1234 | 1496 |
+------+------------------+-----------+
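In the table above I want to count the number of times a user visited a particular page. Every record in the table is a hit, so simply grouping by the page column does not help.
I basically want to count a run of consecutive records for the same page only once. The counter for a page should increase only when the user has visited some other page in between.
In the sample data, the user visited the homepage 3 times, the signup page 2 times, and the hotels page once.
The expected output is as follows: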
+-------+----------+--------+--------+
| | Homepage | Signup | Hotels |
+-------+----------+--------+--------+
| users | 3 | 2 | 1 |
+-------+----------+--------+--------+
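So what you want to do is first extract the page name from the URL, so that homepage?c=1234 becomes homepage. Then, for each visit, you can use LAG to check whether the page changed compared with the previous visit, and use a window SUM to add up all the changes so far. Each run of consecutive visits to the same page then gets a unique grouping number, and you only need to count the distinct grouping numbers per page.
For clarity, I split it into separate CTEs: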
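WITH pagesplit AS
(
  -- Strip the query string: 'homepage?c=1234' -> 'homepage'
  SELECT user
  , split(page, '?')[safe_ordinal(1)] AS page
  , timestamp AS ts  -- the question's column is named timestamp
  FROM so.visits
),
page_with_prev AS
(
  -- Attach the previous page per user; '?' marks the first visit
  SELECT user
  , page
  , COALESCE(LAG(page) OVER (PARTITION BY user ORDER BY ts), '?') AS prev_page
  , ts
  FROM pagesplit
),
count_consecutive AS
(
  -- Running count of page changes: every run of the same page shares one number
  SELECT user
  , page
  , SUM(CASE WHEN page != prev_page THEN 1 ELSE NULL END) OVER (PARTITION BY user ORDER BY ts) AS grouping_no
  FROM page_with_prev
)
-- One distinct grouping number per visit to a page
SELECT user
, page
, COUNT(DISTINCT grouping_no) AS visit_count
FROM count_consecutive
GROUP BY user, page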
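To see how the grouping number evolves row by row, here is a minimal sketch in BigQuery SQL that inlines a shortened subset of the question's rows, so it runs without a table. The CTE names `visits` and `flagged`, the column name `ts`, and the choice of rows are assumptions made for illustration only.

```sql
-- Sketch only: a shortened subset of the question's rows, inlined for testing.
WITH visits AS (
  SELECT 1 AS user, page, ts
  FROM UNNEST([
    STRUCT('homepage?c=1234' AS page, 1234 AS ts),
    ('homepage?c=1234', 1245),
    ('Signup?=1233', 1293),
    ('Signup?=121asd', 1303),
    ('homepage?c=1234', 1357),
    ('Hotels', 1370),
    ('Signup?=121asd', 1413),
    ('homepage?c=1234', 1447)
  ])
),
pagesplit AS (
  -- Keep only the part of the URL before '?'
  SELECT user, SPLIT(page, '?')[SAFE_ORDINAL(1)] AS page, ts
  FROM visits
),
flagged AS (
  -- Previous page per user; '?' stands in for "no previous visit"
  SELECT user, page, ts,
         COALESCE(LAG(page) OVER (PARTITION BY user ORDER BY ts), '?') AS prev_page
  FROM pagesplit
)
-- grouping_no goes up by one each time the page differs from the previous row
SELECT user, ts, page, prev_page,
       SUM(CASE WHEN page != prev_page THEN 1 ELSE 0 END)
         OVER (PARTITION BY user ORDER BY ts) AS grouping_no
FROM flagged
ORDER BY user, ts
```

On these eight rows grouping_no runs 1, 1, 2, 2, 3, 4, 5, 6, so homepage ends up with three distinct numbers (1, 3, 6), Signup with two (2, 5), and Hotels with one (4).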
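Pivoting the result into columns probably does not make much sense unless you know exactly which pages you will have and that no new ones will ever be added. You can of course do it if you want.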
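If the list of pages really is fixed, one possible way to get one column per page is conditional aggregation. The sketch below is an assumption-laden illustration rather than the only way to do it: it inlines a shortened subset of the question's rows, hard-codes the three page names, and uses the hypothetical CTE names `visits`, `flagged`, and `numbered`.

```sql
-- Sketch only: a shortened subset of the question's rows, inlined for testing.
WITH visits AS (
  SELECT 1 AS user, page, ts
  FROM UNNEST([
    STRUCT('homepage?c=1234' AS page, 1234 AS ts),
    ('Signup?=1233', 1293),
    ('homepage?c=1234', 1357),
    ('Hotels', 1370),
    ('Signup?=121asd', 1413),
    ('homepage?c=1234', 1447)
  ])
),
flagged AS (
  -- Page name without query string, plus the previous page name per user
  SELECT user, ts,
         SPLIT(page, '?')[SAFE_ORDINAL(1)] AS page,
         LAG(SPLIT(page, '?')[SAFE_ORDINAL(1)], 1, '?')
           OVER (PARTITION BY user ORDER BY ts) AS prev_page
  FROM visits
),
numbered AS (
  -- Running count of page changes identifies each separate visit
  SELECT user, page,
         COUNTIF(page != prev_page)
           OVER (PARTITION BY user ORDER BY ts) AS grouping_no
  FROM flagged
)
-- One hard-coded column per page; a new page would need a new column here
SELECT user,
       COUNT(DISTINCT IF(LOWER(page) = 'homepage', grouping_no, NULL)) AS homepage,
       COUNT(DISTINCT IF(LOWER(page) = 'signup', grouping_no, NULL)) AS signup,
       COUNT(DISTINCT IF(LOWER(page) = 'hotels', grouping_no, NULL)) AS hotels
FROM numbered
GROUP BY user
```

The hard-coded column list is exactly the drawback mentioned above: any page not named in the SELECT is silently left out of the result.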
Also consider the options below.
Flattened output:
select user, page, count(distinct visit_number) visits
from (
select *, countif(new_visit) over(partition by user order by timestamp) visit_number
from (
select *, page != lag(page, 1, '') over(partition by user order by timestamp) new_visit
from (
select user, split(page, '?')[offset(0)] page, timestamp
from your_table
)
)
)
group by user, page
For the sample data this returns 3 for homepage, 2 for Signup, and 1 for Hotels (all for user 1).
Or, using PIVOT:
select * from (
select user, page, countif(new_visit) over(partition by user order by timestamp) visit_number
from (
select *, page != lag(page, 1, '') over(partition by user order by timestamp) new_visit
from (
select user, split(page, '?')[offset(0)] page, timestamp
from your_table
)
)
)
pivot (count(distinct visit_number) for lower(page) in ('homepage', 'signup', 'hotels'))
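For the sample data this returns a single row for user 1 with homepage = 3, signup = 2, and hotels = 1.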