Calculate number of times a user visited a page in SQL

I have a table as shown below. It has 3 columns: user, page, and timestamp.

+------+------------------+-----------+
| user |       page       | timestamp |
+------+------------------+-----------+
|    1 | homepage?c=1234  |      1234 |
|    1 | homepage?c=1234  |      1245 |
|    1 | homepage?c=1234  |      1260 |
|    1 | homepage?c=1234  |      1280 |
|    1 | Signup?=1233     |      1293 |
|    1 | Signup?=121asd   |      1303 |
|    1 | Signup?=212      |      1317 |
|    1 | Signup?123213    |      1337 |
|    1 | homepage?c=1234  |      1357 |
|    1 | Hotels           |      1370 |
|    1 | Hotels           |      1384 |
|    1 | Hotels           |      1398 |
|    1 | Signup?=121asd   |      1413 |
|    1 | Signup?=121as123 |      1433 |
|    1 | homepage?c=1234  |      1447 |
|    1 | homepage?c=1234  |      1463 |
|    1 | homepage?c=1234  |      1482 |
|    1 | homepage?c=1234  |      1496 |
+------+------------------+-----------+ 

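From the table above I want to count the number of times a user visited a particular page. Every record in the table is a hit, so simply grouping by the page column does not help.

I basically want to count consecutive records of a particular page only once. The counter for a page should increase only when the user has visited some other page in between.

As shown in the picture below:

The user visited the homepage 3 times (yellow), the signup page 2 times (blue), and the hotels page 1 time (orange).

The expected output is as follows: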

+-------+----------+--------+--------+
|       | Homepage | Signup | Hotels |
+-------+----------+--------+--------+
| users |        3 |      2 |      1 |
+-------+----------+--------+--------+

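So what you want to do is first extract the page name from the URL, so that homepage?c=1234 becomes homepage. Then, for each hit, you can use LAG to see whether the page differs from the previous hit, and a window SUM function to add up all the changes so far. Each run of consecutive hits on the same page then gets a unique grouping number, and you just count the distinct grouping numbers per page.

I've split it into separate CTEs for clarity and understanding: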

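WITH pagesplit AS
(
    -- Keep only the part of the URL before '?', e.g. homepage?c=1234 -> homepage.
    -- ts is the hit timestamp (the timestamp column in the question's table).
    SELECT user
         , (split(page, '?')[safe_ordinal(1)]) AS page
         , ts
      FROM so.visits
),
page_with_prev AS
(
    -- Previous page per user; '?' is a placeholder that can never equal a stripped page name.
    SELECT user
         , page
         , COALESCE(LAG(page) OVER (PARTITION BY user ORDER BY ts), '?') AS prev_page
         , ts
      FROM pagesplit
),
count_consecutive AS
(
    -- Running count of page changes: all hits in one consecutive run share a grouping_no.
    SELECT user
         , page
         , SUM(CASE WHEN page != prev_page THEN 1 ELSE NULL END) OVER (PARTITION BY user ORDER BY ts) AS grouping_no
      FROM page_with_prev
)
-- One visit per distinct run of a page.
SELECT user
     , page
     , COUNT(DISTINCT grouping_no) AS visit_count
  FROM count_consecutive
 GROUP BY user, page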
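
The steps described above (strip the query string, compare each hit with the previous one, number the runs, count runs per page) can be sanity-checked outside the database. The following is a minimal Python sketch of the same logic, not a replacement for the query; the function name `count_page_visits` and the tuple layout `(user, page, ts)` are assumptions made for illustration.

```python
from collections import Counter
from itertools import groupby


def count_page_visits(hits):
    """Count visits per (user, page); consecutive hits on one page count once.

    `hits` is an iterable of (user, page_url, ts) tuples (hypothetical layout).
    """
    counts = Counter()
    # Order by user, then timestamp: the analogue of PARTITION BY user ORDER BY ts.
    ordered = sorted(hits, key=lambda h: (h[0], h[2]))
    for user, user_hits in groupby(ordered, key=lambda h: h[0]):
        # Keep only the part of the URL before the first '?'.
        pages = (page.split("?")[0] for _, page, _ in user_hits)
        # groupby yields one group per run of identical consecutive pages,
        # which plays the role of one distinct grouping number.
        for page, _run in groupby(pages):
            counts[(user, page)] += 1
    return counts


# Sample rows copied from the question.
sample = [
    (1, "homepage?c=1234", 1234), (1, "homepage?c=1234", 1245),
    (1, "homepage?c=1234", 1260), (1, "homepage?c=1234", 1280),
    (1, "Signup?=1233", 1293), (1, "Signup?=121asd", 1303),
    (1, "Signup?=212", 1317), (1, "Signup?123213", 1337),
    (1, "homepage?c=1234", 1357),
    (1, "Hotels", 1370), (1, "Hotels", 1384), (1, "Hotels", 1398),
    (1, "Signup?=121asd", 1413), (1, "Signup?=121as123", 1433),
    (1, "homepage?c=1234", 1447), (1, "homepage?c=1234", 1463),
    (1, "homepage?c=1234", 1482), (1, "homepage?c=1234", 1496),
]

# For user 1: homepage -> 3, Signup -> 2, Hotels -> 1
print(count_page_visits(sample))
```

The counts match the expected output in the question, which is a quick way to confirm that "count each consecutive run once" is the right reading of the requirement.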

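Pivoting the result into columns probably doesn't make much sense unless you know exactly which pages you will have and that new ones will never be added. You can of course do so if you want.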
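
If you do want fixed columns, one portable option is conditional aggregation over the grouping numbers. The sketch below is only an illustration under stated assumptions: it runs the CTE steps through Python's built-in `sqlite3` module so it can be checked locally, so the string handling uses SQLite's `instr`/`substr` instead of `split`, the table is a hypothetical in-memory `visits(user, page, ts)`, and the helper name `pivot_visits` is made up. It needs an SQLite build with window functions (3.25 or later).

```python
import sqlite3

# Sample rows copied from the question, as (user, page, ts).
HITS = [
    (1, "homepage?c=1234", 1234), (1, "homepage?c=1234", 1245),
    (1, "homepage?c=1234", 1260), (1, "homepage?c=1234", 1280),
    (1, "Signup?=1233", 1293), (1, "Signup?=121asd", 1303),
    (1, "Signup?=212", 1317), (1, "Signup?123213", 1337),
    (1, "homepage?c=1234", 1357),
    (1, "Hotels", 1370), (1, "Hotels", 1384), (1, "Hotels", 1398),
    (1, "Signup?=121asd", 1413), (1, "Signup?=121as123", 1433),
    (1, "homepage?c=1234", 1447), (1, "homepage?c=1234", 1463),
    (1, "homepage?c=1234", 1482), (1, "homepage?c=1234", 1496),
]

# Same run-numbering steps as the CTEs, in SQLite's dialect, followed by
# one COUNT(DISTINCT CASE ...) per page that should become a column.
PIVOT_SQL = """
WITH pagesplit AS (
    SELECT user,
           CASE WHEN instr(page, '?') > 0
                THEN substr(page, 1, instr(page, '?') - 1)
                ELSE page
           END AS page,
           ts
      FROM visits
),
page_with_prev AS (
    SELECT user, page, ts,
           COALESCE(LAG(page) OVER (PARTITION BY user ORDER BY ts), '?') AS prev_page
      FROM pagesplit
),
count_consecutive AS (
    SELECT user, page,
           SUM(CASE WHEN page != prev_page THEN 1 ELSE 0 END)
               OVER (PARTITION BY user ORDER BY ts) AS grouping_no
      FROM page_with_prev
)
SELECT user,
       COUNT(DISTINCT CASE WHEN lower(page) = 'homepage' THEN grouping_no END) AS homepage,
       COUNT(DISTINCT CASE WHEN lower(page) = 'signup'   THEN grouping_no END) AS signup,
       COUNT(DISTINCT CASE WHEN lower(page) = 'hotels'   THEN grouping_no END) AS hotels
  FROM count_consecutive
 GROUP BY user
"""


def pivot_visits(rows):
    """Load rows into an in-memory table and return one pivoted row per user."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE visits (user INTEGER, page TEXT, ts INTEGER)")
    conn.executemany("INSERT INTO visits VALUES (?, ?, ?)", rows)
    result = conn.execute(PIVOT_SQL).fetchall()
    conn.close()
    return result


# Each row is (user, homepage, signup, hotels).
print(pivot_visits(HITS))
```

The list of pages is hard-coded in the final SELECT, which is exactly the limitation mentioned above: a page that is not listed simply does not show up as a column.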

Also consider the options below.

Flattened output

select user, page, count(distinct visit_number) visits
from (
  select *, countif(new_visit) over(partition by user order by timestamp) visit_number
  from (
    select *, page != lag(page, 1, '') over(partition by user order by timestamp) new_visit
    from (
      select user, split(page, '?')[offset(0)] page, timestamp
      from your_table
    )
  )
)
group by user, page         

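For the sample data in the question, this returns 3 visits for homepage, 2 for Signup, and 1 for Hotels (all for user 1).

Or using PIVOT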

select * from (
  select user, page, countif(new_visit) over(partition by user order by timestamp) visit_number
  from (
    select *, page != lag(page, 1, '') over(partition by user order by timestamp) new_visit
    from (
      select user, split(page, '?')[offset(0)] page, timestamp
      from your_table
    )
  )
)
pivot (count(distinct visit_number) for lower(page) in ('homepage', 'signup', 'hotels'))

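For the sample data, the output is a single row for user 1 with homepage = 3, signup = 2, and hotels = 1.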