How to count the number of loops that meet specific criteria in SQL

On a MySQL database I have the table below (this is customer panel data):

 
user  tab          action     time

77      -          login      1407171344
77    user-info    view       1407171400
77    traffic      select     1407171407
77      -          login      1407171440
65      -          login      1407171505
65    change       select     1407564830
65    change       pay        1407579352
65      -          login      1407579442
65      -          login      1407579765
77      -          login      1407579866
77      -          login      1407680000
77    promotion    bank       1407171400
77    promotion    pay        1408100946
65    traffic      select     1407171400
65    traffic      pay        1408114734
65      -          login      1408125796
65    service      extend     1408192741

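I have many rows with different customer IDs. I want to count the number of active sessions for each customer, that is, the number of times a customer logged in and then performed at least one action. Two consecutive logins with no action in between therefore do not count as a session. The end of a session can be proxied by the next login. For user 77, the first three rows (actions: login, view, select) form one session, but the next login does not, because no further action follows it. So in the table above, user 77 has 2 active sessions and user 65 has 3 active sessions.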

The active sessions are as follows (logins that are not followed by any action have been removed):

user  tab          action     time

77      -          login      1407171344
77    user-info    view       1407171400
77    traffic      select     1407171407
65      -          login      1407171505
65    change       select     1407564830
65    change       pay        1407579352
65      -          login      1407579765
77      -          login      1407680000
77    promotion    bank       1407171400
77    promotion    pay        1408100946
65    traffic      select     1407171400
65    traffic      pay        1408114734
65      -          login      1408125796
65    service      extend     1408192741

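How can I count the number of active sessions? Thanks in advance.

P.S. I tried importing the data into R, but the data set is large and R seems to be very slow in loops, so I am trying to stick with SQL.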

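This assumes a user cannot have several sessions at the same time. If they can, you will need a third parameter to track the sessions separately. I also assume your data is already in a table named user_action and currently looks like this:

SELECT user, action, time FROM user_action ORDER BY user, time;
user    action      time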
65  select      1407171400
65  login       1407171505
65  select      1407564830
65  pay         1407579352
65  login       1407579442
65  login       1407579765
65  pay         1408114734
65  login       1408125796
65  extend      1408192741
77  login       1407171344
77  bank        1407171400
77  view        1407171400
77  select      1407171407
77  login       1407171440
77  login       1407579866
77  login       1407680000
77  pay         1408100946

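Copy the records, ordered by user and then by time, into a new temp table for analysis, adding a new column activity_number.

The last session of each customer may have no login record to mark its end, so we add one login row per customer to close their final session.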

DROP TABLE IF EXISTS user_action_temp;
SET @activity_number := 0;
-- Number every row in (user, time) order.
CREATE TABLE user_action_temp AS
SELECT @activity_number := @activity_number + 1 AS activity_number,
       user, action, time
FROM (
    SELECT user, action, time FROM user_action
    UNION
    -- Sentinel login one second after each user's last row closes the final session.
    SELECT user, 'login' AS action, MAX(time) + 1 AS time
    FROM user_action
    GROUP BY user
) AS USER_ACTIVITY
ORDER BY user, time;

Your data now looks like this:

SELECT * FROM user_action_temp ORDER BY user, time;
activity_number user    action      time
1       65  select      1407171400
2       65  login       1407171505
3       65  select      1407564830
4       65  pay         1407579352
5       65  login       1407579442
6       65  login       1407579765
7       65  pay         1408114734
8       65  login       1408125796
9       65  extend      1408192741
10      65  login       1408192742
11      77  login       1407171344
12      77  bank        1407171400
13      77  view        1407171400
14      77  select      1407171407
15      77  login       1407171440
16      77  login       1407579866
17      77  login       1407680000
18      77  pay         1408100946
19      77  login       1408100947
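
The numbering step above depends on a user variable being incremented in ORDER BY order. As a hedged alternative, the same numbering can be sketched with the ROW_NUMBER() window function, which MySQL supports from version 8.0. The sketch below is not the query used in this answer: it runs the SQL against an in-memory SQLite database through Python's sqlite3 module purely so that it is self-contained, and it assumes a SQLite build with window functions (3.25 or later). The names ROWS, NUMBERING_SQL and numbered_activity are made up for the illustration.

```python
import sqlite3

# Sample rows from the question as (user, action, time); the tab column is not needed.
ROWS = [
    (77, "login", 1407171344), (77, "view", 1407171400),
    (77, "select", 1407171407), (77, "login", 1407171440),
    (65, "login", 1407171505), (65, "select", 1407564830),
    (65, "pay", 1407579352), (65, "login", 1407579442),
    (65, "login", 1407579765), (77, "login", 1407579866),
    (77, "login", 1407680000), (77, "bank", 1407171400),
    (77, "pay", 1408100946), (65, "select", 1407171400),
    (65, "pay", 1408114734), (65, "login", 1408125796),
    (65, "extend", 1408192741),
]

# Assumption: window functions are available (MySQL 8.0+, SQLite 3.25+).
NUMBERING_SQL = """
SELECT ROW_NUMBER() OVER (ORDER BY user, time) AS activity_number,
       user, action, time
FROM (
    SELECT user, action, time FROM user_action
    UNION ALL
    -- Sentinel login one second after each user's last row.
    SELECT user, 'login', MAX(time) + 1 FROM user_action GROUP BY user
) AS user_activity
ORDER BY activity_number
"""


def numbered_activity(rows):
    """Return (activity_number, user, action, time) tuples in (user, time) order."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE user_action (user INTEGER, action TEXT, time INTEGER)"
    )
    conn.executemany("INSERT INTO user_action VALUES (?, ?, ?)", rows)
    result = conn.execute(NUMBERING_SQL).fetchall()
    conn.close()
    return result


numbered = numbered_activity(ROWS)
for row in numbered:
    print(row)
```

Rows that share the same user and time (bank and view for user 77 here) may be numbered in either order, which does not affect the session counts.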

Next, self-join this table. First define two variables that assign a login number to each login activity.

SET @login_number1:=0;
SET @login_number2:=0;

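Self-join the table so that each login number in table 1 is matched with the next login in table 2, with the user kept the same. The activity count is the total number of activities between the two logins.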

SELECT * FROM
(
    SELECT logins_1.user,
           logins_1.time AS session_start,
           logins_2.time AS session_end,
           -- Rows strictly between two consecutive logins are the session's actions.
           CASE WHEN (logins_2.activity_number - logins_1.activity_number) > 1
                THEN (logins_2.activity_number - logins_1.activity_number - 1)
                ELSE 0
           END AS activity_count
    FROM
        -- Logins numbered 1, 2, 3, ... in (user, time) order.
        (SELECT @login_number1 := @login_number1 + 1 AS login_number,
                activity_number, user, action, time
         FROM user_action_temp
         WHERE action = 'login'
         ORDER BY user, time) AS logins_1
    LEFT OUTER JOIN
        -- The same numbered logins, used to look up the following login.
        (SELECT @login_number2 := @login_number2 + 1 AS login_number2,
                activity_number, user, action, time
         FROM user_action_temp
         WHERE action = 'login'
         ORDER BY user, time) AS logins_2
    -- Pair each login with the next login of the same user.
    ON logins_1.login_number = (logins_2.login_number2 - 1)
       AND logins_1.user = logins_2.user
) AS RESULT;

This gives a summary of all user sessions:

user    session_start   session_end activity_count
65  1407171505  1407579442  2
65  1407579442  1407579765  0
65  1407579765  1408125796  1
65  1408125796  1408192742  1
65  1408192742  <null>  0
77  1407171344  1407171440  3
77  1407171440  1407579866  0
77  1407579866  1407680000  0
77  1407680000  1408100947  1
77  1408100947  <null>  0

You can filter the query above with WHERE activity_count > 0 to get what you want.
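
If only the per-user count is needed, a shorter route is to look at the row that follows each login: a login starts an active session exactly when the next row for the same user exists and is not another login. The sketch below illustrates that idea with the LEAD() window function (available in MySQL 8.0 and later). It is an alternative to the self-join above, not a restatement of it, and it needs no sentinel login row. As an assumption made for the sake of a runnable example, the SQL is executed against an in-memory SQLite database (3.25 or later) via Python's sqlite3 module; ROWS, ACTIVE_SESSIONS_SQL and count_active_sessions are illustrative names.

```python
import sqlite3

# Sample rows from the question as (user, action, time).
ROWS = [
    (77, "login", 1407171344), (77, "view", 1407171400),
    (77, "select", 1407171407), (77, "login", 1407171440),
    (65, "login", 1407171505), (65, "select", 1407564830),
    (65, "pay", 1407579352), (65, "login", 1407579442),
    (65, "login", 1407579765), (77, "login", 1407579866),
    (77, "login", 1407680000), (77, "bank", 1407171400),
    (77, "pay", 1408100946), (65, "select", 1407171400),
    (65, "pay", 1408114734), (65, "login", 1408125796),
    (65, "extend", 1408192741),
]

# Assumption: window functions are available (MySQL 8.0+, SQLite 3.25+).
ACTIVE_SESSIONS_SQL = """
SELECT user, COUNT(*) AS active_sessions
FROM (
    SELECT user, action,
           LEAD(action) OVER (PARTITION BY user ORDER BY time) AS next_action
    FROM user_action
) AS with_next
WHERE action = 'login'
  AND next_action IS NOT NULL      -- a trailing login has no following row
  AND next_action <> 'login'       -- back-to-back logins are not sessions
GROUP BY user
ORDER BY user
"""


def count_active_sessions(rows):
    """Return {user: number of logins directly followed by a non-login action}."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE user_action (user INTEGER, action TEXT, time INTEGER)"
    )
    conn.executemany("INSERT INTO user_action VALUES (?, ?, ?)", rows)
    counts = dict(conn.execute(ACTIVE_SESSIONS_SQL).fetchall())
    conn.close()
    return counts


active = count_active_sessions(ROWS)
print(active)  # {65: 3, 77: 2}
```

Users with no active session at all do not appear in the result. On a large table, an index on (user, time) would likely help the window ordering, though that is untested here.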