SQL to get group number

I have sample data like the following.

    +---------+------------+--------+-----------+
    | User Id |   Sequence | Action | Object    |
    |---------|------------|--------|-----------|
    | 12345   |    1       | Eat    | Bread     |
    | 12345   |    2       | Eat    | Steak     |
    | 12345   |    3       | Eat    | Bread     |
    | 12345   |    4       | Drink  | Milk tea  |
    | 12345   |    5       | Drink  | Black tea |  
    | 12345   |    6       | Eat    | Cake      |
    | 12345   |    7       | Eat    | Candy     |
    | 12345   |    8       | Drink  | Black tea | 
    | 12345   |    9       | Drink  | Green tea | 
    | 12345   |    10      | Drink  | Water     |
    +---------+------------+--------+-----------+

Now I want to add a column named 'Group Id' to the table, so that the result looks like this:

    +---------+------------+--------+-----------+-----------+
    | User Id |   Sequence | Action | Object    | Group Id  |
    |---------|------------|--------|-----------|-----------|
    | 12345   |    1       | Eat    | Bread     |     1     |
    | 12345   |    2       | Eat    | Steak     |     1     |
    | 12345   |    3       | Eat    | Bread     |     1     |
    | 12345   |    4       | Drink  | Milk tea  |     2     |
    | 12345   |    5       | Drink  | Black tea |     2     |
    | 12345   |    6       | Eat    | Cake      |     3     |
    | 12345   |    7       | Eat    | Candy     |     3     |
    | 12345   |    8       | Drink  | Black tea |     4     |
    | 12345   |    9       | Drink  | Green tea |     4     |
    | 12345   |    10      | Drink  | Water     |     4     |
    +---------+------------+--------+-----------+-----------+

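Consecutive rows with the same action should fall into one group, but when the same action shows up again later in the sequence it should start a new group. How can I implement this in SQL (I am using Google BigQuery)?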

Thanks a lot!

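This is a type of gaps-and-islands problem. A simple method is to use lag() to determine where the action changes, and then take a cumulative sum of those changes: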

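    select t.*,
           -- running total of change flags; the first row's prev_action is NULL,
           -- so it falls into the else branch and the numbering starts at 1
           sum(case when prev_action = action then 0 else 1 end) over (order by sequence) as group_id
    from (select t.*,
                 lag(action) over (order by sequence) as prev_action
          from t
         ) t;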
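The flag-and-running-total idea can be traced outside the database. The following is only an illustrative sketch in plain Python, not part of the SQL answer: the list `actions` is the Action column from the question's sample data, and the variable names are made up for the example.

```python
from itertools import accumulate

# Action column from the question's sample data, already ordered by Sequence.
actions = ["Eat", "Eat", "Eat", "Drink", "Drink",
           "Eat", "Eat", "Drink", "Drink", "Drink"]

# Counterpart of lag(action): the previous row's action, None for the first row.
prev_actions = [None] + actions[:-1]

# Flag is 1 where the action differs from the previous row, else 0.
# The first row has no previous action, so it is flagged as a change.
change_flags = [0 if prev == cur else 1
                for prev, cur in zip(prev_actions, actions)]

# Running total of the flags, the counterpart of sum(...) over (order by sequence).
group_ids = list(accumulate(change_flags))

print(group_ids)  # [1, 1, 1, 2, 2, 3, 3, 4, 4, 4]
```

Each change starts a new island, so the running total of change flags is the group number.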

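You can also express the outer logic using countif():

    1 + countif(prev_action <> action) over (order by sequence) as group_id

Here the `1 +` is needed because countif() only counts rows where the condition is true, and comparing against the first row's NULL prev_action is not true.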

Below is for BigQuery Standard SQL

    #standardSQL
    SELECT * EXCEPT(new_group),
      -- running count of group starts within each user
      COUNTIF(new_group) OVER(PARTITION BY User_Id ORDER BY Sequence) Group_Id
    FROM (
      SELECT *,
        -- the '' default makes the first row of each user start a new group
        Action != LAG(Action, 1, '') OVER(PARTITION BY User_Id ORDER BY Sequence) new_group
      FROM `project.dataset.table`
    )
    -- ORDER BY User_Id

When applied to the sample data from your question, the output is:

    Row User_Id Sequence    Action  Object      Group_Id
    1   12345   1           Eat     Bread       1
    2   12345   2           Eat     Steak       1
    3   12345   3           Eat     Bread       1
    4   12345   4           Drink   Milk tea    2
    5   12345   5           Drink   Black tea   2
    6   12345   6           Eat     Cake        3
    7   12345   7           Eat     Candy       3
    8   12345   8           Drink   Black tea   4
    9   12345   9           Drink   Green tea   4
    10  12345   10          Drink   Water       4
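
The sample data has only one user, so the effect of PARTITION BY User_Id is not visible in it. Here is a rough local sketch of what it does with more than one user, using Python's built-in sqlite3 module as a stand-in for BigQuery. Several things in it are assumptions for illustration: the second user 67890 and its rows are made up, the table name `t` is hypothetical, SQLite has no COUNTIF so SUM over the 0/1 comparison result is swapped in, and window functions need SQLite 3.25 or newer.

```python
import sqlite3

# Hypothetical rows: user 12345 plus a made-up second user 67890.
rows = [
    (12345, 1, "Eat"), (12345, 2, "Eat"), (12345, 3, "Drink"),
    (67890, 1, "Drink"), (67890, 2, "Eat"), (67890, 3, "Eat"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (user_id INTEGER, sequence INTEGER, action TEXT)")
conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)

# Same shape as the BigQuery query, with SUM standing in for COUNTIF
# (in SQLite the comparison evaluates to 0 or 1).
query = """
SELECT user_id, sequence, action,
       SUM(new_group) OVER (PARTITION BY user_id ORDER BY sequence) AS group_id
FROM (
  SELECT *,
         action != LAG(action, 1, '') OVER (PARTITION BY user_id ORDER BY sequence) AS new_group
  FROM t
)
ORDER BY user_id, sequence
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

Because both window functions are partitioned by user, the numbering restarts at 1 for the second user, and the last row of one user is never compared with the first row of the next.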