SQL to get group number
I have sample data like the following.
+---------+------------+--------+-----------+
| User Id | Sequence | Action | Object |
|---------|------------|--------|-----------|
| 12345 | 1 | Eat | Bread |
| 12345 | 2 | Eat | Steak |
| 12345 | 3 | Eat | Bread |
| 12345 | 4 | Drink | Milk tea |
| 12345 | 5 | Drink | Black tea |
| 12345 | 6 | Eat | Cake |
| 12345 | 7 | Eat | Candy |
| 12345 | 8 | Drink | Black tea |
| 12345 | 9 | Drink | Green tea |
| 12345 | 10 | Drink | Water |
+---------+------------+--------+-----------+
Now I want to add a column named 'Group Id' to the table, so that the result looks like this:
+---------+------------+--------+-----------+-----------+
| User Id | Sequence | Action | Object | Group Id |
|---------|------------|--------|-----------|-----------|
| 12345 | 1 | Eat | Bread | 1 |
| 12345 | 2 | Eat | Steak | 1 |
| 12345 | 3 | Eat | Bread | 1 |
| 12345 | 4 | Drink | Milk tea | 2 |
| 12345 | 5 | Drink | Black tea | 2 |
| 12345 | 6 | Eat | Cake | 3 |
| 12345 | 7 | Eat | Candy | 3 |
| 12345 | 8 | Drink | Black tea | 4 |
| 12345 | 9 | Drink | Green tea | 4 |
| 12345 | 10 | Drink | Water | 4 |
+---------+------------+--------+-----------+-----------+
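Consecutive rows with the same action should fall into one group, and a new group should start each time the action changes as the sequence advances.
How can I do this in SQL (I am using Google BigQuery)?
Thanks a lot!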
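This is a type of gaps-and-islands problem. A simple approach is to use lag() to find where the action changes, and then take a cumulative sum of those changes: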
select t.*,
       -- prev_action is NULL on the first row, so that row is not counted as a change
       1 + sum(case when prev_action <> action then 1 else 0 end) over (order by sequence) as group_id
from (select t.*,
             lag(action) over (order by sequence) as prev_action
      from t
     ) t;
You can also express the outer logic using countif():
       1 + countif(prev_action <> action) over (order by sequence) as group_id
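To make the logic of the query easier to follow, here is a minimal Python sketch of the same idea. It is not BigQuery code, only an illustration: the function name `assign_group_ids` and the list-based input are assumptions, and the rows are assumed to be already ordered by sequence.

```python
from typing import List, Optional


def assign_group_ids(actions: List[str]) -> List[int]:
    """Mimic lag() plus a running count of changes over rows ordered by sequence."""
    group_ids: List[int] = []
    prev_action: Optional[str] = None  # lag() yields NULL for the first row
    changes = 0  # running count of rows whose action differs from the previous row
    for action in actions:
        # Like countif(prev_action <> action): the NULL comparison on the
        # first row is not counted as a change.
        if prev_action is not None and prev_action != action:
            changes += 1
        group_ids.append(1 + changes)
        prev_action = action
    return group_ids


# Actions from the question, in Sequence order.
sample = ["Eat", "Eat", "Eat", "Drink", "Drink",
          "Eat", "Eat", "Drink", "Drink", "Drink"]
print(assign_group_ids(sample))  # [1, 1, 1, 2, 2, 3, 3, 4, 4, 4]
```

The running counter plays the role of the window sum: each row's group id is one plus the number of changes seen up to and including that row.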
The following is for BigQuery Standard SQL:
#standardSQL
SELECT * EXCEPT(new_group),
COUNTIF(new_group) OVER(PARTITION BY User_Id ORDER BY Sequence) Group_Id
FROM (
SELECT *,
Action != LAG(Action, 1, '') OVER(PARTITION BY User_Id ORDER BY Sequence) new_group
FROM `project.dataset.table`
)
-- ORDER BY User_Id
Applied to the sample data from your question, the output is:
Row User_Id Sequence Action Object Group_Id
1 12345 1 Eat Bread 1
2 12345 2 Eat Steak 1
3 12345 3 Eat Bread 1
4 12345 4 Drink Milk tea 2
5 12345 5 Drink Black tea 2
6 12345 6 Eat Cake 3
7 12345 7 Eat Candy 3
8 12345 8 Drink Black tea 4
9 12345 9 Drink Green tea 4
10 12345 10 Drink Water 4
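The query above differs from a plain running count in two ways: it restarts the numbering for each user, and it uses '' as the LAG default so the first row of each user already counts as a new group. A rough Python sketch of that behaviour follows. The function name `group_ids_per_user`, the tuple layout, and the second user 67890 are hypothetical and only serve the illustration.

```python
from typing import Dict, List, Tuple

Row = Tuple[int, int, str]  # (user_id, sequence, action)


def group_ids_per_user(rows: List[Row]) -> List[Tuple[int, int, str, int]]:
    """Mimic PARTITION BY User_Id ORDER BY Sequence with LAG(Action, 1, '')."""
    prev_action: Dict[int, str] = {}  # last action seen per user
    counter: Dict[int, int] = {}      # running COUNTIF(new_group) per user
    out = []
    for user_id, sequence, action in sorted(rows, key=lambda r: (r[0], r[1])):
        # The '' default makes the first row of each user a new group,
        # so numbering starts at 1 without adding 1.
        new_group = action != prev_action.get(user_id, "")
        counter[user_id] = counter.get(user_id, 0) + (1 if new_group else 0)
        prev_action[user_id] = action
        out.append((user_id, sequence, action, counter[user_id]))
    return out


rows = [
    (12345, 1, "Eat"), (12345, 2, "Eat"), (12345, 3, "Drink"),
    (67890, 1, "Drink"), (67890, 2, "Eat"),  # hypothetical second user
]
for row in group_ids_per_user(rows):
    print(row)
```

One consequence of the '' default, in the sketch as in the query, is that a first row whose action is itself an empty string would not start a group.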