SQL to get some groups and keep the order
I have some sample data like this:
+---------+----------+--------+
| user id | sequence | Action |
|---------|----------|--------|
| 12345   | 1        | Run    |
| 12345   | 2        | Sit    |
| 12345   | 3        | Sit    |
| 12345   | 4        | Run    |
| 12345   | 5        | Run    |
| 12345   | 6        | Sit    |
+---------+----------+--------+
I want the result to look like this:
+---------+--------+
| user id | Action |
|---------|--------|
| 12345   | Run    |
| 12345   | Sit    |
| 12345   | Run    |
| 12345   | Sit    |
+---------+--------+
The rows with sequence #2 and #3 should be merged into one row, and so should #4 and #5.
Using 'group by Action' gives me a table like the one below, which is not what I want:
+---------+--------+
| user id | Action |
|---------|--------|
| 12345   | Run    |
| 12345   | Sit    |
+---------+--------+
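How can I do this in SQL (I'm using Google BigQuery)?
Thanks a lot!
You can use window functions. The idea is to compare each row's action with the action on the "previous" row (for the same user, ordered by sequence) and keep only the rows where the value changes: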
select *
from (
select t.*, lag(action) over(partition by user_id order by sequence) lag_action
from mytable t
) t
where action <> lag_action or lag_action is null
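To check the logic of the query above without a BigQuery project, here is a minimal sketch that runs the same LAG-and-filter query on the sample data in an in-memory SQLite database, using Python's standard sqlite3 module. It assumes a table named mytable with columns user_id, sequence and action, as in the answer. It also assumes the bundled SQLite is version 3.25 or newer, which window functions require. The function name collapse_runs is hypothetical. Because this is SQLite and not BigQuery, it verifies the idea only, not BigQuery-specific behavior.

```python
import sqlite3

# Sample data from the question as (user_id, sequence, action).
ROWS = [
    (12345, 1, "Run"),
    (12345, 2, "Sit"),
    (12345, 3, "Sit"),
    (12345, 4, "Run"),
    (12345, 5, "Run"),
    (12345, 6, "Sit"),
]

# Same logic as the answer: attach the previous action per user, then keep
# rows whose action changed (or that have no previous row at all).
QUERY = """
select user_id, sequence, action
from (
  select t.*, lag(action) over (partition by user_id order by sequence) as lag_action
  from mytable t
) t
where action <> lag_action or lag_action is null
order by user_id, sequence
"""


def collapse_runs(rows):
    """Return only the first row of each run of identical actions (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "create table mytable (user_id integer, sequence integer, action text)"
        )
        conn.executemany("insert into mytable values (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in collapse_runs(ROWS):
        print(row)
```

Rows #3 and #5 are dropped because each repeats the action of the row before it. That leaves one row per run: sequences 1, 2, 4 and 6.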
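Below is for BigQuery Standard SQL. LAG(action, 1, '') returns '' for the first row of each user, so that row is never flagged as a duplicate: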
#standardSQL
SELECT * EXCEPT(dup) FROM (
SELECT *, action = LAG(action, 1, '') OVER(PARTITION BY user_id ORDER BY sequence) AS dup
FROM `project.dataset.table`
)
WHERE NOT dup
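If applied to the sample data from your question, the output is:
+-----+---------+----------+--------+
| Row | user_id | sequence | action |
|-----|---------|----------|--------|
| 1   | 12345   | 1        | Run    |
| 2   | 12345   | 2        | Sit    |
| 3   | 12345   | 4        | Run    |
| 4   | 12345   | 6        | Sit    |
+-----+---------+----------+--------+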
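Both queries keep the first row of each run of identical actions. You may also want to know which sequence numbers were merged into each output row, for example 2 to 3 and 4 to 5. A common extension for that is the "gaps and islands" pattern. First, flag each row where the action changes. Next, turn the flags into a running group number with a cumulative SUM. Finally, GROUP BY that number. The sketch below again uses SQLite through Python's sqlite3, not BigQuery, so syntax may need adjusting. For instance, SQLite has no SELECT * EXCEPT. The function name merge_runs is hypothetical, and the '' default passed to lag mirrors the answer's LAG(action, 1, '').

```python
import sqlite3

# Sample data from the question as (user_id, sequence, action).
ROWS = [
    (12345, 1, "Run"),
    (12345, 2, "Sit"),
    (12345, 3, "Sit"),
    (12345, 4, "Run"),
    (12345, 5, "Run"),
    (12345, 6, "Sit"),
]

# 1. changed = 1 when the action differs from the previous row's action
#    (the '' default makes the first row of each user count as a change).
# 2. grp = running sum of changed, so each run of equal actions shares a grp.
# 3. Aggregate each (user_id, grp) into one row with its sequence range.
QUERY = """
select user_id, action, min(sequence) as first_seq, max(sequence) as last_seq
from (
  select *,
         sum(changed) over (partition by user_id order by sequence) as grp
  from (
    select *,
           case when action = lag(action, 1, '') over (partition by user_id order by sequence)
                then 0 else 1 end as changed
    from mytable
  )
)
group by user_id, grp, action
order by user_id, first_seq
"""


def merge_runs(rows):
    """Collapse consecutive identical actions, keeping each run's sequence range."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "create table mytable (user_id integer, sequence integer, action text)"
        )
        conn.executemany("insert into mytable values (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in merge_runs(ROWS):
        print(row)
```

Each output row is one merged run, for example (12345, 'Sit', 2, 3) for sequences #2 and #3. Grouping by grp rather than by action alone is what stops the two "Run" runs from collapsing together, which is the problem with a plain 'group by Action'.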