SQL query: merge consecutive rows with the same value after sorting the table
Suppose we have this table:
id | begin | end | location
1 | 5 | 10 | MALL A
2 | 1 | 3 | MALL B
3 | 13 | 17 | MALL A
4 | 21 | 25 | MALL C
5 | 36 | 38 | MALL D
6 | 31 | 33 | MALL D
7 | 26 | 29 | MALL F
8 | 40 | 45 | MALL D
Then we sort the table by the begin column in ascending order, which gives:
id | begin | end | location
2 | 1 | 3 | MALL B
1 | 5 | 10 | MALL A
3 | 13 | 17 | MALL A
4 | 21 | 25 | MALL C
7 | 26 | 29 | MALL F
6 | 31 | 33 | MALL D
5 | 36 | 38 | MALL D
8 | 40 | 45 | MALL D
I want a result like this, where consecutive rows with the same location are merged into one:
begin | end | location
1 | 3 | MALL B
5 | 17 | MALL A
21 | 25 | MALL C
26 | 29 | MALL F
31 | 45 | MALL D
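How can I achieve this?
I thought I could use RANK() and then group by the rank value, but I couldn't get it to work. I think that is because the table isn't sorted first.
If you want to create the table yourself, here is the SQL to do so: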
CREATE TABLE `t` (
`id` int NOT NULL,
`begin` int DEFAULT NULL,
`end` int DEFAULT NULL,
`location` varchar(45) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;
INSERT INTO t (id, begin, end, location) VALUES (1, 5, 10, 'MALL A');
INSERT INTO t (id, begin, end, location) VALUES (2, 1, 3, 'MALL B');
INSERT INTO t (id, begin, end, location) VALUES (3, 13, 17, 'MALL A');
INSERT INTO t (id, begin, end, location) VALUES (4, 21, 25, 'MALL C');
INSERT INTO t (id, begin, end, location) VALUES (5, 36, 38, 'MALL D');
INSERT INTO t (id, begin, end, location) VALUES (6, 31, 33, 'MALL D');
INSERT INTO t (id, begin, end, location) VALUES (7, 26, 29, 'MALL F');
INSERT INTO t (id, begin, end, location) VALUES (8, 40, 45, 'MALL D');
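This is a type of gaps-and-islands problem. In this case, you can use lag() to find the rows where the location changes from the previous row (ordered by begin), that is, where a new group starts. Then use a cumulative sum of those change flags to number the groups, and aggregate: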
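select location, min(begin), max(end)
from (select t.*,
             -- running count of group starts: rows in the same island share one grp value
             sum(case when prev_location = location then 0 else 1 end) over (order by begin) as grp
      from (select t.*,
                   -- location of the previous row when ordered by begin (NULL for the first row)
                   lag(location) over (order by begin) as prev_location
            from t
           ) t
     ) t
group by grp, location;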
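To see what the two window steps produce, here is a minimal sketch that runs the inner part of the query against an in-memory SQLite database from Python. This is only an illustration under some assumptions: it uses SQLite (3.25 or newer, for window functions) instead of MySQL, it double-quotes begin and end because they are keywords in SQLite, and the names ROWS, LAG_STEPS and lag_steps are made up for the example.

```python
import sqlite3

# Sample rows from the question: (id, begin, end, location).
ROWS = [
    (1, 5, 10, "MALL A"), (2, 1, 3, "MALL B"), (3, 13, 17, "MALL A"),
    (4, 21, 25, "MALL C"), (5, 36, 38, "MALL D"), (6, 31, 33, "MALL D"),
    (7, 26, 29, "MALL F"), (8, 40, 45, "MALL D"),
]

conn = sqlite3.connect(":memory:")
# Assumption: SQLite stands in for MySQL; begin/end are quoted as SQLite keywords.
conn.execute(
    'CREATE TABLE t (id INTEGER PRIMARY KEY, "begin" INT, "end" INT, location TEXT)'
)
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", ROWS)

# Inner steps of the lag() approach: fetch the previous row's location,
# flag every row whose location differs from it, and take a running sum
# of the flags so that each island gets its own grp number.
LAG_STEPS = """
SELECT "begin", location, prev_location,
       SUM(CASE WHEN prev_location = location THEN 0 ELSE 1 END)
           OVER (ORDER BY "begin") AS grp
FROM (SELECT t.*,
             LAG(location) OVER (ORDER BY "begin") AS prev_location
      FROM t) AS s
ORDER BY "begin"
"""

lag_steps = conn.execute(LAG_STEPS).fetchall()
for row in lag_steps:
    print(row)  # (begin, location, prev_location, grp)
```

On the sample data the grp column comes out as 1, 2, 2, 3, 4, 5, 5, 5, so the two MALL A rows share group 2 and the three MALL D rows share group 5. Grouping by grp then gives one output row per island.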
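Actually, because you don't care about the gaps between one row's end and the next row's begin, you can use the simpler difference of row numbers: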
select location, min(begin), max(end)
from (select t.*,
             row_number() over (order by begin) as seqnum,
             row_number() over (partition by location order by begin) as seqnum_2
      from t
     ) t
group by location, (seqnum - seqnum_2);
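This one is a bit harder to explain, but if you look at the results of the subquery, you will see that the difference between the two row_number() values stays constant across adjacent rows that share the same location.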
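A quick way to convince yourself is to print the subquery output next to the difference. The sketch below does that with Python's sqlite3 module. It is an illustration only, with the same assumptions as any SQLite port of this query: SQLite 3.25 or newer stands in for MySQL, begin and end are double-quoted because they are SQLite keywords, and ROWS, steps and merged are hypothetical names. The ORDER BY on the final query is added just to make the output order predictable.

```python
import sqlite3

# Sample rows from the question: (id, begin, end, location).
ROWS = [
    (1, 5, 10, "MALL A"), (2, 1, 3, "MALL B"), (3, 13, 17, "MALL A"),
    (4, 21, 25, "MALL C"), (5, 36, 38, "MALL D"), (6, 31, 33, "MALL D"),
    (7, 26, 29, "MALL F"), (8, 40, 45, "MALL D"),
]

conn = sqlite3.connect(":memory:")
# Assumption: SQLite stands in for MySQL; begin/end are quoted as SQLite keywords.
conn.execute(
    'CREATE TABLE t (id INTEGER PRIMARY KEY, "begin" INT, "end" INT, location TEXT)'
)
conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?)", ROWS)

# Subquery of the row-number approach, with the difference shown as a column.
steps = conn.execute("""
    SELECT "begin", location, seqnum, seqnum_2, seqnum - seqnum_2 AS diff
    FROM (SELECT t.*,
                 ROW_NUMBER() OVER (ORDER BY "begin") AS seqnum,
                 ROW_NUMBER() OVER (PARTITION BY location
                                    ORDER BY "begin") AS seqnum_2
          FROM t) AS s
    ORDER BY "begin"
""").fetchall()
for row in steps:
    print(row)  # (begin, location, seqnum, seqnum_2, diff)

# Full query: one row per (location, diff) pair, i.e. one row per island.
merged = conn.execute("""
    SELECT MIN("begin"), MAX("end"), location
    FROM (SELECT t.*,
                 ROW_NUMBER() OVER (ORDER BY "begin") AS seqnum,
                 ROW_NUMBER() OVER (PARTITION BY location
                                    ORDER BY "begin") AS seqnum_2
          FROM t) AS s
    GROUP BY location, (seqnum - seqnum_2)
    ORDER BY MIN("begin")
""").fetchall()
for row in merged:
    print(row)  # (begin, end, location)
```

For the sample data the diff column is 0, 1, 1, 3, 4, 5, 5, 5. Within a run of adjacent rows with the same location, both row numbers increase by one per row, so their difference does not move. The difference alone is not guaranteed to be unique across locations, which is why the query groups by location as well.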