SQL query for finding unique rows based on three parameters - kind of "get first row in sorted grouped set"
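I am trying to see whether there is a way, using SQL, to find unique grouped rows based on three parameters. It is a bit like getting the first row for each group-by key in a specially sorted set.
Note: I am stuck on MySQL 5.7.
Here is my test table and data: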
CREATE TABLE observations (
id int(10) AUTO_INCREMENT,
area_code varchar(5),
observation_date timestamp,
reading int(10),
source varchar(10),
deleted_at timestamp NULL DEFAULT NULL,
PRIMARY KEY (id)
);
INSERT INTO observations (area_code,observation_date, reading, source, deleted_at)
VALUES
('test1', '2021-01-01', 7, 'auto', null),
('test1', '2021-01-02', 6, 'auto', null),
('test1', '2021-01-03', 5, 'auto', null),
('test2', '2021-01-01', 7, 'auto', null),
('test2', '2021-01-02', 6, 'manual', null),
('test2', '2021-01-03', 5, 'auto', null),
('test3', '2021-01-01', 7, 'auto', null),
('test3', '2021-01-02', 6, 'manual', '2021-01-02'),
('test3', '2021-01-03', 5, 'auto', null);
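source is either auto or manual.
There are multiple areas. For each area, I want to get the latest reading based on observation_date, but only if source is auto. If source is manual, that row takes priority and should always be returned as the reading for that area. However, if deleted_at is set (this only applies to manual rows), the manual row should be ignored and observation_date becomes the main criterion again.
So the three parameters are observation_date, source and deleted_at. Everything is kept in order to preserve history.
Here is my current query with its actual output, followed by the expected output:
Current query attempt: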
SELECT obs1.*
FROM observations AS obs1
LEFT JOIN observations AS obs2 ON
obs1.area_code = obs2.area_code AND
obs1.id != obs2.id AND
NOT (
(obs1.source = "manual"
AND obs1.deleted_at IS NULL
)
OR
(obs1.observation_date > obs2.observation_date AND obs2.source = "auto" )
)
WHERE obs2.id IS NULL
Actual output:
id area_code observation_date reading source deleted_at
3 test1 2021-01-03 00:00:00 5 auto NULL
5 test2 2021-01-02 00:00:00 6 manual NULL
Actual output (with AND obs1.deleted_at IS NULL removed):
id area_code observation_date reading source deleted_at
3 test1 2021-01-03 00:00:00 5 auto NULL
5 test2 2021-01-02 00:00:00 6 manual NULL
8 test3 2021-01-02 00:00:00 6 manual 2021-01-02 00:00:00
Expected output:
id area_code observation_date reading source deleted_at
3 test1 2021-01-03 00:00:00 5 auto NULL
5 test2 2021-01-02 00:00:00 6 manual NULL
8 test3 2021-01-03 00:00:00 5 auto NULL
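I have tried multiple variations of the query, but none produced the expected result.
Is this possible, or am I going about it the wrong way?
Anything is possible.
Let's number the rows according to the logic you gave: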
SELECT *,
  ROW_NUMBER() OVER (PARTITION BY area_code ORDER BY
    CASE
      WHEN source = 'manual' AND deleted_at IS NULL THEN 0      -- priority
      WHEN source = 'manual' AND deleted_at IS NOT NULL THEN 2  -- not priority
      ELSE 1                                                    -- auto
    END,
    observation_date DESC
  ) AS rown
FROM
  observations
Then take only the rows where rown = 1:
WITH cte AS (
  SELECT *,
    ROW_NUMBER() OVER (PARTITION BY area_code ORDER BY
      CASE
        WHEN source = 'manual' AND deleted_at IS NULL THEN 0      -- priority
        WHEN source = 'manual' AND deleted_at IS NOT NULL THEN 2  -- not priority
        ELSE 1                                                    -- auto
      END,
      observation_date DESC
    ) AS rown
  FROM
    observations
)
SELECT * FROM cte WHERE rown = 1
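ROW_NUMBER() splits the result set into groups based on the unique combinations of the columns named in PARTITION BY, then assigns an incrementing number within each group, following the order given by the ORDER BY clause.
The logic above sorts your non-deleted manual observations first (0), deleted manual observations last (2) and auto observations in between (1). When several rows share the same rank, the observation date in descending order (latest first) is applied as the tie-breaker.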
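ROW_NUMBER() and WITH are only available from MySQL 8.0 onward, so on 5.7 the same ranking has to be expressed differently. One possible sketch is an anti-join: keep a row only when no other row in the same area outranks it. The aliases o and better, the column name prio, and the final tie-break on the higher id are assumptions made for illustration; they are not part of the original answer.

```sql
-- Sketch for MySQL 5.7 (no window functions, no CTEs).
-- prio mirrors the CASE expression above:
--   0 = manual and not deleted, 1 = auto, 2 = manual and deleted.
SELECT o.id, o.area_code, o.observation_date, o.reading, o.source, o.deleted_at
FROM (
    SELECT obs.*,
           CASE
               WHEN source = 'manual' AND deleted_at IS NULL THEN 0
               WHEN source = 'manual' THEN 2
               ELSE 1
           END AS prio
    FROM observations AS obs
) AS o
LEFT JOIN (
    SELECT obs.*,
           CASE
               WHEN source = 'manual' AND deleted_at IS NULL THEN 0
               WHEN source = 'manual' THEN 2
               ELSE 1
           END AS prio
    FROM observations AS obs
) AS better
    ON  better.area_code = o.area_code
    AND (
            -- a strictly better rank wins outright
            better.prio < o.prio
            -- same rank: the later observation wins
         OR (better.prio = o.prio
             AND better.observation_date > o.observation_date)
            -- same rank and date: assumed tie-break on the higher id
         OR (better.prio = o.prio
             AND better.observation_date = o.observation_date
             AND better.id > o.id)
        )
-- no outranking row found: o is the first row of its area
WHERE better.id IS NULL
ORDER BY o.area_code;
```

Traced by hand against the sample data, this keeps ids 3, 5 and 9. The two derived tables repeat the CASE expression because a derived table cannot be referenced twice by name in 5.7; on larger tables it would be worth checking the plan with EXPLAIN.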
First of all, the expected result should contain id 9, not the id 8 you specified, because id 8 is manual and has been deleted.
So the expected result is
id area_code observation_date reading source deleted_at
3 test1 2021-01-03 00:00:00 5 auto NULL
5 test2 2021-01-02 00:00:00 6 manual NULL
9 test3 2021-01-03 00:00:00 5 auto NULL
If you run it with the WHERE condition disabled and also select the obs2.* columns
SELECT obs1.*, obs2.*
FROM observations AS obs1
LEFT JOIN observations AS obs2 ON
obs1.area_code = obs2.area_code AND
obs1.id != obs2.id AND
NOT (
(obs1.source = "manual"
AND obs1.deleted_at IS NULL
)
OR
(obs1.observation_date > obs2.observation_date AND obs2.source = "auto" )
)
WHERE 1 OR obs2.id IS NULL
you will see that the result contains
9 test3 2021-01-03T00:00:00Z 5 auto (null) 8 test3 2021-01-02T00:00:00Z 6 manual 2021-01-02T00:00:00Z
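So the problem is that you did not account for obs2.source = 'manual'. A deleted manual row on the obs2 side has to be ignored as well: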
SELECT obs1.*
FROM observations AS obs1
LEFT JOIN observations AS obs2 ON
obs1.area_code = obs2.area_code AND
obs1.id != obs2.id AND
NOT (
(obs1.source = "manual" AND obs1.deleted_at IS NULL) OR
(obs2.source = 'manual' AND obs2.deleted_at IS NOT NULL) OR
(obs1.observation_date > obs2.observation_date AND obs2.source = "auto")
)
WHERE obs2.id IS NULL
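One case the sample data does not exercise is an area whose newest row is a deleted manual reading. Tracing the join condition by hand suggests that the query above would then return both the auto row and the deleted manual row for that area, because the deleted manual row is newer than every auto row and so finds no match on the obs2 side. The sketch below adds two hypothetical rows for a made-up area test4 and one extra WHERE filter as a possible guard; neither the rows nor the guard come from the original answer.

```sql
-- Hypothetical extra data: the latest row of the area is a deleted manual reading.
INSERT INTO observations (area_code, observation_date, reading, source, deleted_at)
VALUES
('test4', '2021-01-01', 7, 'auto', NULL),
('test4', '2021-01-02', 6, 'manual', '2021-01-02');

SELECT obs1.*
FROM observations AS obs1
LEFT JOIN observations AS obs2 ON
  obs1.area_code = obs2.area_code AND
  obs1.id != obs2.id AND
  NOT (
    (obs1.source = 'manual' AND obs1.deleted_at IS NULL) OR
    (obs2.source = 'manual' AND obs2.deleted_at IS NOT NULL) OR
    (obs1.observation_date > obs2.observation_date AND obs2.source = 'auto')
  )
WHERE obs2.id IS NULL
  -- Added guard (assumption): a deleted manual row is never a valid result.
  AND NOT (obs1.source = 'manual' AND obs1.deleted_at IS NOT NULL);
```

With the guard in place, test4 should yield only its auto row, and the rows returned for test1, test2 and test3 stay the same since none of them is a deleted manual reading.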
This is the kind of thing you would do with a correlated subquery in older versions of MySQL:
select o.*
from observations o
where o.id = (select o2.id
from observations o2
where o2.area_code = o.area_code and
o2.deleted_at is null
order by (o2.source = 'manual') desc,
o2.observation_date desc
limit 1
);
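Traced by hand against the sample data, the subquery picks id 3 for test1, id 5 for test2 and id 9 for test3. Because it is evaluated once per outer row, an index that starts with area_code may help on larger tables. The following is only a sketch: the index name is hypothetical and the column order is an assumption, not a measured recommendation.

```sql
-- Hypothetical index to support the per-area lookup in the subquery.
ALTER TABLE observations
    ADD INDEX idx_observations_area_date (area_code, observation_date);

-- Inspect the plan to see whether the dependent subquery actually uses it.
EXPLAIN
SELECT o.*
FROM observations o
WHERE o.id = (SELECT o2.id
              FROM observations o2
              WHERE o2.area_code = o.area_code AND
                    o2.deleted_at IS NULL
              ORDER BY (o2.source = 'manual') DESC,
                       o2.observation_date DESC
              LIMIT 1
             );
```

Whether the optimizer uses the index for the ORDER BY as well depends on the data and the server version, so the EXPLAIN output is the thing to go by.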