Find data missing on a monthly basis
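Given the data table "table" and the table "ID" holding the known newID values (1, 2, 3), I want to find all months that are missing some newID values. The result should only include months that have at least one newID.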
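This is the expected result for the sample data (the rows are listed in the schema below):
Data    newID
Dec-17  2
Oct-17  1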
How can I accomplish this?
Table schema (fiddle): http://sqlfiddle.com/#!9/72af42
CREATE TABLE `ID`(
Id int unsigned not null primary key
);
INSERT INTO `ID` (Id) VALUES (1),(2),(3);
CREATE TABLE `table`(
recId int unsigned not null auto_increment primary key,
Data DateTime not null,
newID int not null
);
INSERT INTO `table` (`Data`,`newID`) VALUES
('2017-12-06',1),
('2017-12-06',3),
('2017-11-16',1),
('2017-11-16',2),
('2017-11-16',3),
('2017-10-05',2),
('2017-10-05',3),
('2017-10-03',2),
('2017-10-03',3),
('2017-08-16',1),
('2017-08-16',2),
('2017-08-16',3),
('2017-05-05',1),
('2017-05-05',2),
('2017-05-05',3);
Because you only need the missing IDs together with the month they are missing from, this is a simple anti-join scenario: find the combinations that have no matching record.
In SQL an anti-join can be expressed either with a NOT EXISTS expression in the WHERE clause, or with a LEFT OUTER JOIN that keeps only the rows where the joined table's columns are NULL, since those are the records for which no match was found.
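The query is only complicated by the fact that you want to match on the month part of the date rather than on an explicit date value. SQL provides all the necessary tools, and we can even format the output the way you want: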
SELECT DATE_FORMAT(CAST(CONCAT(m.year,'-',m.month,'-01') as DateTime), '%b-%y') as Data, i.Id as newId
FROM (
SELECT YEAR(Data) AS Year, MONTH(Data) AS month
FROM `table`
GROUP BY YEAR(Data), MONTH(Data)
) m
CROSS JOIN `ID` i
LEFT OUTER JOIN `table` t ON YEAR(t.Data) = m.year AND MONTH(t.Data) = m.month AND t.newId = i.Id
WHERE t.Data IS NULL
ORDER BY m.Year DESC, m.month DESC
See this fiddle: http://sqlfiddle.com/#!9/72af42/2
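To see why the LEFT OUTER JOIN ... IS NULL filter works, it can help to look at the intermediate grid before filtering. The following is only an illustrative sketch against the schema above; the matching_rows alias is my own name, not part of the original query. It lists every month/ID combination produced by the CROSS JOIN together with how many rows matched it:

```sql
-- Every (month, ID) combination with the number of matching rows.
-- Combinations where matching_rows = 0 are the ones the
-- "WHERE t.Data IS NULL" filter keeps in the real query.
SELECT m.`year`, m.`month`, i.Id AS newId,
       COUNT(t.recId) AS matching_rows   -- COUNT ignores the NULLs of unmatched rows
FROM (
    SELECT YEAR(Data) AS `year`, MONTH(Data) AS `month`
    FROM `table`
    GROUP BY YEAR(Data), MONTH(Data)
) m
CROSS JOIN `ID` i
LEFT OUTER JOIN `table` t
    ON  YEAR(t.Data)  = m.`year`
    AND MONTH(t.Data) = m.`month`
    AND t.newID       = i.Id
GROUP BY m.`year`, m.`month`, i.Id
ORDER BY m.`year` DESC, m.`month` DESC, i.Id;
```

With the sample data this returns 15 rows (5 months x 3 IDs); the two rows with a count of zero are December 2017 / newID 2 and October 2017 / newID 1.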
Choosing between WHERE NOT EXISTS and LEFT OUTER JOIN can affect performance slightly, but the effect will depend on your query, your RDBMS and the available indexes. I personally reach for the JOIN syntax first because IMO it is simpler to maintain, but use your own discretion.
There is a lot of talk, at least for MS SQL, that NOT EXISTS should be faster than JOIN because it evaluates fewer lookups. If performance is an issue for this specific query, though, you should look at storing the year and month as persisted columns so that they can be indexed and the per-row function evaluations are avoided.
- Best practice between using LEFT JOIN or NOT EXISTS
- SQL performance on LEFT OUTER JOIN vs NOT EXISTS
- MySQL: NOT EXISTS vs LEFT OUTER JOIN…IS NULL
For comparison, here is the equivalent WHERE NOT EXISTS query: http://sqlfiddle.com/#!9/72af42/5
SELECT DATE_FORMAT(CAST(CONCAT(m.year,'-',m.month,'-01') as DateTime), '%b-%y') as Data, i.Id as newId
FROM (
SELECT YEAR(Data) AS Year, MONTH(Data) AS month
FROM `table`
GROUP BY YEAR(Data), MONTH(Data)
) m
CROSS JOIN `ID` i
WHERE NOT EXISTS (
SELECT *
FROM `table` t
WHERE YEAR(t.Data) = m.year
AND MONTH(t.Data) = m.month
AND t.newId = i.Id
)
ORDER BY m.Year DESC, m.month DESC
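How to optimise using persisted values?
If we pre-compute YEAR() and MONTH() and store the results directly in the table, the query gets faster on its own, and we can then add indexes to supercharge it.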
Consider the overall pros and cons before going this far...
- Do you really need this level of optimisation?
- How often is the query going to be executed?
- Can you change the application logic to use a more appropriate WHERE clause to restrict the scope of the data instead?
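As a rough illustration of the last point, the query can be limited to a date window with a plain range comparison on the Data column. This is only a sketch: the window (October to December 2017) is an arbitrary example, and it only pays off if an index on Data exists, which the posted schema does not define.

```sql
-- Only look at one quarter. The range predicate compares the bare
-- Data column, so an index on Data (assumed, not in the posted schema)
-- could be used to skip rows outside the window.
SELECT DATE_FORMAT(CAST(CONCAT(m.`year`,'-',m.`month`,'-01') AS DateTime), '%b-%y') AS Data,
       i.Id AS newId
FROM (
    SELECT YEAR(Data) AS `year`, MONTH(Data) AS `month`
    FROM `table`
    WHERE Data >= '2017-10-01' AND Data < '2018-01-01'
    GROUP BY YEAR(Data), MONTH(Data)
) m
CROSS JOIN `ID` i
LEFT OUTER JOIN `table` t
    ON  t.Data >= '2017-10-01' AND t.Data < '2018-01-01'
    AND YEAR(t.Data)  = m.`year`
    AND MONTH(t.Data) = m.`month`
    AND t.newID       = i.Id
WHERE t.Data IS NULL
ORDER BY m.`year` DESC, m.`month` DESC;
```

The half-open range (>= start, < end) avoids having to work out the last day of the month and keeps rows with a time component on the final day inside the window.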
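Materialised views
One solution is to create and manage a materialised view. This is a data-warehousing technique that effectively lets you define a view whose query is executed periodically and whose results are stored in their own table space.
A materialised view does not optimise your query, but it lets you execute a complex, long-running query once so that the results can be queried directly like a normal table, without having to re-evaluate the column expressions.
Your data and query type look like a good fit for a materialised view: the query reads historical data whose rate of change is zero or very low, only new rows are added, and we probably do not care about results for the current month. In that case, if you end up running the query many times and the results stay more or less the same, why not run the query as a scheduled process, say once a month, and store the results in a purpose-built table? Your application can then query that table as often as it likes and get lightning-fast results.
MySQL does not support materialised views, but you can replicate the concept in your application logic as explained above. Some other RDBMSs provide this out of the box; it is the concept that should be considered.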
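A minimal sketch of that idea in plain MySQL might look like the following. The table name missing_ids_by_month and its columns are hypothetical, chosen only for this example, and how you schedule the refresh (a cron job, the application, a MySQL event) is left open.

```sql
-- Hypothetical purpose-built table that stores the query results.
CREATE TABLE IF NOT EXISTS missing_ids_by_month (
    `year`  int NOT NULL,
    `month` int NOT NULL,
    newId   int unsigned NOT NULL,
    PRIMARY KEY (`year`, `month`, newId)
);

-- Refresh step: throw away the old results and recompute them.
-- Run these two statements together, for example once a month.
DELETE FROM missing_ids_by_month;

INSERT INTO missing_ids_by_month (`year`, `month`, newId)
SELECT m.`year`, m.`month`, i.Id
FROM (
    SELECT YEAR(Data) AS `year`, MONTH(Data) AS `month`
    FROM `table`
    GROUP BY YEAR(Data), MONTH(Data)
) m
CROSS JOIN `ID` i
WHERE NOT EXISTS (
    SELECT 1
    FROM `table` t
    WHERE YEAR(t.Data)  = m.`year`
      AND MONTH(t.Data) = m.`month`
      AND t.newID       = i.Id
);

-- The application then reads the stored results directly.
SELECT `year`, `month`, newId
FROM missing_ids_by_month
ORDER BY `year` DESC, `month` DESC, newId;
```

The expensive YEAR()/MONTH() work happens only during the refresh; the final SELECT is a simple read of a small table. The trade-off is that results are only as fresh as the last refresh.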
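Computed columns
You could add extra columns to your table and maintain them from user/application logic, but that is not very reliable unless you trust your application developers and the application is the only process that will ever update this table.
Computed columns are a good fit for reliability here, but they only help performance if the values can be persisted to column storage. (By default a computed column's expression is evaluated at execution time, which does little for the current query.)
Again, this is where MySQL will let you down: many other RDBMSs offer simpler ways to do this, and you need MySQL v5.7 or later for this to work: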
-- MySQL requires a data type before GENERATED ALWAYS
ALTER TABLE `table` ADD `year` int GENERATED ALWAYS AS (YEAR(Data)) STORED;
ALTER TABLE `table` ADD `month` int GENERATED ALWAYS AS (MONTH(Data)) STORED;
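Triggers
Your other option is to add the columns and then use triggers to maintain the values. This is not especially convenient in MySQL, but it works.
Add the columns to your table: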
ALTER TABLE `table` ADD (`year` int NULL);
ALTER TABLE `table` ADD (`month` int NULL);
Create triggers to manage the values in these columns, so that users cannot override them:
DELIMITER $$
CREATE TRIGGER persist_index_values_insert
BEFORE INSERT ON `table` FOR EACH ROW
BEGIN
SET NEW.year = YEAR(NEW.Data);
SET NEW.month = MONTH(NEW.Data);
END$$
CREATE TRIGGER persist_index_values_update
BEFORE UPDATE ON `table` FOR EACH ROW
BEGIN
SET NEW.year = YEAR(NEW.Data);
SET NEW.month = MONTH(NEW.Data);
END$$
DELIMITER ;
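One step the trigger approach leaves implicit: triggers only fire for inserts and updates that happen after they are created, so rows that were already in the table still have NULL in the new columns. A one-off backfill along these lines should cover them (a sketch that assumes the year and month columns were added exactly as shown above):

```sql
-- Backfill rows that existed before the triggers were created.
UPDATE `table`
SET `year`  = YEAR(Data),
    `month` = MONTH(Data)
WHERE `year` IS NULL OR `month` IS NULL;

-- Sanity check: this should return 0 once the backfill has run.
SELECT COUNT(*) AS rows_without_values
FROM `table`
WHERE `year` IS NULL OR `month` IS NULL;
```

Until this has run, a query that joins on t.year and t.month will not match the older rows, because NULL never compares equal to anything.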
The simpler query:
SELECT DATE_FORMAT(CAST(CONCAT(m.year,'-',m.month,'-01') as DateTime), '%b-%y') as Data, i.Id as newId
FROM (
SELECT `year`, `month`
FROM `table`
GROUP BY `year`, `month`
) m
CROSS JOIN `ID` i
LEFT OUTER JOIN `table` t ON t.year = m.year AND t.month = m.month AND t.newId = i.Id
WHERE t.Data IS NULL
ORDER BY m.Year DESC, m.month DESC
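Indexes can now be applied as needed. You should consult the query execution plan for guidance, but I suggest you need at least one index covering year, month and newId: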
CREATE INDEX IX_TABLE_YEAR_MONTH_NEWID ON `table` (`year`,`month`,`newId`);
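To check whether the index is actually being considered, you can prefix the query with EXPLAIN. The sketch below assumes the year and month columns and the index above already exist; whether the optimizer picks the index depends on your MySQL version and data volume, so treat the output as guidance rather than a guarantee.

```sql
-- Ask MySQL for the execution plan instead of running the query.
-- In the output, look at the "key" column of the row for alias t
-- to see which index, if any, the optimizer chose.
EXPLAIN
SELECT m.`year`, m.`month`, i.Id AS newId
FROM (
    SELECT `year`, `month`
    FROM `table`
    GROUP BY `year`, `month`
) m
CROSS JOIN `ID` i
LEFT OUTER JOIN `table` t
    ON  t.`year`  = m.`year`
    AND t.`month` = m.`month`
    AND t.newID   = i.Id
WHERE t.Data IS NULL;
```

On a table as small as the sample data the optimizer may well prefer a full scan, so it is worth testing against a realistic volume of rows.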