Optimize a query on a large database table (SQL)
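I am trying to optimize an SQL query that does a date range search against a large events table (more than 10 million rows). The table already has a unique index on (lid, did, measurement, date). The query below tries to fetch the event rows for three measurement types (Kilowatts, Current and Voltage), one per 2-second interval, based on the date column: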
SELECT *, FLOOR(UNIX_TIMESTAMP(date)/2) AS timekey
from events
WHERE lid = 1
and did = 1
and measurement IN ("Voltage")
group by timekey
UNION
SELECT *, FLOOR(UNIX_TIMESTAMP(date)/2) AS timekey
from events
WHERE lid = 1
and did = 1
and measurement IN ("Current")
group by timekey
UNION
SELECT *, FLOOR(UNIX_TIMESTAMP(date)/2) AS timekey
from events
WHERE lid = 1
and did = 1
and measurement IN ("Kilowatts")
group by timekey
This is the table I am querying.
=============================================================
id | lid | did | measurement | date
=============================================================
1  | 1   | 1   | Kilowatts   | 2020-04-27 00:00:00
2  | 1   | 1   | Current     | 2020-04-27 00:00:00
3  | 1   | 1   | Voltage     | 2020-04-27 00:00:00
4  | 1   | 1   | Kilowatts   | 2020-04-27 00:00:01
5  | 1   | 1   | Current     | 2020-04-27 00:00:01
6  | 1   | 1   | Voltage     | 2020-04-27 00:00:01
7  | 1   | 1   | Kilowatts   | 2020-04-27 00:00:02
8  | 1   | 1   | Current     | 2020-04-27 00:00:02
9  | 1   | 1   | Voltage     | 2020-04-27 00:00:02
=============================================================
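The expected result is to retrieve all rows whose date equals 2020-04-27 00:00:00 or 2020-04-27 00:00:02. The query above works as expected, but I am using UNION to look up the different measurements on the table, and I believe this may not be the best approach.
Can any SQL expert help me tune this query to improve its performance?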
For each measurement you have one record per second, and you want to select one record every two seconds.
You could try:
select *
from events
where
    lid = 1
    and did = 1
    and measurement IN ('Voltage', 'Current', 'Kilowatts')
    and extract(second from date) % 2 = 0
This selects the records whose seconds part is even.
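Because the parity test wraps the date column in a function, MySQL cannot use an index for it and has to evaluate it for every candidate row. The question mentions a date range search, so one way to limit how many rows reach that test is to bound the date column as well. A minimal sketch, assuming a one-day range (the bounds are placeholder values, not taken from the question):

```sql
-- Sketch only: the date bounds below are placeholder values.
SELECT id, lid, did, measurement, `date`
FROM events
WHERE lid = 1
  AND did = 1
  AND measurement IN ('Voltage', 'Current', 'Kilowatts')
  AND `date` >= '2020-04-27 00:00:00'   -- assumed start of the range (inclusive)
  AND `date` <  '2020-04-28 00:00:00'   -- assumed end of the range (exclusive)
  AND SECOND(`date`) % 2 = 0;           -- keep only even seconds
```

The equality, IN and range conditions are the kind that the existing (lid, did, measurement, date) index can typically serve, while the parity check is applied as a filter on the rows that remain.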
Alternatively, if you always have exactly one record per second, another option is row_number() (this requires MySQL 8.0):
select *
from (
    select
        e.*,
        row_number() over(partition by measurement order by date) rn
    from events e
    where
        lid = 1
        and did = 1
        and measurement IN ('Voltage', 'Current', 'Kilowatts')
) t
where rn % 2 = 1
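This is a bit less accurate than the previous query, though: it only returns the even seconds when no reading is missing and the first reading of each measurement falls on an even second.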
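Both queries above depend on how regular the readings are. A variant that stays closer to the original timekey idea is to group by measurement and 2-second bucket in a single pass, which replaces the three UNION branches. This is only a sketch: the alias bucket_date is made up for illustration, and it returns the earliest timestamp per bucket rather than the full row.

```sql
-- One pass over the matching rows instead of three UNION branches.
SELECT measurement,
       FLOOR(UNIX_TIMESTAMP(`date`) / 2) AS timekey,
       MIN(`date`) AS bucket_date   -- earliest reading in each 2-second bucket
FROM events
WHERE lid = 1
  AND did = 1
  AND measurement IN ('Voltage', 'Current', 'Kilowatts')
GROUP BY measurement, timekey
ORDER BY measurement, timekey;
```

Every selected column is either grouped or aggregated, so this form also runs when ONLY_FULL_GROUP_BY is enabled, which a SELECT * with GROUP BY timekey does not.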
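Your query is actually three queries combined into one. Fortunately, they all select rows based on similar columns. If you want this query to run fast, you can add the following index: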
create index ix1 on events (lid, did, measurement);
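The question states that a unique index on (lid, did, measurement, date) already exists. MySQL can use the leftmost prefix of a composite index, so that index may already serve lookups on (lid, did, measurement). Before adding another index, it may be worth checking what the optimizer does today. A rough sketch:

```sql
-- List the indexes that already exist on the table.
SHOW INDEX FROM events;

-- Show which index (if any) the optimizer picks for the lookup.
EXPLAIN
SELECT id, measurement, `date`
FROM events
WHERE lid = 1
  AND did = 1
  AND measurement IN ('Voltage', 'Current', 'Kilowatts');
```

If the key column of the EXPLAIN output already names the existing unique index, a second index on the same leading columns is unlikely to change the plan.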
In addition to the suggestions above, changing the PRIMARY KEY will give you more performance:
PRIMARY KEY(lid, did, date, measurement)
and then drop the id column.
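As a rough sketch, the change could be made in a single statement. This assumes that id is the current single-column primary key, that nothing else references id, and that the four key columns are NOT NULL; none of this is confirmed by the question, so try it on a copy of the table first.

```sql
-- Sketch only. Assumes id is the current primary key and that
-- lid, did, `date` and measurement are all NOT NULL.
ALTER TABLE events
  DROP PRIMARY KEY,
  DROP COLUMN id,
  ADD PRIMARY KEY (lid, did, `date`, measurement);
```

Changing the primary key rebuilds the table, which can take a while with more than 10 million rows.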
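Caveat: there could be a problem if two readings fall in exactly the same "second". This can easily happen if one reading arrives just after a clock tick and the next one arrives just before the following tick.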