Optimize the query for a large table in database (SQL)

Question

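I am trying to optimize a SQL query that does a date-range search on a large events table (more than 10 million rows). The table already has a unique index on (lid, did, measurement, date). The query below tries to fetch the events for three measurement types (Kilowatts, Current and Voltage) at 2-second intervals on the date column: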

SELECT *, FLOOR(UNIX_TIMESTAMP(date)/2) AS timekey 
from events 
WHERE lid = 1 
  and did = 1
  and measurement IN ("Voltage") 
group by timekey
UNION
SELECT *, FLOOR(UNIX_TIMESTAMP(date)/2) AS timekey 
from events
WHERE lid = 1
  and did = 1
  and measurement IN ("Current") 
group by timekey
UNION
SELECT *, FLOOR(UNIX_TIMESTAMP(date)/2) AS timekey 
from events
WHERE lid = 1
  and did = 1
  and measurement IN ("Kilowatts") 
group by timekey

This is the table I am querying.

=============================================================
id  |  lid   |   did   |   measurement  |  date
=============================================================
1   |  1     |   1     |   Kilowatts    | 2020-04-27 00:00:00
2   |  1     |   1     |   Current      | 2020-04-27 00:00:00
3   |  1     |   1     |   Voltage      | 2020-04-27 00:00:00
4   |  1     |   1     |   Kilowatts    | 2020-04-27 00:00:01
5   |  1     |   1     |   Current      | 2020-04-27 00:00:01
6   |  1     |   1     |   Voltage      | 2020-04-27 00:00:01
7   |  1     |   1     |   Kilowatts    | 2020-04-27 00:00:02
8   |  1     |   1     |   Current      | 2020-04-27 00:00:02
9   |  1     |   1     |   Voltage      | 2020-04-27 00:00:02

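The expected result is to retrieve all rows whose date equals 2020-04-27 00:00:00 or 2020-04-27 00:00:02. The query above works as expected, but it uses UNION to look up the different measurements on the table, and I believe that may not be the optimal approach.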

Could any SQL expert help me tune this query to improve its performance?

Answer 1

For each measurement you have one record per second, and you want to select one record every two seconds.

You could try:

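select *
from events
where 
    lid = 1 
    and did = 1 
    -- all three measurement types from the question
    and measurement IN ('Voltage', 'Current', 'Kilowatts')
    -- keep only rows whose seconds part is even (00, 02, 04, ...)
    and extract(second from date) % 2 = 0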

This selects the records whose seconds part is even.
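Since the question mentions a date-range search, the same filter can be combined with a bounded range on date. The following is only a sketch: the two boundary values are made-up examples, and whether the optimizer actually uses the existing unique index on (lid, did, measurement, date) for it should be confirmed with EXPLAIN.

```sql
-- Sketch: even-second filter plus a half-open date range.
-- The two boundary timestamps are illustrative assumptions, not from the question.
SELECT *
FROM events
WHERE lid = 1
  AND did = 1
  AND measurement IN ('Voltage', 'Current', 'Kilowatts')
  AND date >= '2020-04-27 00:00:00'
  AND date <  '2020-04-28 00:00:00'
  AND EXTRACT(SECOND FROM date) % 2 = 0;
```

The half-open range (>= start, < end) avoids having to spell out the last timestamp of the day.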

Alternatively, if you always have exactly one record per second, another option is row_number() (this requires MySQL 8.0):

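select *
from (
    select 
        e.*, 
        -- number the rows of each measurement in time order
        row_number() over(partition by measurement order by date) rn
    from events e
    where 
        lid = 1 
        and did = 1 
        and measurement IN ('Voltage', 'Current', 'Kilowatts')
) t
-- keep every other row, starting with the first
where rn % 2 = 1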

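This is a bit less accurate than the previous query, though, because it depends on every second having exactly one record.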
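If records can be missing, one variant that does not depend on row positions is to number the rows inside each 2-second bucket, reusing the FLOOR(UNIX_TIMESTAMP(date)/2) expression from the question, and keep the first row of each bucket. This is a sketch that assumes MySQL 8.0 or later (window functions); it has not been run against the asker's data.

```sql
-- Sketch: first row per measurement and per 2-second bucket.
-- Assumes MySQL 8.0+; the bucket expression is the "timekey" from the question.
SELECT *
FROM (
    SELECT
        e.*,
        ROW_NUMBER() OVER (
            PARTITION BY measurement, FLOOR(UNIX_TIMESTAMP(date) / 2)
            ORDER BY date
        ) AS rn
    FROM events e
    WHERE lid = 1
      AND did = 1
      AND measurement IN ('Voltage', 'Current', 'Kilowatts')
) t
WHERE rn = 1;  -- the output also contains the helper column rn
```

A bucket with no even-second record still returns its odd-second record here, which may or may not be what is wanted.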

Answer 2

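Your query is actually three queries combined into one. Fortunately, they all select rows based on similar columns. If you want this query to run fast, you can add the following index: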

create index ix1 on events (lid, did, measurement);
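Before adding it, it may be worth checking what the optimizer already does. The unique index on (lid, did, measurement, date) mentioned in the question starts with the same three columns, so it can already serve this filter as a leftmost prefix. A quick check, as a sketch:

```sql
-- List the indexes that currently exist on the table.
SHOW INDEX FROM events;

-- Show the access plan for the filter used in the question.
-- The "key" column of the output names the index the optimizer picked.
EXPLAIN
SELECT *
FROM events
WHERE lid = 1
  AND did = 1
  AND measurement IN ('Voltage', 'Current', 'Kilowatts');
```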

Answer 3

In addition to the suggestions above, changing the PRIMARY KEY will give you more performance:

PRIMARY KEY(lid, did, date, measurement)

Then drop id.

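Caveat: there could be problems if two readings come in during exactly the same "second". That can easily happen if one reading arrives just after the clock ticks and the next arrives just before the following tick.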
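As a rough sketch, the key change could be applied in a single statement like the one below. It assumes that id is currently the table's only primary key column and that nothing else (foreign keys, application code) refers to it; neither is stated in the question. On a table of this size the change typically rebuilds the whole table, so it is best tried on a copy first.

```sql
-- Assumption: id is the current single-column primary key and is not referenced elsewhere.
-- Dropping id and adding the new key in one ALTER avoids altering the table twice.
ALTER TABLE events
    DROP COLUMN id,
    ADD PRIMARY KEY (lid, did, date, measurement);
```

With this key in place, a second reading with the same (lid, did, date, measurement) is rejected with a duplicate-key error, which is the situation the caveat above describes.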

mysql

sql

database

query-optimization

query-performance