How to make this query efficient and fast when the data grows to millions of rows

Is there a faster, more efficient way to display ranked data? The query I use feels slow because the table already holds a lot of data.

I have these tables and data:

Table 1:

CREATE TABLE `table1` (
  `location_id` varchar(5) COLLATE utf8mb4_unicode_ci NOT NULL,
  `location_name` varchar(50) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  PRIMARY KEY (`location_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

INSERT INTO `table1` (`location_id`,`location_name`) VALUES 
('001','Asia'),('002','Africa');

+-------------+---------------+
| location_id | location_name | 
+-------------+---------------+
| 001         | Asia          |
| 002         | Africa        |
+-------------+---------------+

Table 2:

CREATE TABLE `table2` (
  `device_id` varchar(5) COLLATE utf8mb4_unicode_ci NOT NULL,
  `device_name` varchar(50) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `location_id` varchar(5) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  PRIMARY KEY (`device_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

INSERT INTO `table2` (`device_id`,`device_name`, `location_id`) VALUES 
('D001','Water Level','001'),('D002','Air Direction','001'),('D003','Rain Fall','001');

+-----------+---------------+-------------+
| device_id | device_name   | location_id |
+-----------+---------------+-------------+
| D001      | Water Level   | 001         | 
| D002      | Air Direction | 001         | 
| D003      | Rain Fall     | 001         |
+-----------+---------------+-------------+

Table 3:

CREATE TABLE `table3` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `device_id` varchar(5) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `send_date` datetime DEFAULT NULL,
  `value` double DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

INSERT INTO `table3` (`device_id`,`send_date`, `value`) VALUES 
('D001','2021-07-11 07:00',30.10),
('D001','2021-07-11 07:30',29.21),
('D001','2021-07-11 08:00',31.07),
('D002','2021-07-11 07:00',28.18),
('D002','2021-07-11 07:30',30.10),
('D002','2021-07-11 08:00',31.07),
('D003','2021-07-11 07:00',29.21),
('D003','2021-07-11 07:30',31.07),
('D003','2021-07-11 08:00',30.10),
('D001','2021-07-12 07:00',30.10),
('D001','2021-07-12 07:30',29.21),
('D001','2021-07-12 08:00',31.07),
('D002','2021-07-12 07:00',28.18),
('D002','2021-07-12 07:30',30.10),
('D002','2021-07-12 08:00',31.07),
('D003','2021-07-12 07:00',29.21),
('D003','2021-07-12 07:30',31.07),
('D003','2021-07-12 08:00',30.10);

+----+-----------+---------------------+-------+
| id | device_id | send_date           | value |
+----+-----------+---------------------+-------+
| 1  | D001      | 2021-07-11 07:00:00 | 30.1  |
| 2  | D001      | 2021-07-11 07:30:00 | 29.21 |
| 3  | D001      | 2021-07-11 08:00:00 | 31.07 |
| 4  | D002      | 2021-07-11 07:00:00 | 28.18 |
| 5  | D002      | 2021-07-11 07:30:00 | 30.1  |
| 6  | D002      | 2021-07-11 08:00:00 | 31.07 |
| 7  | D003      | 2021-07-11 07:00:00 | 29.21 |
| 8  | D003      | 2021-07-11 07:30:00 | 31.07 |
| 9  | D003      | 2021-07-11 08:00:00 | 30.1  |
| 10 | D001      | 2021-07-12 07:00:00 | 30.1  |
| 11 | D001      | 2021-07-12 07:30:00 | 29.21 |
| 12 | D001      | 2021-07-12 08:00:00 | 31.07 |
| 13 | D002      | 2021-07-12 07:00:00 | 28.18 |
| 14 | D002      | 2021-07-12 07:30:00 | 30.1  |
| 15 | D002      | 2021-07-12 08:00:00 | 31.07 |
| 16 | D003      | 2021-07-12 07:00:00 | 29.21 |
| 17 | D003      | 2021-07-12 07:30:00 | 31.07 |
| 18 | D003      | 2021-07-12 08:00:00 | 30.1  |
+----+-----------+---------------------+-------+

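I use this query to show one row per group (the latest reading for each device), but once there are millions of rows the query becomes slow: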

SELECT * FROM
    (SELECT 
    a.`location_name`,c.`device_id`,b.`device_name`,c.`send_date`,c.`value`,
    ROW_NUMBER() OVER (PARTITION BY c.`device_id` ORDER BY c.send_date DESC) AS ranking
    FROM
    table1 a
    INNER JOIN table2 b ON b.`location_id`=a.`location_id`
    INNER JOIN table3 c ON c.`device_id`=b.`device_id`  
    ) AS ta
WHERE ta.ranking = 1
ORDER BY ta.device_id ASC;

Output (with the sample data above):

+---------------+-----------+---------------+---------------------+-------+---------+
| location_name | device_id | device_name   | send_date           | value | ranking |
+---------------+-----------+---------------+---------------------+-------+---------+
| Asia          | D001      | Water Level   | 2021-07-12 08:00:00 | 31.07 |       1 |
| Asia          | D002      | Air Direction | 2021-07-12 08:00:00 | 31.07 |       1 |
| Asia          | D003      | Rain Fall     | 2021-07-12 08:00:00 | 30.1  |       1 |
+---------------+-----------+---------------+---------------------+-------+---------+

Please help me: once the data reaches millions of rows, this query becomes slow. Thanks for your answers.

You may find that a correlated subquery, with the right index, is faster:

SELECT a.`location_name`, c.`device_id`, b.`device_name`, c.`send_date`, c.`value`
FROM table1 a JOIN
     table2 b
     ON b.`location_id` = a.`location_id` JOIN
     table3 c
     ON c.`device_id` = b.`device_id`
WHERE c.send_date = (SELECT MAX(c2.send_date)
                     FROM table3 c2
                     WHERE c2.device_id = c.device_id
                    ); 

The index you want is on table3(device_id, send_date):

CREATE INDEX idx_table3_device_id_send_date ON table3(device_id, send_date);
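One behavioural difference is worth knowing: if a device has two rows with the same latest send_date, the MAX() version returns both of them, whereas the ROW_NUMBER() query in the question returns exactly one. The sketch below is a variant that always returns a single row per device, assuming that the larger `id` is an acceptable tie-breaker (that rule is an assumption, not something from the question). InnoDB appends the primary key to every secondary index, so the same (device_id, send_date) index can usually serve this ORDER BY as well; confirm with EXPLAIN on your own data.

```sql
-- Sketch: one row per device; ties on send_date are broken by the larger id.
-- The tie-break rule is an assumption; pick whatever suits your data.
SELECT a.`location_name`, c.`device_id`, b.`device_name`, c.`send_date`, c.`value`
FROM table1 a
JOIN table2 b
  ON b.`location_id` = a.`location_id`
JOIN table3 c
  ON c.`device_id` = b.`device_id`
-- A scalar subquery may use LIMIT in MySQL (only IN/ALL/ANY/SOME subqueries may not).
WHERE c.`id` = (SELECT c2.`id`
                FROM table3 c2
                WHERE c2.`device_id` = c.`device_id`
                ORDER BY c2.`send_date` DESC, c2.`id` DESC
                LIMIT 1)
ORDER BY c.`device_id`;
```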

  • See the tags I added.

  • Do the JOINs after getting the top-ranked rows. That is, do the "ranking" on c by itself, then JOIN to b and a.

  • Add this index:

      c:  INDEX(device_id, send_date, value)
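The "rank c first, then JOIN" suggestion above can be sketched as follows, assuming MySQL 8.0 (required for ROW_NUMBER(), as in the question). The window function runs over table3 alone, which INDEX(device_id, send_date, value) can cover, and only the one surviving row per device is joined to table2 and table1. The ranking step still reads every row of table3; the saving is that the joins no longer touch millions of rows. The index name below is hypothetical.

```sql
-- Covering index for the ranking step (the name is hypothetical).
CREATE INDEX idx_table3_dev_date_val ON table3 (device_id, send_date, value);

-- Rank table3 by itself first ...
SELECT a.`location_name`, t.`device_id`, b.`device_name`, t.`send_date`, t.`value`
FROM (
    SELECT c.`device_id`, c.`send_date`, c.`value`,
           ROW_NUMBER() OVER (PARTITION BY c.`device_id`
                              ORDER BY c.`send_date` DESC) AS ranking
    FROM table3 c
) AS t
-- ... then JOIN only the top row of each device to b and a.
JOIN table2 b ON b.`device_id` = t.`device_id`
JOIN table1 a ON a.`location_id` = b.`location_id`
WHERE t.ranking = 1
ORDER BY t.`device_id`;
```

Because the joins are inner joins on device_id, this returns the same rows as the question's query; only the point at which the join happens changes.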