How to make this query efficient and fast when the table grows to millions of rows
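Is there a faster, more efficient way to show ranked data?
The query I use feels very slow because the table already holds a lot of data.
I have these tables and data: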
Table 1:
CREATE TABLE `table1` (
`location_id` varchar(5) COLLATE utf8mb4_unicode_ci NOT NULL,
`location_name` varchar(50) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
PRIMARY KEY (`location_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
INSERT INTO `table1` (`location_id`,`location_name`) VALUES
('001','Asia'),('002','Africa');
+-------------+---------------+
| location_id | location_name |
+-------------+---------------+
| 001 | Asia |
| 002 | Africa |
+-------------+---------------+
Table 2:
CREATE TABLE `table2` (
`device_id` varchar(5) COLLATE utf8mb4_unicode_ci NOT NULL,
`device_name` varchar(50) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`location_id` varchar(5) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
PRIMARY KEY (`device_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
INSERT INTO `table2` (`device_id`,`device_name`, `location_id`) VALUES
('D001','Water Level','001'),('D002','Air Direction','001'),('D003','Rain Fall','001');
+-----------+---------------+-------------+
| device_id | device_name | location_id |
+-----------+---------------+-------------+
| D001 | Water Level | 001 |
| D002 | Air Direction | 001 |
| D003 | Rain Fall | 001 |
+-----------+---------------+-------------+
Table 3:
CREATE TABLE `table3` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`device_id` varchar(5) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`send_date` datetime DEFAULT NULL,
`value` double DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
INSERT INTO `table3` (`device_id`,`send_date`, `value`) VALUES
('D001','2021-07-11 07:00',30.10),
('D001','2021-07-11 07:30',29.21),
('D001','2021-07-11 08:00',31.07),
('D002','2021-07-11 07:00',28.18),
('D002','2021-07-11 07:30',30.10),
('D002','2021-07-11 08:00',31.07),
('D003','2021-07-11 07:00',29.21),
('D003','2021-07-11 07:30',31.07),
('D003','2021-07-11 08:00',30.10),
('D001','2021-07-12 07:00',30.10),
('D001','2021-07-12 07:30',29.21),
('D001','2021-07-12 08:00',31.07),
('D002','2021-07-12 07:00',28.18),
('D002','2021-07-12 07:30',30.10),
('D002','2021-07-12 08:00',31.07),
('D003','2021-07-12 07:00',29.21),
('D003','2021-07-12 07:30',31.07),
('D003','2021-07-12 08:00',30.10);
+----+-----------+---------------------+-------+
| id | device_id | send_date | value |
+----+-----------+---------------------+-------+
| 1 | D001 | 2021-07-11 07:00:00 | 30.1 |
| 2 | D001 | 2021-07-11 07:30:00 | 29.21 |
| 3 | D001 | 2021-07-11 08:00:00 | 31.07 |
| 4 | D002 | 2021-07-11 07:00:00 | 28.18 |
| 5 | D002 | 2021-07-11 07:30:00 | 30.1 |
| 6 | D002 | 2021-07-11 08:00:00 | 31.07 |
| 7 | D003 | 2021-07-11 07:00:00 | 29.21 |
| 8 | D003 | 2021-07-11 07:30:00 | 31.07 |
| 9 | D003 | 2021-07-11 08:00:00 | 30.1 |
| 10 | D001 | 2021-07-12 07:00:00 | 30.1 |
| 11 | D001 | 2021-07-12 07:30:00 | 29.21 |
| 12 | D001 | 2021-07-12 08:00:00 | 31.07 |
| 13 | D002 | 2021-07-12 07:00:00 | 28.18 |
| 14 | D002 | 2021-07-12 07:30:00 | 30.1 |
| 15 | D002 | 2021-07-12 08:00:00 | 31.07 |
| 16 | D003 | 2021-07-12 07:00:00 | 29.21 |
| 17 | D003 | 2021-07-12 07:30:00 | 31.07 |
| 18 | D003 | 2021-07-12 08:00:00 | 30.1 |
+----+-----------+---------------------+-------+
I use this query to show one row per group (the latest row for each device), but with millions of rows it feels very slow:
SELECT * FROM
(SELECT
a.`location_name`,c.`device_id`,b.`device_name`,c.`send_date`,c.`value`,
ROW_NUMBER() OVER (PARTITION BY c.`device_id` ORDER BY c.send_date DESC) AS ranking
FROM
table1 a
INNER JOIN table2 b ON b.`location_id`=a.`location_id`
INNER JOIN table3 c ON c.`device_id`=b.`device_id`
) AS ta
WHERE ta.ranking = 1
ORDER BY ta.device_id ASC
Output:
+---------------+---------------+---------------------+-------+---------+
| location_name | device_name | send_date | value | ranking |
+---------------+---------------+---------------------+-------+---------+
| Asia | Water Level | 2021-07-12 09:00:00 | 28.18 | 1 |
| Asia | Air Direction | 2021-07-12 09:00:00 | 28 | 1 |
| Asia | Rain Fall | 2021-07-12 09:00:00 | 28 | 1 |
+---------------+---------------+---------------------+-------+---------+
Please help me, because this query becomes very slow once the data reaches millions of rows.
Thank you for your answers.
You might find that a correlated subquery is faster, given the right index:
SELECT a.`location_name`, c.`device_id`, b.`device_name`, c.`send_date`, c.`value`
FROM table1 a JOIN
table2 b
ON b.`location_id` = a.`location_id` JOIN
table3 c
ON c.`device_id` = b.`device_id`
WHERE c.send_date = (SELECT MAX(c2.send_date)
FROM table3 c2
WHERE c2.device_id = c.device_id
);
The index you want is on table3(device_id, send_date):
CREATE INDEX idx_table3_device_id_send_date ON table3(device_id, send_date);
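One way to check whether the optimizer actually picks up that index is to prefix the query with EXPLAIN. This is only a sketch: the plan you get depends on your MySQL version and on how the data is distributed, so treat the output as something to inspect rather than a guaranteed result.

```sql
-- Ask MySQL how it plans to execute the correlated-subquery version.
-- In the plan, look at the rows for table3 (aliases c and c2) and check
-- whether the "key" column names idx_table3_device_id_send_date.
EXPLAIN
SELECT a.location_name, c.device_id, b.device_name, c.send_date, c.value
FROM table1 a
JOIN table2 b ON b.location_id = a.location_id
JOIN table3 c ON c.device_id = b.device_id
WHERE c.send_date = (SELECT MAX(c2.send_date)
                     FROM table3 c2
                     WHERE c2.device_id = c.device_id)
ORDER BY c.device_id;
```

Note one difference from the ROW_NUMBER() version: if a device has two rows with the same latest send_date, this query returns both of them, while ranking = 1 returns only one.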
See the tag I added.
Do the JOINs after getting the top-ranked rows. That is, handle the "ranking" of c by itself first, then JOIN to b and a.
Add this index:
c: INDEX(device_id, send_date, value)
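A minimal sketch of that idea, assuming the latest row per device is what you want: find MAX(send_date) per device from table3 alone, then join that small result back to the other tables. The index name idx_table3_device_date_value and the alias latest are made up for illustration.

```sql
-- Hypothetical index name; the column order matches the suggestion above.
ALTER TABLE table3
  ADD INDEX idx_table3_device_date_value (device_id, send_date, value);

-- Step 1 (derived table "latest"): one row per device, using table3 only.
-- Step 2: join back to table3 for the value, then to table2 and table1.
SELECT a.location_name, c.device_id, b.device_name, c.send_date, c.value
FROM (
    SELECT device_id, MAX(send_date) AS send_date
    FROM table3
    GROUP BY device_id
) AS latest
JOIN table3 c ON c.device_id = latest.device_id
             AND c.send_date = latest.send_date
JOIN table2 b ON b.device_id = c.device_id
JOIN table1 a ON a.location_id = b.location_id
ORDER BY c.device_id;
```

Because the index contains device_id, send_date and value, the work on table3 can in principle be answered from the index alone. It also has (device_id, send_date) as a prefix, so you would normally not need the two-column index in addition to it.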