How to select the latest member of each group in MariaDB?

I have 3 tables:

  1. account - account information
  2. machine - machine information
  3. account_machine - maps an account to a machine as of a date

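Each account is handled by one machine. Over time an account can migrate to a different machine, but on any given day it is handled by exactly one machine. If an account is no longer active, the corresponding machine_id is 0. Given a date, I want to find all active accounts, so I came up with this query: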

SELECT account.id 
FROM account JOIN account_machine m 
ON m.account_id=account.id && m.machine_id && m.machine_id=
(SELECT machine_id 
FROM account_machine 
WHERE account_id=account.id && date<=20170215 
ORDER BY date DESC LIMIT 1) 
GROUP BY account.id;

This works in MySQL but not in MariaDB: on the data below, MariaDB returns only 2 rows where MySQL returns 28.

MariaDB [db]> select * from account_machine;
+------------+------------+------------+
| date       | account_id | machine_id |
+------------+------------+------------+
| 2013-01-01 |          1 |          1 |
| 2013-01-01 |          8 |          1 |
| 2013-01-01 |          2 |          2 |
| 2013-01-01 |          3 |          2 |
| 2013-01-01 |          4 |          3 |
| 2013-01-01 |         12 |          3 |
| 2016-04-01 |         24 |          3 |
| 2013-01-01 |          5 |          5 |
| 2013-01-01 |          6 |          8 |
| 2013-01-01 |          7 |          6 |
| 2014-01-01 |          9 |          6 |
| 2013-01-01 |         10 |          4 |
| 2014-07-01 |         11 |         10 |
| 2014-01-01 |         13 |          7 |
| 2014-01-01 |         14 |          7 |
| 2014-07-01 |         15 |         11 |
| 2014-07-01 |         16 |         14 |
| 2014-07-01 |         17 |         12 |
| 2015-01-01 |         18 |         13 |
| 2015-01-01 |         19 |         13 |
| 2015-04-01 |         20 |         13 |
| 2015-04-01 |         21 |          7 |
| 2015-04-01 |         22 |         13 |
| 2016-04-01 |         23 |         15 |
| 2016-05-01 |         25 |          9 |
| 2016-05-19 |         26 |          4 |
| 2014-08-06 |          1 |          0 |
| 2016-01-15 |         12 |          0 |
| 2015-11-04 |         19 |         12 |
| 2016-05-23 |         10 |          0 |
| 2016-05-26 |          2 |         18 |
| 2016-05-27 |         13 |         16 |
| 2016-06-02 |         27 |          3 |
| 2016-06-02 |          4 |          0 |
| 2016-06-08 |         28 |         17 |
| 2016-06-21 |         29 |         19 |
| 2016-07-11 |         30 |         20 |
| 2016-08-15 |         13 |          0 |
| 2016-08-19 |          2 |         18 |
| 2016-08-25 |         31 |         21 |
| 2016-09-08 |         32 |         20 |
| 2016-11-30 |         19 |         12 |
| 2016-11-30 |         22 |         13 |
| 2017-01-20 |         33 |         15 |
+------------+------------+------------+

MariaDB [db]> select account.id from account join account_machine m on m.account_id=account.id && m.machine_id && m.machine_id=(select a.machine_id from account_machine a where a.account_id=account.id && a.date<=20170215 order by a.date desc limit 1) group by account.id;
+----+
| id |
+----+
| 23 |
| 33 |
+----+

mysql> select account.id from account join account_machine m on m.account_id=account.id && m.machine_id && m.machine_id=(select a.machine_id from account_machine a where a.account_id=account.id && a.date<=20170215 order by a.date desc limit 1) group by account.id;
+----+
| id |
+----+
|  2 |
|  3 |
|  5 |
|  6 |
|  7 |
|  8 |
|  9 |
| 11 |
| 14 |
| 15 |
| 16 |
| 17 |
| 18 |
| 19 |
| 20 |
| 21 |
| 22 |
| 23 |
| 24 |
| 25 |
| 26 |
| 27 |
| 28 |
| 29 |
| 30 |
| 31 |
| 32 |
| 33 |
+----+

P.S. Here are the 3 tables so you can reproduce this:

CREATE TABLE `account` (
  `id` smallint(5) unsigned NOT NULL AUTO_INCREMENT,
  PRIMARY KEY (`id`)
) ENGINE=MyISAM;
INSERT INTO `account` VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),(12),(13),(14),(15),(16),(17),(18),(19),(20),(21),(22),(23),(24),(25),(26),(27),(28),(29),(30),(31),(32),(33);

CREATE TABLE `account_machine` (
  `date` date NOT NULL,
  `account_id` smallint(5) unsigned NOT NULL,
  `machine_id` smallint(5) unsigned NOT NULL,
  PRIMARY KEY (`date`,`account_id`)
) ENGINE=MyISAM;
INSERT INTO `account_machine` VALUES ('2013-01-01',1,1),('2013-01-01',8,1),('2013-01-01',2,2),('2013-01-01',3,2),('2013-01-01',4,3),('2013-01-01',12,3),('2016-04-01',24,3),('2013-01-01',5,5),('2013-01-01',6,8),('2013-01-01',7,6),('2014-01-01',9,6),('2013-01-01',10,4),('2014-07-01',11,10),('2014-01-01',13,7),('2014-01-01',14,7),('2014-07-01',15,11),('2014-07-01',16,14),('2014-07-01',17,12),('2015-01-01',18,13),('2015-01-01',19,13),('2015-04-01',20,13),('2015-04-01',21,7),('2015-04-01',22,13),('2016-04-01',23,15),('2016-05-01',25,9),('2016-05-19',26,4),('2014-08-06',1,0),('2016-01-15',12,0),('2015-11-04',19,12),('2016-05-23',10,0),('2016-05-26',2,18),('2016-05-27',13,16),('2016-06-02',27,3),('2016-06-02',4,0),('2016-06-08',28,17),('2016-06-21',29,19),('2016-07-11',30,20),('2016-08-15',13,0),('2016-08-19',2,18),('2016-08-25',31,21),('2016-09-08',32,20),('2016-11-30',19,12),('2016-11-30',22,13),('2017-01-20',33,15);

CREATE TABLE `machine` (
  `id` smallint(5) unsigned NOT NULL AUTO_INCREMENT,
  PRIMARY KEY (`id`)
) ENGINE=MyISAM;
INSERT INTO `machine` VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),(12),(13),(14),(15),(16),(17),(18),(19),(20),(21),(22);

I suspect there is a design flaw in your query: what if the subquery returns machine_id = 0 for an account_id? It would then not look any further.
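To see what the correlated subquery hands back for one account, it can be run on its own. This is only a sketch against the sample data in the question; the account_id value 1 and the quoted date literal are picked for illustration.

```sql
-- Latest mapping on or before the cutoff date, for a single account.
-- In the sample data, account 1 has two rows:
--   2013-01-01 -> machine 1, 2014-08-06 -> machine 0
SELECT machine_id
FROM account_machine
WHERE account_id = 1
  AND date <= '2017-02-15'
ORDER BY date DESC
LIMIT 1;
```

For account 1 the newest row is the 2014-08-06 one with machine_id 0, so the outer condition `m.machine_id && m.machine_id = (...)` cannot be true for any of its rows and the account drops out of the result.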

When using JOIN...ON, it is best to put the join conditions in the ON clause and the filtering conditions in the WHERE clause.
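As a rough illustration of that split, here is one way the question's query could be laid out, assuming the same tables. The aliases a, m and m2 are illustrative, and the latest-row test is written with MAX(date) instead of ORDER BY ... LIMIT 1, so this is a reformulation rather than the original query.

```sql
SELECT a.id
FROM account AS a
JOIN account_machine AS m
  ON m.account_id = a.id                 -- join condition only
WHERE m.machine_id <> 0                  -- filter: account is still active
  AND m.date = (                         -- filter: latest mapping up to the cutoff
        SELECT MAX(m2.date)
        FROM account_machine AS m2
        WHERE m2.account_id = a.id
          AND m2.date <= '2017-02-15'
      )
ORDER BY a.id;
```

Because (date, account_id) is the primary key, at most one row per account can match the MAX(date) test, so no GROUP BY is needed. On the sample data this should list the same 28 ids that MySQL returned for the original query.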

It seems like this would be simpler and faster:

SELECT  account_id
    FROM  account_machine AS m
    WHERE  machine_id != 0
      AND  date <= 20170215
      AND  EXISTS (
        SELECT  *
            FROM  account
            WHERE  id = m.account_id 
                  )
    ORDER BY  date DESC
    LIMIT  1;

Perhaps the EXISTS() test is redundant and can be removed?

INDEX(date) might help performance.
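A sketch of adding an index, with one caveat: the table's primary key is already (date, account_id), so date is its leftmost column and a separate INDEX(date) may add little. For per-account "latest date" lookups, an index that leads with account_id is one option to try. The index name idx_account_date is hypothetical.

```sql
-- Hypothetical index name; leads with account_id so the rows for one
-- account are adjacent and ordered by date.
ALTER TABLE account_machine
  ADD INDEX idx_account_date (account_id, date);
```

Whether the optimizer actually picks it is worth confirming with EXPLAIN on each server.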

(No, I have not figured out why the two servers might behave differently. See whether my version works.)

How about something like this?

SELECT am1.account_id AS id
FROM account_machine am1
JOIN (
    SELECT account_id, MAX(date) AS date
    FROM account_machine
    GROUP BY account_id
    ) am2
ON am1.account_id = am2.account_id
AND am1.date = am2.date
AND am1.machine_id != 0
ORDER BY am1.account_id;

+----+
| id |
+----+
|  2 |
|  3 |
|  5 |
|  6 |
|  7 |
|  8 |
|  9 |
| 11 |
| 14 |
| 15 |
| 16 |
| 17 |
| 18 |
| 19 |
| 20 |
| 21 |
| 22 |
| 23 |
| 24 |
| 25 |
| 26 |
| 27 |
| 28 |
| 29 |
| 30 |
| 31 |
| 32 |
| 33 |
+----+
28 rows in set (0.00 sec)
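The query above takes the latest row per account over the whole table. If the cutoff date from the question matters, one possible variant, sketched under the assumption that the date belongs inside the derived table, is:

```sql
SELECT am1.account_id AS id
FROM account_machine AS am1
JOIN (
    -- Latest mapping date per account, ignoring rows after the cutoff.
    SELECT account_id, MAX(date) AS date
    FROM account_machine
    WHERE date <= '2017-02-15'
    GROUP BY account_id
) AS am2
  ON am1.account_id = am2.account_id
 AND am1.date = am2.date
WHERE am1.machine_id <> 0               -- keep only accounts still mapped to a machine
ORDER BY am1.account_id;
```

With the sample data every row is dated before 2017-02-15, so the result should not change; the filter only matters for earlier cutoff dates.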

I would be interested to see the output of EXPLAIN EXTENDED / SHOW WARNINGS from both MySQL and MariaDB. It shows you exactly how the query optimizer rewrote the query. For example:

root@localhost [stack]> EXPLAIN EXTENDED SELECT am1.account_id AS id
    -> FROM account_machine am1
    -> JOIN (
    ->     SELECT account_id, MAX(date) AS date
    ->     FROM account_machine
    ->     GROUP BY account_id
    -> ) am2
    -> ON am1.account_id = am2.account_id
    -> AND am1.date = am2.date
    -> AND am1.machine_id != 0
    -> ORDER BY am1.account_id\G
*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: <derived2>
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 44
     filtered: 100.00
        Extra: Using where; Using temporary; Using filesort
*************************** 2. row ***************************
           id: 1
  select_type: PRIMARY
        table: am1
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 5
          ref: am2.date,am2.account_id
         rows: 1
     filtered: 100.00
        Extra: Using where
*************************** 3. row ***************************
           id: 2
  select_type: DERIVED
        table: account_machine
         type: index
possible_keys: NULL
          key: PRIMARY
      key_len: 5
          ref: NULL
         rows: 44
     filtered: 100.00
        Extra: Using index; Using temporary; Using filesort
3 rows in set, 1 warning (0.00 sec)

root@localhost [stack]> SHOW WARNINGS\G
*************************** 1. row ***************************
  Level: Note
   Code: 1003
Message: select `stack`.`am1`.`account_id` AS `id` from
`stack`.`account_machine` `am1` join (select 
`stack`.`account_machine`.`account_id` AS 
`account_id`,max(`stack`.`account_machine`.`date`) AS `date` from 
`stack`.`account_machine` group by 
`stack`.`account_machine`.`account_id`) `am2` where 
((`stack`.`am1`.`account_id` = `am2`.`account_id`) and 
(`stack`.`am1`.`date` = `am2`.`date`) and (`stack`.`am1`.`machine_id` 
<> 0)) order by `stack`.`am1`.`account_id`
1 row in set (0.00 sec)
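The same pair of statements can be applied to the query from the question and run on both servers, so the two rewritten forms can be compared side by side. A sketch, with && written as AND, which is equivalent here; note that newer MySQL releases no longer accept the EXTENDED keyword, in which case plain EXPLAIN followed by SHOW WARNINGS is the thing to try.

```sql
-- Run on MySQL and on MariaDB, then compare the Note text from SHOW WARNINGS.
EXPLAIN EXTENDED
SELECT account.id
FROM account
JOIN account_machine m
  ON m.account_id = account.id
 AND m.machine_id
 AND m.machine_id = (
       SELECT a.machine_id
       FROM account_machine a
       WHERE a.account_id = account.id
         AND a.date <= 20170215
       ORDER BY a.date DESC
       LIMIT 1)
GROUP BY account.id;

SHOW WARNINGS;
```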

Obviously, the query does not perform well without indexes, but for a limited data set it is fine.