Selecting columns from a subset based on the max of another column in the subset in MySQL

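I am a biologist and a MySQL (version 5.7.13) beginner, and I am facing a task I cannot solve. I have a table that records sightings of individuals together with the time of each sighting. An excerpt of the data looks like this: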

Table "tblSightings"
+---------------+---------+-----------+---------------------+
| id_individual | project | id_survey | Surveydatetime      |
+---------------+---------+-----------+---------------------+
| A             |       1 | S1        | 2016-11-18 15:54:00 |
| B             |       1 | S1        | 2016-11-18 15:54:00 |
| C             |       1 | S1        | 2016-11-18 15:54:00 |
| A             |       1 | S2        | 2016-11-06 13:33:00 |
| B             |       1 | S2        | 2016-11-06 13:33:00 |
| X             |       1 | S2        | 2016-11-06 13:33:00 |
| A             |       2 | S3        | 2015-05-01 12:48:00 |
+---------------+---------+-----------+---------------------+

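What I want is a query that lists the most recent sighting of each individual (the highest Surveydatetime per id_individual + project), together with the corresponding id_survey and all other individuals that were seen with it in that sighting (GROUP_CONCAT(id_individual)). Based on the sample data above, the expected result is: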

+---------------+---------+---------------+------------+---------------------+
| id_individual | project | id_survey     | associates | latest              |
+---------------+---------+---------------+------------+---------------------+
| A             |       1 | S1            | B C        | 2016-11-18 15:54:00 |
| B             |       1 | S1            | A C        | 2016-11-18 15:54:00 |
| C             |       1 | S1            | A B        | 2016-11-18 15:54:00 |
| X             |       1 | S2            | A B        | 2016-11-06 13:33:00 |
| A             |       2 | S3            |            | 2015-05-01 12:48:00 |
+---------------+---------+---------------+------------+---------------------+

I did figure out how to get the latest Surveydatetime for each individual using

SELECT 
id_individual, 
project, 
MAX(Surveydatetime) AS latest 
FROM tblSightings 
GROUP BY id_individual, project; 

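But I cannot figure out how to get the id_survey that corresponds to the "latest" column, and therefore I also cannot figure out how to GROUP_CONCAT all id_individual values from that sighting into the associates column of the desired result. Including id_survey in the SELECT does not work, because I would then also have to put it in the GROUP BY, which again produces multiple rows per individual. Most of the answers I have found so far for "max of subsets" use an INNER JOIN on a SELECT statement, but I simply cannot get it to work...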

Any help is much appreciated! Thanks!

Try this:

Select
        t2.id_individual, t2.project, t2.survey id_survey,
        (
            Select GROUP_CONCAT(tt.id_individual)
            From tblsightings tt
            Where tt.project = t2.project and tt.id_survey = t2.survey and tt.id_individual <> t2.id_individual
        ) associates,
        t2.maxdate latest
From
(
      Select t1.project, t1.id_individual, maxdate,
            (
                Select id_survey
                From tblsightings tt
                Where tt.project = t1.project and tt.id_individual = t1.id_individual and tt.surveydatetime = t1.maxdate
            ) survey
      From 
      (
          Select project, id_individual, max(surveydatetime) maxdate
          From tblsightings t1
          Group by project, id_individual
      ) t1
) t2
Order by t2.project, t2.id_individual

The data I used:

CREATE TABLE tblsightings 
(
  id_individual varchar(100),
  surveydatetime varchar(100),
  id_survey varchar(100),
  project varchar(100)

  );

INSERT INTO tblsightings (id_individual,surveydatetime,id_survey,project) VALUES ("A","2016-11-18 15:54:00","S1","1");
INSERT INTO tblsightings (id_individual,surveydatetime,id_survey,project) VALUES ("B","2016-11-18 15:54:00","S1","1");
INSERT INTO tblsightings (id_individual,surveydatetime,id_survey,project) VALUES ("C","2016-11-18 15:54:00","S1","1");
INSERT INTO tblsightings (id_individual,surveydatetime,id_survey,project) VALUES ("A","2016-11-06 13:33:00","S2","1");
INSERT INTO tblsightings (id_individual,surveydatetime,id_survey,project) VALUES ("B","2016-11-06 13:33:00","S2","1");
INSERT INTO tblsightings (id_individual,surveydatetime,id_survey,project) VALUES ("X","2016-11-06 13:33:00","S2","1");
INSERT INTO tblsightings (id_individual,surveydatetime,id_survey,project) VALUES ("A","2015-05-01 12:48:00","S3","2");
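
As a quick sanity check, the nested-subquery approach above can be run against this sample data. The sketch below is only an illustration: it assumes an in-memory SQLite database (via Python's standard sqlite3 module) as a stand-in for MySQL 5.7, and the function name latest_sightings is hypothetical. The order of names produced by GROUP_CONCAT is not guaranteed, so the associates are split and sorted in Python before being returned.

```python
import sqlite3

# Sample rows from the answer above: (id_individual, surveydatetime, id_survey, project).
SAMPLE_ROWS = [
    ("A", "2016-11-18 15:54:00", "S1", "1"),
    ("B", "2016-11-18 15:54:00", "S1", "1"),
    ("C", "2016-11-18 15:54:00", "S1", "1"),
    ("A", "2016-11-06 13:33:00", "S2", "1"),
    ("B", "2016-11-06 13:33:00", "S2", "1"),
    ("X", "2016-11-06 13:33:00", "S2", "1"),
    ("A", "2015-05-01 12:48:00", "S3", "2"),
]

# Same logic as the nested query above, reformatted only.
NESTED_QUERY = """
SELECT t2.id_individual, t2.project, t2.survey AS id_survey,
       (SELECT GROUP_CONCAT(tt.id_individual)
          FROM tblsightings tt
         WHERE tt.project = t2.project
           AND tt.id_survey = t2.survey
           AND tt.id_individual <> t2.id_individual) AS associates,
       t2.maxdate AS latest
  FROM (SELECT t1.project, t1.id_individual, t1.maxdate,
               (SELECT tt.id_survey
                  FROM tblsightings tt
                 WHERE tt.project = t1.project
                   AND tt.id_individual = t1.id_individual
                   AND tt.surveydatetime = t1.maxdate) AS survey
          FROM (SELECT project, id_individual,
                       MAX(surveydatetime) AS maxdate
                  FROM tblsightings
                 GROUP BY project, id_individual) t1) t2
 ORDER BY t2.project, t2.id_individual
"""


def latest_sightings():
    """Return (id_individual, project, id_survey, associates, latest) tuples."""
    # Assumption: SQLite is used here only so the sketch runs without a MySQL server.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tblsightings ("
        " id_individual TEXT, surveydatetime TEXT, id_survey TEXT, project TEXT)"
    )
    conn.executemany("INSERT INTO tblsightings VALUES (?, ?, ?, ?)", SAMPLE_ROWS)
    out = []
    for ind, proj, survey, assoc, latest in conn.execute(NESTED_QUERY):
        # GROUP_CONCAT yields NULL when there are no associates; normalise to [].
        names = sorted(assoc.split(",")) if assoc else []
        out.append((ind, proj, survey, names, latest))
    conn.close()
    return out


if __name__ == "__main__":
    for row in latest_sightings():
        print(row)
```

Note that the timestamps are stored as text here, as in the CREATE TABLE above; MAX() still picks the latest one because the "YYYY-MM-DD HH:MM:SS" format sorts chronologically as a string.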

Here is one way to write this query:

SELECT t1.id_individual, t1.project, ts.id_survey, t1.latest,
GROUP_CONCAT(t2.id_individual) AS associates

FROM tblSightings ts
    INNER JOIN
    ( SELECT 
            id_individual, 
            project, MAX(Surveydatetime) AS latest 
        FROM tblSightings 
        GROUP BY id_individual, project
    ) t1
        ON t1.id_individual = ts.id_individual
        AND t1.project = ts.project
        AND t1.latest = ts.Surveydatetime

    LEFT JOIN tblSightings t2
        ON ts.id_survey = t2.id_survey
        AND ts.project = t2.project
        AND t1.latest = t2.Surveydatetime
        AND t1.id_individual != t2.id_individual

    GROUP BY t1.id_individual, t1.project, ts.id_survey, t1.latest
    ORDER BY t1.latest DESC, t1.project, t1.id_individual, ts.id_survey;


Explanation:

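To get the result in the given format, we need to use the same table three times, combined with two joins. The first is an INNER JOIN, which fetches the id_survey of the record with the highest timestamp for each individual in each project. The second determines whether the given individual has any associates. Since there may be no associates at all (as with survey S3), we use a LEFT JOIN here instead. We also make sure that this LEFT JOIN only matches rows whose id_individual differs from the individual currently being processed, but which belong to the same project and survey.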
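
To illustrate why the second join has to be a LEFT JOIN, the sketch below runs the query once as written and once with that join switched to an INNER JOIN, using the sample data from the question. It is only an illustration under stated assumptions: an in-memory SQLite database (Python's standard sqlite3 module) stands in for MySQL 5.7, and the helper names build_query and associates_by_individual are hypothetical.

```python
import sqlite3

# Sample data from the question: (id_individual, project, id_survey, Surveydatetime).
SAMPLE_ROWS = [
    ("A", 1, "S1", "2016-11-18 15:54:00"),
    ("B", 1, "S1", "2016-11-18 15:54:00"),
    ("C", 1, "S1", "2016-11-18 15:54:00"),
    ("A", 1, "S2", "2016-11-06 13:33:00"),
    ("B", 1, "S2", "2016-11-06 13:33:00"),
    ("X", 1, "S2", "2016-11-06 13:33:00"),
    ("A", 2, "S3", "2015-05-01 12:48:00"),
]


def build_query(join_kind):
    """Build the join query; join_kind is 'LEFT' (as in the answer) or 'INNER'."""
    return f"""
    SELECT t1.id_individual, t1.project, ts.id_survey, t1.latest,
           GROUP_CONCAT(t2.id_individual) AS associates
      FROM tblSightings ts
      INNER JOIN (SELECT id_individual, project,
                         MAX(Surveydatetime) AS latest
                    FROM tblSightings
                   GROUP BY id_individual, project) t1
              ON t1.id_individual = ts.id_individual
             AND t1.project = ts.project
             AND t1.latest = ts.Surveydatetime
      {join_kind} JOIN tblSightings t2
              ON ts.id_survey = t2.id_survey
             AND ts.project = t2.project
             AND t1.latest = t2.Surveydatetime
             AND t1.id_individual != t2.id_individual
     GROUP BY t1.id_individual, t1.project, ts.id_survey, t1.latest
     ORDER BY t1.project, t1.id_individual
    """


def associates_by_individual(join_kind):
    """Map (id_individual, project) to a sorted list of associates."""
    # Assumption: SQLite is used only so the sketch runs without a MySQL server.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tblSightings ("
        " id_individual TEXT, project INTEGER,"
        " id_survey TEXT, Surveydatetime TEXT)"
    )
    conn.executemany("INSERT INTO tblSightings VALUES (?, ?, ?, ?)", SAMPLE_ROWS)
    result = {}
    for ind, proj, _survey, _latest, assoc in conn.execute(build_query(join_kind)):
        # GROUP_CONCAT over only-NULL values yields NULL; normalise to [].
        result[(ind, proj)] = sorted(assoc.split(",")) if assoc else []
    conn.close()
    return result


if __name__ == "__main__":
    print("LEFT JOIN: ", associates_by_individual("LEFT"))
    print("INNER JOIN:", associates_by_individual("INNER"))
```

With the LEFT JOIN, individual A in project 2 is kept with an empty list of associates; with an INNER JOIN that row is dropped, because no other individual was seen in survey S3.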

