Why does GROUP BY HOUR(timestamp) return a particular timestamp?
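Question: In the example below, why does the query using GROUP BY HOUR(timestamp) return the timestamp 2015-02-18 23:16:25 in the row with usercount 22, rather than, for example, the first occurrence at 2015-02-18 23:14:12?
What internal MySQL mechanism determines this choice?
Here is the result of the query grouped by hour: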
mysql> SELECT *, COUNT(user_id) AS usercount FROM table_log WHERE user_id = 1 GROUP BY HOUR(timestamp) ORDER BY timestamp,usercount DESC;
+------+---------+-----------+---------------------+-----------+
| id | user_id | user_name | timestamp | usercount |
+------+---------+-----------+---------------------+-----------+
| 1013 | 1 | 1 | 2015-02-16 00:51:32 | 2 |
| 1016 | 1 | 1 | 2015-02-16 21:38:52 | 2 |
| 1018 | 1 | 1 | 2015-02-17 02:05:44 | 3 |
| 1022 | 1 | 1 | 2015-02-18 04:51:22 | 8 |
| 1001 | 1 | 1 | 2015-02-18 23:16:25 | 22 |
| 1005 | 1 | 1 | 2015-02-19 03:06:01 | 5 |
| 1009 | 1 | 1 | 2015-02-19 05:15:32 | 3 |
| 1011 | 1 | 1 | 2015-02-19 11:57:26 | 1 |
| 1012 | 1 | 1 | 2015-02-19 12:09:20 | 1 |
+------+---------+-----------+---------------------+-----------+
9 rows in set (0.01 sec)
Here is the result of the plain query (without grouping):
mysql> SELECT * FROM table_log WHERE user_id = 1 ORDER BY timestamp;
+------+---------+-----------+---------------------+
| id | user_id | user_name | timestamp |
+------+---------+-----------+---------------------+
| 1013 | 1 | 1 | 2015-02-16 00:51:32 |
| 1014 | 1 | 1 | 2015-02-16 00:51:38 |
| 1015 | 1 | 1 | 2015-02-16 03:12:28 |
| 1016 | 1 | 1 | 2015-02-16 21:38:52 |
| 1017 | 1 | 1 | 2015-02-16 21:39:33 |
| 1018 | 1 | 1 | 2015-02-17 02:05:44 |
| 1019 | 1 | 1 | 2015-02-17 02:05:52 |
| 1020 | 1 | 1 | 2015-02-17 02:05:55 |
| 1021 | 1 | 1 | 2015-02-17 05:21:19 |
| 1022 | 1 | 1 | 2015-02-18 04:51:22 |
| 1023 | 1 | 1 | 2015-02-18 04:51:31 |
| 1024 | 1 | 1 | 2015-02-18 04:51:35 |
| 1025 | 1 | 1 | 2015-02-18 04:51:43 |
| 1026 | 1 | 1 | 2015-02-18 04:51:46 |
| 1027 | 1 | 1 | 2015-02-18 04:52:10 |
| 1028 | 1 | 1 | 2015-02-18 04:52:24 |
| 1029 | 1 | 1 | 2015-02-18 04:52:31 |
| 1030 | 1 | 1 | 2015-02-18 23:14:12 |
| 1031 | 1 | 1 | 2015-02-18 23:14:16 |
| 1032 | 1 | 1 | 2015-02-18 23:14:53 |
| 1033 | 1 | 1 | 2015-02-18 23:14:57 |
| 1034 | 1 | 1 | 2015-02-18 23:14:59 |
| 1035 | 1 | 1 | 2015-02-18 23:15:02 |
| 1036 | 1 | 1 | 2015-02-18 23:15:05 |
| 1037 | 1 | 1 | 2015-02-18 23:15:08 |
| 1038 | 1 | 1 | 2015-02-18 23:15:10 |
| 1039 | 1 | 1 | 2015-02-18 23:15:12 |
| 1040 | 1 | 1 | 2015-02-18 23:15:13 |
| 1041 | 1 | 1 | 2015-02-18 23:15:14 |
| 1042 | 1 | 1 | 2015-02-18 23:15:24 |
| 1043 | 1 | 1 | 2015-02-18 23:15:29 |
| 1044 | 1 | 1 | 2015-02-18 23:15:39 |
| 1045 | 1 | 1 | 2015-02-18 23:15:44 |
| 1046 | 1 | 1 | 2015-02-18 23:16:15 |
| 1047 | 1 | 1 | 2015-02-18 23:16:20 |
| 1001 | 1 | 1 | 2015-02-18 23:16:25 |
| 1002 | 1 | 1 | 2015-02-18 23:35:31 |
| 1003 | 1 | 1 | 2015-02-18 23:47:20 |
| 1004 | 1 | 1 | 2015-02-18 23:47:27 |
| 1005 | 1 | 1 | 2015-02-19 03:06:01 |
| 1006 | 1 | 1 | 2015-02-19 03:06:05 |
| 1007 | 1 | 1 | 2015-02-19 03:06:11 |
| 1008 | 1 | 1 | 2015-02-19 03:06:19 |
| 1009 | 1 | 1 | 2015-02-19 05:15:32 |
| 1010 | 1 | 1 | 2015-02-19 05:15:35 |
| 1011 | 1 | 1 | 2015-02-19 11:57:26 |
| 1012 | 1 | 1 | 2015-02-19 12:09:20 |
+------+---------+-----------+---------------------+
47 rows in set (0.01 sec)
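Note: the id column is AUTO_INCREMENT and indexed.
Which row is returned is indeterminate. MySQL is free to return values from any row in the group.
Many other databases reject such a query with an error. MySQL, however, extends standard SQL and allows nonaggregated columns in the SELECT list.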
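The strict alternative is to aggregate every column that is not in the GROUP BY, so the result no longer depends on which row the server happens to pick. (In MySQL 5.7.5 and later, ONLY_FULL_GROUP_BY is part of the default SQL mode, so the original query would be rejected there.) Below is a minimal sketch under stated assumptions: Python's built-in sqlite3 stands in for MySQL so the example runs anywhere, CAST(strftime('%H', ts) AS INTEGER) stands in for HOUR(timestamp) because SQLite has no HOUR(), and the column is named ts instead of timestamp. The data is the 47 rows from the ungrouped result. In MySQL the query would read roughly `SELECT HOUR(timestamp) AS hr, MIN(timestamp) AS first_ts, MAX(timestamp) AS last_ts, COUNT(user_id) AS usercount FROM table_log WHERE user_id = 1 GROUP BY HOUR(timestamp) ORDER BY first_ts;`.

```python
import sqlite3

# The 47 rows from the ungrouped result: (id, timestamp). All have user_id = 1, user_name = '1'.
ROWS = [
    (1013, "2015-02-16 00:51:32"), (1014, "2015-02-16 00:51:38"), (1015, "2015-02-16 03:12:28"),
    (1016, "2015-02-16 21:38:52"), (1017, "2015-02-16 21:39:33"), (1018, "2015-02-17 02:05:44"),
    (1019, "2015-02-17 02:05:52"), (1020, "2015-02-17 02:05:55"), (1021, "2015-02-17 05:21:19"),
    (1022, "2015-02-18 04:51:22"), (1023, "2015-02-18 04:51:31"), (1024, "2015-02-18 04:51:35"),
    (1025, "2015-02-18 04:51:43"), (1026, "2015-02-18 04:51:46"), (1027, "2015-02-18 04:52:10"),
    (1028, "2015-02-18 04:52:24"), (1029, "2015-02-18 04:52:31"), (1030, "2015-02-18 23:14:12"),
    (1031, "2015-02-18 23:14:16"), (1032, "2015-02-18 23:14:53"), (1033, "2015-02-18 23:14:57"),
    (1034, "2015-02-18 23:14:59"), (1035, "2015-02-18 23:15:02"), (1036, "2015-02-18 23:15:05"),
    (1037, "2015-02-18 23:15:08"), (1038, "2015-02-18 23:15:10"), (1039, "2015-02-18 23:15:12"),
    (1040, "2015-02-18 23:15:13"), (1041, "2015-02-18 23:15:14"), (1042, "2015-02-18 23:15:24"),
    (1043, "2015-02-18 23:15:29"), (1044, "2015-02-18 23:15:39"), (1045, "2015-02-18 23:15:44"),
    (1046, "2015-02-18 23:16:15"), (1047, "2015-02-18 23:16:20"), (1001, "2015-02-18 23:16:25"),
    (1002, "2015-02-18 23:35:31"), (1003, "2015-02-18 23:47:20"), (1004, "2015-02-18 23:47:27"),
    (1005, "2015-02-19 03:06:01"), (1006, "2015-02-19 03:06:05"), (1007, "2015-02-19 03:06:11"),
    (1008, "2015-02-19 03:06:19"), (1009, "2015-02-19 05:15:32"), (1010, "2015-02-19 05:15:35"),
    (1011, "2015-02-19 11:57:26"), (1012, "2015-02-19 12:09:20"),
]

# SQLite stand-in for MySQL's HOUR(timestamp): the hour of day, ignoring the date.
HOUR_TS = "CAST(strftime('%H', ts) AS INTEGER)"


def make_db():
    """Load the sample rows into an in-memory SQLite table shaped like table_log."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE table_log (id INTEGER PRIMARY KEY, user_id INTEGER, user_name TEXT, ts TEXT)"
    )
    con.executemany("INSERT INTO table_log VALUES (?, 1, '1', ?)", ROWS)
    return con


def hourly_summary(con):
    """Every non-grouped column is wrapped in an aggregate, so each value is well defined."""
    sql = f"""
        SELECT {HOUR_TS} AS hr,
               MIN(ts) AS first_ts,
               MAX(ts) AS last_ts,
               COUNT(user_id) AS usercount
        FROM table_log
        WHERE user_id = 1
        GROUP BY {HOUR_TS}
        ORDER BY first_ts
    """
    return con.execute(sql).fetchall()


for row in hourly_summary(make_db()):
    print(row)
```

For hour 23 this yields first_ts 2015-02-18 23:14:12, the first occurrence the question expected, with usercount 22. The sketch also makes visible that HOUR() ignores the date: the hour-3 bucket spans 2015-02-16 and 2015-02-19, which is why its usercount is 5. If per-day hours were intended, grouping by DATE(timestamp), HOUR(timestamp) would separate them.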
This behavior is documented here: http://dev.mysql.com/doc/refman/5.5/en/group-by-handling.html
MySQL extends the use of GROUP BY so that the select list can refer to nonaggregated columns not named in the GROUP BY clause. This means that the preceding query is legal in MySQL. You can use this feature to get better performance by avoiding unnecessary column sorting and grouping. However, this is useful primarily when all values in each nonaggregated column not named in the GROUP BY are the same for each group. The server is free to choose any value from each group, so unless they are the same, the values chosen are indeterminate. Furthermore, the selection of values from each group cannot be influenced by adding an ORDER BY clause. Sorting of the result set occurs after values have been chosen, and ORDER BY does not affect which values within each group the server chooses.
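Since ORDER BY cannot steer which row the server picks, a common way to get a whole, well-defined row per group (here, the earliest one per hour) is the groupwise-minimum pattern: compute MIN(timestamp) per bucket in a subquery, then join back to the table. This is a minimal sketch under the same assumptions as before: sqlite3 stands in for MySQL, strftime replaces HOUR(), and the column is named ts. A rough MySQL equivalent is kept as a comment and is not executed.

```python
import sqlite3

# The 47 rows from the ungrouped result: (id, timestamp). All have user_id = 1, user_name = '1'.
ROWS = [
    (1013, "2015-02-16 00:51:32"), (1014, "2015-02-16 00:51:38"), (1015, "2015-02-16 03:12:28"),
    (1016, "2015-02-16 21:38:52"), (1017, "2015-02-16 21:39:33"), (1018, "2015-02-17 02:05:44"),
    (1019, "2015-02-17 02:05:52"), (1020, "2015-02-17 02:05:55"), (1021, "2015-02-17 05:21:19"),
    (1022, "2015-02-18 04:51:22"), (1023, "2015-02-18 04:51:31"), (1024, "2015-02-18 04:51:35"),
    (1025, "2015-02-18 04:51:43"), (1026, "2015-02-18 04:51:46"), (1027, "2015-02-18 04:52:10"),
    (1028, "2015-02-18 04:52:24"), (1029, "2015-02-18 04:52:31"), (1030, "2015-02-18 23:14:12"),
    (1031, "2015-02-18 23:14:16"), (1032, "2015-02-18 23:14:53"), (1033, "2015-02-18 23:14:57"),
    (1034, "2015-02-18 23:14:59"), (1035, "2015-02-18 23:15:02"), (1036, "2015-02-18 23:15:05"),
    (1037, "2015-02-18 23:15:08"), (1038, "2015-02-18 23:15:10"), (1039, "2015-02-18 23:15:12"),
    (1040, "2015-02-18 23:15:13"), (1041, "2015-02-18 23:15:14"), (1042, "2015-02-18 23:15:24"),
    (1043, "2015-02-18 23:15:29"), (1044, "2015-02-18 23:15:39"), (1045, "2015-02-18 23:15:44"),
    (1046, "2015-02-18 23:16:15"), (1047, "2015-02-18 23:16:20"), (1001, "2015-02-18 23:16:25"),
    (1002, "2015-02-18 23:35:31"), (1003, "2015-02-18 23:47:20"), (1004, "2015-02-18 23:47:27"),
    (1005, "2015-02-19 03:06:01"), (1006, "2015-02-19 03:06:05"), (1007, "2015-02-19 03:06:11"),
    (1008, "2015-02-19 03:06:19"), (1009, "2015-02-19 05:15:32"), (1010, "2015-02-19 05:15:35"),
    (1011, "2015-02-19 11:57:26"), (1012, "2015-02-19 12:09:20"),
]

# Rough MySQL equivalent (not executed here):
#   SELECT t.*, g.usercount
#   FROM table_log t
#   JOIN (SELECT HOUR(timestamp) AS hr, MIN(timestamp) AS first_ts, COUNT(user_id) AS usercount
#         FROM table_log WHERE user_id = 1 GROUP BY HOUR(timestamp)) g
#     ON HOUR(t.timestamp) = g.hr AND t.timestamp = g.first_ts
#   WHERE t.user_id = 1
#   ORDER BY t.timestamp;


def hour_of(col):
    """SQLite stand-in for MySQL's HOUR(col)."""
    return f"CAST(strftime('%H', {col}) AS INTEGER)"


def make_db():
    """Load the sample rows into an in-memory SQLite table shaped like table_log."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE table_log (id INTEGER PRIMARY KEY, user_id INTEGER, user_name TEXT, ts TEXT)"
    )
    con.executemany("INSERT INTO table_log VALUES (?, 1, '1', ?)", ROWS)
    return con


def earliest_row_per_hour(con):
    """Groupwise minimum: MIN(ts) per hour bucket, then join back to fetch the full row.

    If two rows in a bucket shared the minimum timestamp, both would be returned.
    """
    sql = f"""
        SELECT t.id, t.user_id, t.user_name, t.ts, g.usercount
        FROM table_log AS t
        JOIN (SELECT {hour_of('ts')} AS hr,
                     MIN(ts) AS first_ts,
                     COUNT(user_id) AS usercount
              FROM table_log
              WHERE user_id = 1
              GROUP BY {hour_of('ts')}) AS g
          ON {hour_of('t.ts')} = g.hr AND t.ts = g.first_ts
        WHERE t.user_id = 1
        ORDER BY t.ts
    """
    return con.execute(sql).fetchall()


for row in earliest_row_per_hour(make_db()):
    print(row)
```

With this data the hour-23 row is id 1030 at 2015-02-18 23:14:12. For hours 3 and 5 the earliest rows (ids 1015 and 1021) differ from the rows in the grouped MySQL output (1005 and 1009), which illustrates that the server's pick is not "the earliest" in general.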
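For completeness of this Q&A, I'll repeat here what I already noted in the comments: the id column (AUTO_INCREMENT) is unique (indexed) and also appears to be taken into account when the representative timestamp value is chosen. Evidently, among the rows that fall in the same hour, even on different dates, the one with the lowest id always 'wins' as the timestamp selected for the group.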
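The lowest-id observation can be checked against the posted data. The sketch below is a hypothetical model of this one result, not MySQL's documented algorithm: it visits rows in ascending id order (as a scan in primary-key order might, assuming id is the primary key), keeps the first row it sees for each HOUR() bucket, counts every row in the bucket, and only then sorts by timestamp, mirroring the documented point that ORDER BY runs after the values are chosen. Applied to the 47 ungrouped rows, it reproduces the nine grouped rows, including 1001 for hour 23, 1005 for hour 3 (ahead of 1015 from 2015-02-16) and 1009 for hour 5 (ahead of 1021 from 2015-02-17). Because the manual calls the choice indeterminate, this agreement describes the observed run, not a guarantee.

```python
from datetime import datetime

# The 47 rows from the ungrouped result: (id, timestamp).
ROWS = [
    (1013, "2015-02-16 00:51:32"), (1014, "2015-02-16 00:51:38"), (1015, "2015-02-16 03:12:28"),
    (1016, "2015-02-16 21:38:52"), (1017, "2015-02-16 21:39:33"), (1018, "2015-02-17 02:05:44"),
    (1019, "2015-02-17 02:05:52"), (1020, "2015-02-17 02:05:55"), (1021, "2015-02-17 05:21:19"),
    (1022, "2015-02-18 04:51:22"), (1023, "2015-02-18 04:51:31"), (1024, "2015-02-18 04:51:35"),
    (1025, "2015-02-18 04:51:43"), (1026, "2015-02-18 04:51:46"), (1027, "2015-02-18 04:52:10"),
    (1028, "2015-02-18 04:52:24"), (1029, "2015-02-18 04:52:31"), (1030, "2015-02-18 23:14:12"),
    (1031, "2015-02-18 23:14:16"), (1032, "2015-02-18 23:14:53"), (1033, "2015-02-18 23:14:57"),
    (1034, "2015-02-18 23:14:59"), (1035, "2015-02-18 23:15:02"), (1036, "2015-02-18 23:15:05"),
    (1037, "2015-02-18 23:15:08"), (1038, "2015-02-18 23:15:10"), (1039, "2015-02-18 23:15:12"),
    (1040, "2015-02-18 23:15:13"), (1041, "2015-02-18 23:15:14"), (1042, "2015-02-18 23:15:24"),
    (1043, "2015-02-18 23:15:29"), (1044, "2015-02-18 23:15:39"), (1045, "2015-02-18 23:15:44"),
    (1046, "2015-02-18 23:16:15"), (1047, "2015-02-18 23:16:20"), (1001, "2015-02-18 23:16:25"),
    (1002, "2015-02-18 23:35:31"), (1003, "2015-02-18 23:47:20"), (1004, "2015-02-18 23:47:27"),
    (1005, "2015-02-19 03:06:01"), (1006, "2015-02-19 03:06:05"), (1007, "2015-02-19 03:06:11"),
    (1008, "2015-02-19 03:06:19"), (1009, "2015-02-19 05:15:32"), (1010, "2015-02-19 05:15:35"),
    (1011, "2015-02-19 11:57:26"), (1012, "2015-02-19 12:09:20"),
]


def first_by_id_per_hour(rows):
    """Hypothetical model of the observed output (not MySQL's documented behavior).

    Rows are visited in ascending id order; the first row seen in each HOUR() bucket
    supplies the non-aggregated columns, and every row in the bucket is counted.
    """
    picked = {}  # hour -> (id, timestamp) of the first row seen in that bucket
    counts = {}  # hour -> number of rows in that bucket
    for row_id, ts in sorted(rows):  # tuples sort by id first; ids are unique
        # Like HOUR(timestamp): only the hour of day matters, the date is ignored.
        hour = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S").hour
        picked.setdefault(hour, (row_id, ts))
        counts[hour] = counts.get(hour, 0) + 1
    # ORDER BY timestamp is applied only after the representatives were chosen.
    return sorted(
        ((row_id, ts, counts[hour]) for hour, (row_id, ts) in picked.items()),
        key=lambda r: r[1],
    )


for row in first_by_id_per_hour(ROWS):
    print(row)
```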