"group by" 是否自动保证 "order by"?
Does "group by" automatically guarantee "order by"?
Does the "group by" clause automatically guarantee that the results will be ordered by that key? In other words, is it enough to write:
select *
from table
group by a, b, c
or does one always have to write
select *
from table
group by a, b, c
order by a, b, c
I know that in MySQL, for example, I don't have to, but I would like to know whether I can rely on it across SQL implementations. Is it guaranteed?
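group by does not necessarily sort the data. A database is designed to fetch data as fast as possible and sorts only when it has to.
So add an order by if you need a guaranteed order.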
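The advice above can be sketched with a small script. This is only an illustration: it uses Python's standard-library sqlite3 module (SQLite is an assumption here and is not one of the engines discussed in this thread), and the table t with columns a, b, n is hypothetical. It shows the explicit ORDER BY form only and makes no claim about what the engine would return without it.

```python
import sqlite3

# Hypothetical in-memory table; the names t, a, b, n are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER, b TEXT, n INTEGER)")
conn.executemany(
    "INSERT INTO t (a, b, n) VALUES (?, ?, ?)",
    [(3, "x", 1), (1, "z", 2), (2, "y", 3), (1, "a", 4), (3, "x", 5), (1, "z", 6)],
)

# GROUP BY only defines the groups; ORDER BY is what defines the output order.
rows = conn.execute(
    "SELECT a, b, SUM(n) AS total FROM t GROUP BY a, b ORDER BY a, b"
).fetchall()
conn.close()

for row in rows:
    print(row)
```

With the ORDER BY in place, the rows come back sorted by (a, b) regardless of how the engine chose to build the groups.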
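Absolutely not. I have experienced one of my queries suddenly starting to return unsorted results as the data in the table grew.
It depends on the number of records. When there are few records, group by sorts automatically. When there are more records (more than 15), an order by clause needs to be added.
I tried it, using the AdventureWorks database from MSDN.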
select HireDate, min(JobTitle)
from AdventureWorks2016CTP3.HumanResources.Employee
group by HireDate
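Result:
2009-01-10  Production Technician - WC40
2009-01-11  Application Specialist
2009-01-12  Assistant to the Chief Financial Officer
2009-01-13  Production Technician - WC50
It returns the data sorted by HireDate, but you should never rely on GROUP BY for sorting.
For example, an index can change the order of the returned data.
I added the following index on (JobTitle, HireDate):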
CREATE NONCLUSTERED INDEX NonClusturedIndex_Jobtitle_hireddate ON [HumanResources].[Employee]
(
[JobTitle] ASC,
[HireDate] ASC
)
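With the same select query, the result now changes:
2006-06-30  Production Technician - WC60
2007-01-26  Marketing Assistant
2007-11-11  Engineering Manager
2007-12-05  Senior Tool Designer
2007-12-11  Tool Designer
2007-12-20  Marketing Manager
2007-12-26  Production Supervisor - WC60
You can download AdventureWorks2016 from the following address: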
https://www.microsoft.com/en-us/download/details.aspx?id=49502
An efficient implementation of group by would perform the grouping by sorting the data internally. That's why some RDBMSs return sorted output when grouping. Yet the SQL specs don't mandate that behavior, so unless it is explicitly documented by the RDBMS vendor I wouldn't bet on it working (tomorrow). OTOH, if the RDBMS implicitly does a sort, it might also be smart enough to optimize away the redundant order by. @jimmyb
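To illustrate why the output order depends on the grouping strategy, here is a toy Python sketch of two ways to group: by sorting, and by hashing. It is not how any particular RDBMS is implemented; the function names and sample data are made up for illustration.

```python
from itertools import groupby
from operator import itemgetter

# Made-up (key, value) rows in no particular order.
rows = [("c", 1), ("a", 2), ("b", 3), ("a", 4), ("c", 5)]

def sort_group(rows):
    """Sort-based grouping: sort by key, then sum each run of equal keys."""
    ordered = sorted(rows, key=itemgetter(0))
    return [
        (key, sum(value for _, value in run))
        for key, run in groupby(ordered, key=itemgetter(0))
    ]

def hash_group(rows):
    """Hash-based grouping: accumulate per key in a hash table, no sort involved."""
    totals = {}
    for key, value in rows:
        totals[key] = totals.get(key, 0) + value
    # A Python dict iterates in insertion order; a database hash table
    # makes no such promise about the order of its groups.
    return list(totals.items())

print(sort_group(rows))  # groups appear in key order as a side effect of the sort
print(hash_group(rows))  # same groups, in first-seen order in this sketch
```

Both functions return the same groups. Only the sort-based one emits them in key order, and only as a side effect of how it works, which is why an explicit order by is the only thing to rely on.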
An example using PostgreSQL demonstrates the point.
Create a table with 1 million records, with random dates between today and 90 days ago, and index it by date:
CREATE TABLE WITHDRAW AS
SELECT (random()*1000000)::integer AS IDT_WITHDRAW,
md5(random()::text) AS NAM_PERSON,
(NOW() - ( random() * (NOW() + '90 days' - NOW()) ))::timestamp AS DAT_CREATION, -- from today back to 90 days ago
(random() * 1000)::decimal(12, 2) AS NUM_VALUE
FROM generate_series(1,1000000);
CREATE INDEX WITHDRAW_DAT_CREATION ON WITHDRAW(DAT_CREATION);
Group by the date truncated to the day, with the select restricted to a narrow date range (between two days ago and one day ago):
EXPLAIN
SELECT
DATE_TRUNC('DAY', W.dat_creation), COUNT(1), SUM(W.NUM_VALUE)
FROM WITHDRAW W
WHERE W.dat_creation >= (NOW() - INTERVAL '2 DAY')::timestamp
AND W.dat_creation < (NOW() - INTERVAL '1 DAY')::timestamp
GROUP BY 1
HashAggregate (cost=11428.33..11594.13 rows=11053 width=48)
Group Key: date_trunc('DAY'::text, dat_creation)
-> Bitmap Heap Scan on withdraw w (cost=237.73..11345.44 rows=11053 width=14)
Recheck Cond: ((dat_creation >= ((now() - '2 days'::interval))::timestamp without time zone) AND (dat_creation < ((now() - '1 day'::interval))::timestamp without time zone))
-> Bitmap Index Scan on withdraw_dat_creation (cost=0.00..234.97 rows=11053 width=0)
Index Cond: ((dat_creation >= ((now() - '2 days'::interval))::timestamp without time zone) AND (dat_creation < ((now() - '1 day'::interval))::timestamp without time zone))
With a larger date range, the planner chooses to apply a SORT:
EXPLAIN
SELECT
DATE_TRUNC('DAY', W.dat_creation), COUNT(1), SUM(W.NUM_VALUE)
FROM WITHDRAW W
WHERE W.dat_creation >= (NOW() - INTERVAL '60 DAY')::timestamp
AND W.dat_creation < (NOW() - INTERVAL '1 DAY')::timestamp
GROUP BY 1
GroupAggregate (cost=116522.65..132918.32 rows=655827 width=48)
Group Key: (date_trunc('DAY'::text, dat_creation))
-> Sort (cost=116522.65..118162.22 rows=655827 width=14)
Sort Key: (date_trunc('DAY'::text, dat_creation))
-> Seq Scan on withdraw w (cost=0.00..41949.57 rows=655827 width=14)
Filter: ((dat_creation >= ((now() - '60 days'::interval))::timestamp without time zone) AND (dat_creation < ((now() - '1 day'::interval))::timestamp without time zone))
Just add ORDER BY 1 at the end (there is no significant difference in the plan):
GroupAggregate (cost=116522.44..132918.06 rows=655825 width=48)
Group Key: (date_trunc('DAY'::text, dat_creation))
-> Sort (cost=116522.44..118162.00 rows=655825 width=14)
Sort Key: (date_trunc('DAY'::text, dat_creation))
-> Seq Scan on withdraw w (cost=0.00..41949.56 rows=655825 width=14)
Filter: ((dat_creation >= ((now() - '60 days'::interval))::timestamp without time zone) AND (dat_creation < ((now() - '1 day'::interval))::timestamp without time zone))
PostgreSQL 10.3
It depends on the database vendor.
For example, PostgreSQL does not sort grouped results automatically. There you must use order by to sort the data.
But Sybase and Microsoft SQL Server do. There you can use order by to change the default ordering.