MySQL numbering records by group - did I hit a bug?
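I am trying to number some records in MySQL (5.5.44-0 on Ubuntu), grouped by another column (you will see what I mean below). I am adapting the solution described in "Running Sums for Multiple Categories in MySQL", except that I am only numbering, not summing.
The tables involved are quite large, with nearly 100 columns each, so let's simplify for the demo by creating derived tables that contain only the important columns. Apologies for not sharing a SQL Fiddle: the problem does not seem to be reproducible without a large amount of data, which I cannot share.
Creating the tables: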
CREATE TABLE `inquiries_test` (
`id` int(11) NOT NULL DEFAULT '0',
`motive` varchar(255) CHARACTER SET utf8 DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `motive` (`motive`)
);
insert into inquiries_test select id, motive from inquiries;
CREATE TABLE `leads_test` (
`lead_id` int(11) DEFAULT NULL,
`created_at` datetime DEFAULT NULL,
`inquiry_id` int(11) DEFAULT NULL,
KEY `id` (`lead_id`)
);
insert into leads_test select lead_id, created_at, inquiry_id from leads;
CREATE TABLE `lead_inserts` (
`lead_id` int(11) DEFAULT NULL,
`created_at` datetime DEFAULT NULL,
`cnt` int(11) DEFAULT NULL
);
You will notice that the data in inquiries_test and leads_test above comes from the actual production tables. This will matter shortly. Now populate lead_inserts:
playground>insert into lead_inserts (cnt, created_at, lead_id)
-> SELECT @cnt := if(@id = l.lead_id,@cnt,0) + 1 as cnt
-> , l.created_at
-> , @id := l.lead_id as local_resouce_id
-> FROM leads_test l join inquiries_test i on (l.inquiry_id=i.id)
-> CROSS JOIN (select @id := 0, @cnt := 0) as InitVarsAlias
-> where i.motive='real' ORDER BY lead_id, created_at;
Query OK, 2172774 rows affected (14.30 sec)
Records: 2172774 Duplicates: 0 Warnings: 0
playground>select * from lead_inserts where lead_id in (117,118);
+---------+---------------------+------+
| lead_id | created_at | cnt |
+---------+---------------------+------+
| 117 | 2012-06-23 00:13:09 | 1 |
| 117 | 2014-09-14 04:30:37 | 2 |
| 117 | 2015-01-27 22:34:41 | 3 |
| 117 | 2015-03-19 19:33:51 | 4 |
| 118 | 2014-12-24 17:47:15 | 1 |
| 118 | 2015-01-23 21:30:09 | 2 |
| 118 | 2015-04-07 21:33:43 | 3 |
| 118 | 2015-04-10 17:00:04 | 4 |
| 118 | 2015-05-12 21:59:49 | 5 |
+---------+---------------------+------+
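So far so good: cnt "resets" for each new lead_id. Now, given that leads_test and inquiries_test are basically leads and inquiries with the other columns removed, one would expect the result to be the same if I change the insert statement to use the original tables, right? But look: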
playground>truncate table lead_inserts;
Query OK, 0 rows affected (0.14 sec)
playground>insert into lead_inserts (cnt, created_at, lead_id)
-> SELECT @cnt := if(@id = l.lead_id,@cnt,0) + 1 as cnt
-> , l.created_at
-> , @id := l.lead_id as local_resouce_id
-> FROM leads l join inquiries i on (l.inquiry_id=i.id)
-> CROSS JOIN (select @id := 0, @cnt := 0) as InitVarsAlias
-> where i.motive='real' ORDER BY lead_id, created_at;
Query OK, 2172774 rows affected (17.25 sec)
Records: 2172774 Duplicates: 0 Warnings: 0
playground>select * from lead_inserts where lead_id in (117,118);
+---------+---------------------+------+
| lead_id | created_at | cnt |
+---------+---------------------+------+
| 117 | 2012-06-23 00:13:09 | 1 |
| 117 | 2014-09-14 04:30:37 | 1 |
| 117 | 2015-01-27 22:34:41 | 1 |
| 117 | 2015-03-19 19:33:51 | 1 |
| 118 | 2014-12-24 17:47:15 | 1 |
| 118 | 2015-01-23 21:30:09 | 1 |
| 118 | 2015-04-07 21:33:43 | 1 |
| 118 | 2015-04-10 17:00:04 | 1 |
| 118 | 2015-05-12 21:59:49 | 1 |
+---------+---------------------+------+
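What happened to the numbering? Other observations when using the original tables:
- If I do not process all the records and specify only a handful of lead_ids, the numbering is computed correctly.
- If I remove the INSERT clause and run it as a plain SELECT (with a LIMIT clause to display only 50 rows of output), the numbering is computed correctly.
So, is this a bug that I hit, or am I missing something? In real life I cannot use the procedure above as a workaround: I really have to use leads and inquiries, because other columns in those tables have to be part of lead_inserts.
Thanks!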
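As Cha pointed out, this looks like a MySQL optimization, where MySQL sees no reason to perform the ORDER BY because the end result will only be inserted into a new table. Why it works for the test tables and not for production, when they have the same number of rows, I have no idea. But this is how I force it to sort what is going to be inserted:
First, make sure there is a composite index on the columns I will sort by: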
CREATE INDEX idx_leads_lead_id_created ON leads(lead_id, created_at);
Then force MySQL to use this index:
insert into lead_inserts (cnt, created_at, lead_id)
SELECT @cnt := if(@id = l.lead_id,@cnt,0) + 1 as cnt
, l.created_at
, @id := l.lead_id as local_resouce_id
FROM leads l FORCE INDEX FOR ORDER BY (idx_leads_lead_id_created)
JOIN inquiries i on (l.inquiry_id=i.id)
CROSS JOIN (select @id := 0, @cnt := 0) as InitVarsAlias
WHERE i.motive='real'
ORDER BY lead_id, created_at;
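For what it's worth, the MySQL manual states that the order of evaluation for expressions involving user variables is undefined, so the statement above still depends on the plan the optimizer happens to pick. Below is a minimal sketch of an alternative that does not rely on evaluation order: for each row, count the rows of the same lead that are not newer. It assumes that (lead_id, created_at) pairs are unique among the 'real' rows (tied rows would receive the same number), the aliases l2 and i2 are only illustrative, and on a couple of million rows it may well be much slower than the variable approach, even with the index above. The final query is a hypothetical sanity check, not part of the original procedure.

```sql
-- Order-independent numbering: cnt = number of rows for the same lead
-- whose created_at is less than or equal to the current row's created_at.
-- Assumption: (lead_id, created_at) is unique among rows with motive = 'real'.
INSERT INTO lead_inserts (cnt, created_at, lead_id)
SELECT (SELECT COUNT(*)
          FROM leads l2
          JOIN inquiries i2 ON l2.inquiry_id = i2.id
         WHERE i2.motive = 'real'
           AND l2.lead_id = l.lead_id
           AND l2.created_at <= l.created_at) AS cnt
     , l.created_at
     , l.lead_id
  FROM leads l
  JOIN inquiries i ON l.inquiry_id = i.id
 WHERE i.motive = 'real';

-- Sanity check for either approach: list leads whose numbering is not
-- exactly 1..N (highest cnt differs from the row count, or cnt repeats).
SELECT lead_id
     , COUNT(*) AS n
     , MAX(cnt) AS max_cnt
  FROM lead_inserts
 GROUP BY lead_id
HAVING MAX(cnt) <> COUNT(*)
    OR COUNT(DISTINCT cnt) <> COUNT(*)
 LIMIT 10;
```

An empty result from the second query suggests every lead was numbered 1..N. With the broken insert shown in the question, every lead that has more than one row would show up here.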