MySQL 8 Window Functions + Full-text searching
I am running MySQL Community Server (GPL) on x86_64.
I create a table with a full-text index on the column name:
CREATE TABLE `title` (
`id` smallint(4) unsigned NOT NULL PRIMARY KEY,
`name` text COLLATE utf8_unicode_ci,
FULLTEXT idx (name) WITH PARSER ngram
) DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
Insert some data:
insert into `title` values(14,"I'm flying in for the game (one night in Niagara Falls, NY and one night in Buffalo then back home).");
insert into `title` values(23,"I've never been to the area.");
insert into `title` values(43,"Where and what must I eat (Canadian side of Niagara, American side and Buffalo)?");
insert into `title` values(125,"Don't really have much planned other than the Falls and the game.");
When I execute:
select
id,
round(MATCH (name) AGAINST ('other than the'),2) scope
from title;
the result is as expected:
id | scope
----------
14 | 0.43
23 | 0.23
43 | 0.12
125 | 1.15
With a classic GROUP BY, everything is fine:
select
max(scope),
min(scope),
sum(scope)
from
(
select id, round(MATCH (name) AGAINST ('other than the'),2) scope
from title
) a;
The result is fine:
max | min | sum
----------------
1.15 | 0.12 | 1.96
But when I try to use window functions with OVER, I don't understand the result:
select
id,
max(scope) over(),
min(scope) over(),
sum(scope) over()
from
(
select id, round(MATCH (name) AGAINST ('other than the'),2) scope
from title
) a;
I get a strange result (why?):
id | max | min | sum
------------------------
14 | 1.15 | 1.15 | 4.60
23 | 1.15 | 1.15 | 4.60
43 | 1.15 | 1.15 | 4.60
125| 1.15 | 1.15 | 4.60
I expected a result similar to the classic grouping, like this:
id | max | min | sum
------------------------
14 | 1.15 | 0.12 | 1.96
23 | 1.15 | 0.12 | 1.96
43 | 1.15 | 0.12 | 1.96
125| 1.15 | 0.12 | 1.96
Is this a bug in mysql Ver 8.0.3-rc, or is my query incorrect?
Thanks!
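It looks like you have found a bug in MySQL; report it at bugs.mysql.com.
I ran the following script in both MySQL and MariaDB (without WITH PARSER ngram, because MariaDB does not currently support it; see Add "ngram" support to MariaDB), with these results: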
MySQL:
mysql> SELECT VERSION();
+--------------+
| VERSION() |
+--------------+
| 8.0.3-rc-log |
+--------------+
1 row in set (0.00 sec)
mysql> DROP TABLE IF EXISTS `title`;
Query OK, 0 rows affected (0.02 sec)
mysql> CREATE TABLE `title` (
-> `id` SMALLINT UNSIGNED NOT NULL PRIMARY KEY,
-> `name` TEXT COLLATE utf8_unicode_ci,
-> FULLTEXT idx (`name`) -- WITH PARSER ngram
-> ) DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
Query OK, 0 rows affected (0.01 sec)
mysql> INSERT INTO `title`
-> VALUES
-> (14, "I'm flying in for the game (one night in Niagara Falls, NY and one night in Buffalo then back home)."),
-> (23, "I've never been to the area."),
-> (43, "Where and what must I eat (Canadian side of Niagara, American side and Buffalo)?"),
-> (125, "Don't really have much planned other than the Falls and the game.");
Query OK, 4 rows affected (0.00 sec)
Records: 4 Duplicates: 0 Warnings: 0
mysql> SELECT
-> MAX(`scope`),
-> MIN(`scope`),
-> SUM(`scope`)
-> FROM
-> (
-> SELECT
-> `id`,
-> ROUND(MATCH (`name`) AGAINST ('other than the'), 2) `scope`
-> FROM `title`
-> ) `a`;
+--------------+--------------+--------------+
| MAX(`scope`) | MIN(`scope`) | SUM(`scope`) |
+--------------+--------------+--------------+
| 0.72 | 0.00 | 0.72 |
+--------------+--------------+--------------+
1 row in set (0.00 sec)
mysql> SELECT
-> `id`,
-> MAX(`scope`) OVER(),
-> MIN(`scope`) OVER(),
-> SUM(`scope`) OVER()
-> FROM
-> (
-> SELECT
-> `id`,
-> ROUND(MATCH (`name`) AGAINST ('other than the'), 2) `scope`
-> FROM `title`
-> ) `a`;
+-----+---------------------+---------------------+---------------------+
| id | MAX(`scope`) OVER() | MIN(`scope`) OVER() | SUM(`scope`) OVER() |
+-----+---------------------+---------------------+---------------------+
| 14 | 0.72 | 0.72 | 2.88 |
| 23 | 0.72 | 0.72 | 2.88 |
| 43 | 0.72 | 0.72 | 2.88 |
| 125 | 0.72 | 0.72 | 2.88 |
+-----+---------------------+---------------------+---------------------+
4 rows in set (0.00 sec)
MariaDB:
MariaDB[_]> SELECT VERSION();
+----------------------------------------+
| VERSION() |
+----------------------------------------+
| 10.2.6-MariaDB-10.2.6+maria~jessie-log |
+----------------------------------------+
1 row in set (0.00 sec)
MariaDB[_]> DROP TABLE IF EXISTS `title`;
Query OK, 0 rows affected (0.02 sec)
MariaDB[_]> CREATE TABLE `title` (
-> `id` SMALLINT UNSIGNED NOT NULL PRIMARY KEY,
-> `name` TEXT COLLATE utf8_unicode_ci,
-> FULLTEXT idx (`name`) -- WITH PARSER ngram
-> ) DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
Query OK, 0 rows affected (0.01 sec)
MariaDB[_]> INSERT INTO `title`
-> VALUES
-> (14, "I'm flying in for the game (one night in Niagara Falls, NY and one night in Buffalo then back home)."),
-> (23, "I've never been to the area."),
-> (43, "Where and what must I eat (Canadian side of Niagara, American side and Buffalo)?"),
-> (125, "Don't really have much planned other than the Falls and the game.");
Query OK, 4 rows affected (0.00 sec)
Records: 4 Duplicates: 0 Warnings: 0
MariaDB[_]> SELECT
-> MAX(`scope`),
-> MIN(`scope`),
-> SUM(`scope`)
-> FROM
-> (
-> SELECT
-> `id`,
-> ROUND(MATCH (`name`) AGAINST ('other than the'), 2) `scope`
-> FROM `title`
-> ) `a`;
+--------------+--------------+--------------+
| MAX(`scope`) | MIN(`scope`) | SUM(`scope`) |
+--------------+--------------+--------------+
| 0.72 | 0.00 | 0.72 |
+--------------+--------------+--------------+
1 row in set (0.00 sec)
MariaDB[_]> SELECT
-> `id`,
-> MAX(`scope`) OVER(),
-> MIN(`scope`) OVER(),
-> SUM(`scope`) OVER()
-> FROM
-> (
-> SELECT
-> `id`,
-> ROUND(MATCH (`name`) AGAINST ('other than the'), 2) `scope`
-> FROM `title`
-> ) `a`;
+-----+--------------+--------------+--------------+
| id | MAX(`scope`) | MIN(`scope`) | SUM(`scope`) |
+-----+--------------+--------------+--------------+
| 14 | 0.72 | 0.00 | 0.72 |
| 23 | 0.72 | 0.00 | 0.72 |
| 43 | 0.72 | 0.00 | 0.72 |
| 125 | 0.72 | 0.00 | 0.72 |
+-----+--------------+--------------+--------------+
4 rows in set (0.00 sec)
Regarding wchiquito's answer: you are right, there was a bug. It has been fixed since that release. With the fix, MySQL returns this answer to the windowing query:
mysql> SELECT
-> `id`,
-> MAX(`scope`) OVER() `max`,
-> MIN(`scope`) OVER() `min`,
-> SUM(`scope`) OVER() `sum`
-> FROM
-> (
-> SELECT
-> `id`,
-> ROUND(MATCH (`name`) AGAINST ('other than the'), 2) `scope`
-> FROM `title`
-> ) `a`;
+-----+------+------+------+
| id | max | min | sum |
+-----+------+------+------+
| 14 | 0.72 | 0.00 | 0.72 |
| 23 | 0.72 | 0.00 | 0.72 |
| 43 | 0.72 | 0.00 | 0.72 |
| 125 | 0.72 | 0.00 | 0.72 |
+-----+------+------+------+
4 rows in set (0,01 sec)
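This still differs from the result you quote for MariaDB, but I believe the MySQL answer above is the correct one: because the window specification is empty, each window function operates over all rows of the result set for every row, so every window function call should return the same value on each result row.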
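The behaviour of an empty window specification can be checked independently of MySQL. The following is only a sketch under stated assumptions: it uses Python's bundled SQLite (3.25 or newer, which supports window functions) as a stand-in engine, and it hard-codes the relevance scores from the fixed MySQL output into a hypothetical table `a` instead of computing them with MATCH ... AGAINST.

```python
import sqlite3

# Assumption: scores copied from the fixed MySQL output, hard-coded so that
# no full-text index is needed. Table name `a` mirrors the derived table alias.
SCORES = [(14, 0.0), (23, 0.0), (43, 0.0), (125, 0.72)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (id INTEGER PRIMARY KEY, scope REAL)")
conn.executemany("INSERT INTO a VALUES (?, ?)", SCORES)

# Plain aggregates collapse the table into a single row.
aggregated = conn.execute(
    "SELECT MAX(scope), MIN(scope), SUM(scope) FROM a"
).fetchone()

# An empty OVER () treats the whole result set as one window, so every row
# carries the same three values as the aggregate query.
windowed = conn.execute(
    """
    SELECT id,
           MAX(scope) OVER (),
           MIN(scope) OVER (),
           SUM(scope) OVER ()
    FROM a
    ORDER BY id
    """
).fetchall()

print(aggregated)
for row in windowed:
    print(row)
```

Each row of `windowed` repeats the values of `aggregated` next to its own `id`, which is the shape the question expected.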
If you partition the result set in a way similar to the GROUP BY query (see PARTITION BY a.id below), you will see this result:
mysql> SELECT
-> `id`,
-> MAX(`scope`) OVER(PARTITION BY a.id) `max`,
-> MIN(`scope`) OVER(PARTITION BY a.id) `min`,
-> SUM(`scope`) OVER(PARTITION BY a.id) `sum`
-> FROM
-> (
-> SELECT
-> `id`,
-> ROUND(MATCH (`name`) AGAINST ('other than the'), 2) `scope`
-> FROM `title`
-> ) `a`;
+-----+------+------+------+
| id | max | min | sum |
+-----+------+------+------+
| 14 | 0.00 | 0.00 | 0.00 |
| 23 | 0.00 | 0.00 | 0.00 |
| 43 | 0.00 | 0.00 | 0.00 |
| 125 | 0.72 | 0.72 | 0.72 |
+-----+------+------+------+
4 rows in set (0,00 sec)
because each row is its own partition here. This is the same as the result you quote for MariaDB without the PARTITION BY.
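The effect of partitioning by a unique key can be reproduced the same way. Again this is a sketch, not the MySQL setup itself: SQLite (3.25 or newer) via Python stands in for the server, and the scores in the hypothetical table `a` are hard-coded from the output above rather than produced by a full-text search.

```python
import sqlite3

# Assumption: scores copied from the fixed MySQL output shown above.
SCORES = [(14, 0.0), (23, 0.0), (43, 0.0), (125, 0.72)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (id INTEGER PRIMARY KEY, scope REAL)")
conn.executemany("INSERT INTO a VALUES (?, ?)", SCORES)

# id is unique, so PARTITION BY id yields one partition per row:
# MAX, MIN and SUM all degenerate to that row's own score.
partitioned = conn.execute(
    """
    SELECT id,
           MAX(scope) OVER (PARTITION BY id),
           MIN(scope) OVER (PARTITION BY id),
           SUM(scope) OVER (PARTITION BY id)
    FROM a
    ORDER BY id
    """
).fetchall()

for row in partitioned:
    print(row)
```

Partitioning only changes the result when several rows share a partition key; with a primary key as the partition column, the window functions add no information beyond the row itself.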