MySQL 中的布尔全文搜索中的“~”(波浪号)运算符的行为与 MySQL 开发者网站中所述的不同

`~` (tilde) operator in Boolean Full-Text Search in MySQL is not behaving as stated in MySQL developer website

我创建了以下 table fruits -

CREATE TABLE `fruits` (
  `id` tinyint unsigned NOT NULL AUTO_INCREMENT,
  `name` varchar(200) NOT NULL,
  PRIMARY KEY (`id`),
  FULLTEXT KEY `ft_name` (`name`)
) ENGINE=InnoDB AUTO_INCREMENT=7 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci

然后我在table fruits -

中输入了以下值
SELECT * FROM fruits;
+----+---------------+
| id | name          |
+----+---------------+
|  1 | apple, orange |
|  2 | apple, mango  |
|  3 | mango, kiwi   |
|  4 | mango, guava  |
|  5 | apple, banana |
+----+---------------+

现在我运行以下三个SQL查询-

查询 1:

SELECT id, name FROM fruits
    -> WHERE MATCH(name) AGAINST
    -> ('+apple' IN BOOLEAN MODE);
+----+---------------+
| id | name          |
+----+---------------+
|  1 | apple, orange |
|  2 | apple, mango  |
|  5 | apple, banana |
+----+---------------+

查询 2:

SELECT id, name FROM fruits
    -> WHERE MATCH(name) AGAINST
    -> ('+apple -orange' IN BOOLEAN MODE);
+----+---------------+
| id | name          |
+----+---------------+
|  2 | apple, mango  |
|  5 | apple, banana |
+----+---------------+

查询 3:

SELECT id, name FROM fruits
    -> WHERE MATCH(name) AGAINST
    -> ('+apple ~orange' IN BOOLEAN MODE);
+----+---------------+
| id | name          |
+----+---------------+
|  1 | apple, orange |
|  2 | apple, mango  |
|  5 | apple, banana |
+----+---------------+

根据 MySQL 开发者网站,以下是 'Boolean Full-Text Searches'

~(波浪号)运算符的功能

https://dev.mysql.com/doc/refman/8.0/en/fulltext-boolean.html

  • '+apple ~macintosh'

Find rows that contain the word “apple”, but if the row also contains the word “macintosh”, rate it lower than if row does not. This is “softer” than a search for '+apple -macintosh', for which the presence of “macintosh” causes the row not to be returned at all.

我已经在 'Query 3' 中尝试了 ~ (代字号)运算符,但输出肯定不是预期的。在这里,预期的行为是 id = 1 排在最后。

P.S。 我正在使用 MySQL 版本 - 8.0.26-0ubuntu0.20.04.2 for Linux on x86_64 ((Ubuntu))

虽然在任何地方都没有关于我的答案的文档,但经过彻底的实验,我得出了这个最合乎逻辑的结论 -

“+”运算符的存在使“~”运算符的任何效果无效

我已经用以下值更新了我的 table fruits -

SELECT * FROM fruits;
+----+-------------------------+
| id | name                    |
+----+-------------------------+
|  1 | apple orange watermelon |
|  2 | apple mango pomegranate |
|  3 | apple mango banana      |
|  4 | mango kiwi pomegranate  |
|  5 | mango guava watermelon  |
|  6 | apple banana kiwi       |
+----+-------------------------+

查询 1:

SELECT id, name FROM fruits
    -> WHERE MATCH(name) AGAINST
    -> ('apple mango ~pomegranate'
    -> IN BOOLEAN MODE);
+----+-------------------------+
| id | name                    |
+----+-------------------------+
|  3 | apple mango banana      |
|  1 | apple orange watermelon |
|  5 | mango guava watermelon  |
|  6 | apple banana kiwi       |
|  2 | apple mango pomegranate |
|  4 | mango kiwi pomegranate  |
+----+-------------------------+

查询 2:

SELECT id, name FROM fruits
    -> WHERE MATCH(name) AGAINST
    -> ('apple ~pomegranate'
    -> IN BOOLEAN MODE);
+----+-------------------------+
| id | name                    |
+----+-------------------------+
|  1 | apple orange watermelon |
|  3 | apple mango banana      |
|  6 | apple banana kiwi       |
|  2 | apple mango pomegranate |
+----+-------------------------+

查询 3:

SELECT id, name FROM fruits
    -> WHERE MATCH(name) AGAINST
    -> ('mango ~pomegranate'
    -> IN BOOLEAN MODE);
+----+-------------------------+
| id | name                    |
+----+-------------------------+
|  3 | apple mango banana      |
|  5 | mango guava watermelon  |
|  2 | apple mango pomegranate |
|  4 | mango kiwi pomegranate  |
+----+-------------------------+

此处,在查询 1、2 和 3 中,值 applemango 之前没有运算符,而 ~ 运算符在值 pomegranate 之前。这确保具有单词 pomegranate 的行排名低于其他行。

查询 4:

SELECT id, name FROM fruits
    -> WHERE MATCH(name) AGAINST
    -> ('+apple +mango ~pomegranate'
    -> IN BOOLEAN MODE);
+----+-------------------------+
| id | name                    |
+----+-------------------------+
|  2 | apple mango pomegranate |
|  3 | apple mango banana      |
+----+-------------------------+

查询 5:

SELECT id, name FROM fruits
    -> WHERE MATCH(name) AGAINST
    -> ('+apple ~pomegranate'
    -> IN BOOLEAN MODE);
+----+-------------------------+
| id | name                    |
+----+-------------------------+
|  1 | apple orange watermelon |
|  2 | apple mango pomegranate |
|  3 | apple mango banana      |
|  6 | apple banana kiwi       |
+----+-------------------------+

查询 6:

SELECT id, name FROM fruits
    -> WHERE MATCH(name) AGAINST
    -> ('+mango ~pomegranate'
    -> IN BOOLEAN MODE);
+----+-------------------------+
| id | name                    |
+----+-------------------------+
|  2 | apple mango pomegranate |
|  3 | apple mango banana      |
|  4 | mango kiwi pomegranate  |
|  5 | mango guava watermelon  |
+----+-------------------------+

此处,在查询 4、5 和 6 中,+ 运算符在值 applemango 之前,而 ~ 运算符在值 pomegranate 之前.显然 + 运算符的存在使 ~ 运算符的任何效果无效。

查询 7:

SELECT id, name FROM fruits
    -> WHERE MATCH(name) AGAINST
    -> ('+apple +mango <pomegranate'
    -> IN BOOLEAN MODE);
+----+-------------------------+
| id | name                    |
+----+-------------------------+
|  3 | apple mango banana      |
|  2 | apple mango pomegranate |
+----+-------------------------+

查询 8:

SELECT id, name FROM fruits
    -> WHERE MATCH(name) AGAINST
    -> ('+apple <pomegranate'
    -> IN BOOLEAN MODE);
+----+-------------------------+
| id | name                    |
+----+-------------------------+
|  1 | apple orange watermelon |
|  3 | apple mango banana      |
|  6 | apple banana kiwi       |
|  2 | apple mango pomegranate |
+----+-------------------------+

查询 9:

SELECT id, name FROM fruits
    -> WHERE MATCH(name) AGAINST
    -> ('+mango <pomegranate'
    -> IN BOOLEAN MODE);
+----+-------------------------+
| id | name                    |
+----+-------------------------+
|  3 | apple mango banana      |
|  5 | mango guava watermelon  |
|  2 | apple mango pomegranate |
|  4 | mango kiwi pomegranate  |
+----+-------------------------+

此处,在查询 7、8 和 9 中,+ 运算符在值 applemango 之前,而 < 运算符在值 pomegranate 之前.这确保具有单词 pomegranate 的行排名低于其他行。

因此,从这里可以推断出—— 如果存在 + 运算符,请使用 < 运算符而不是 ~ 运算符


更新

经过大量计算,我创建了 table fruits_score_count,它在完成布尔全文搜索时显示每个 fruitscore

SELECT * FROM fruits_score_count;
+----+-------------+---------------------+----------------------+
| id | fruit_name  | row_numbers_matched | score                |
+----+-------------+---------------------+----------------------+
|  1 | apple       |                   4 | 0.031008131802082062 |
|  2 | banana      |                   2 |  0.22764469683170319 |
|  3 | guava       |                   1 |   0.6055193543434143 |
|  4 | kiwi        |                   2 |  0.22764469683170319 |
|  5 | mango       |                   4 | 0.031008131802082062 |
|  6 | orange      |                   1 |   0.6055193543434143 |
|  7 | pomegranate |                   2 |  0.22764469683170319 |
|  8 | watermelon  |                   2 |  0.22764469683170319 |
+----+-------------+---------------------+----------------------+

查询 1:

SELECT id, name, score FROM
    -> (SELECT id, name, MATCH(name) AGAINST
    -> ('apple mango ~pomegranate' IN BOOLEAN MODE)
    -> AS score FROM fruits ORDER BY score DESC)
    -> AS temp WHERE score != 0;
+----+-------------------------+----------------------+
| id | name                    | score                |
+----+-------------------------+----------------------+
|  3 | apple mango banana      | 0.062016263604164124 |
|  1 | apple orange watermelon | 0.031008131802082062 |
|  5 | mango guava watermelon  | 0.031008131802082062 |
|  6 | apple banana kiwi       | 0.031008131802082062 |
|  2 | apple mango pomegranate |  -0.7103390693664551 |
|  4 | mango kiwi pomegranate  |  -0.7413471937179565 |
+----+-------------------------+----------------------+

查询 2:

SELECT id, name, score FROM
    -> (SELECT id, name, MATCH(name) AGAINST
    -> ('apple ~pomegranate' IN BOOLEAN MODE)
    -> AS score FROM fruits ORDER BY score DESC)
    -> AS temp WHERE score != 0;
+----+-------------------------+----------------------+
| id | name                    | score                |
+----+-------------------------+----------------------+
|  1 | apple orange watermelon | 0.031008131802082062 |
|  3 | apple mango banana      | 0.031008131802082062 |
|  6 | apple banana kiwi       | 0.031008131802082062 |
|  2 | apple mango pomegranate |  -0.7413471937179565 |
+----+-------------------------+----------------------+

查询 3:

SELECT id, name, score FROM
    -> (SELECT id, name, MATCH(name) AGAINST
    -> ('mango ~pomegranate' IN BOOLEAN MODE)
    -> AS score FROM fruits ORDER BY score DESC)
    -> AS temp WHERE score != 0;
+----+-------------------------+----------------------+
| id | name                    | score                |
+----+-------------------------+----------------------+
|  3 | apple mango banana      | 0.031008131802082062 |
|  5 | mango guava watermelon  | 0.031008131802082062 |
|  2 | apple mango pomegranate |  -0.7413471937179565 |
|  4 | mango kiwi pomegranate  |  -0.7413471937179565 |
+----+-------------------------+----------------------+

查询 4:

SELECT id, name, score FROM
    -> (SELECT id, name, MATCH(name) AGAINST
    -> ('+apple +mango ~pomegranate' IN BOOLEAN MODE)
    -> AS score FROM fruits ORDER BY score DESC)
    -> AS temp WHERE score != 0;
+----+-------------------------+----------------------+
| id | name                    | score                |
+----+-------------------------+----------------------+
|  2 | apple mango pomegranate | 0.062016263604164124 |
|  3 | apple mango banana      | 0.062016263604164124 |
+----+-------------------------+----------------------+

查询 5:

SELECT id, name, score FROM
    -> (SELECT id, name, MATCH(name) AGAINST
    -> ('+apple ~pomegranate' IN BOOLEAN MODE)
    -> AS score FROM fruits ORDER BY score DESC)
    -> AS temp WHERE score != 0;
+----+-------------------------+----------------------+
| id | name                    | score                |
+----+-------------------------+----------------------+
|  1 | apple orange watermelon | 0.031008131802082062 |
|  2 | apple mango pomegranate | 0.031008131802082062 |
|  3 | apple mango banana      | 0.031008131802082062 |
|  6 | apple banana kiwi       | 0.031008131802082062 |
+----+-------------------------+----------------------+

查询 6:

SELECT id, name, score FROM
    -> (SELECT id, name, MATCH(name) AGAINST
    -> ('+mango ~pomegranate' IN BOOLEAN MODE)
    -> AS score FROM fruits ORDER BY score DESC)
    -> AS temp WHERE score != 0;
+----+-------------------------+----------------------+
| id | name                    | score                |
+----+-------------------------+----------------------+
|  2 | apple mango pomegranate | 0.031008131802082062 |
|  3 | apple mango banana      | 0.031008131802082062 |
|  4 | mango kiwi pomegranate  | 0.031008131802082062 |
|  5 | mango guava watermelon  | 0.031008131802082062 |
+----+-------------------------+----------------------+

查询 7:

SELECT id, name, score FROM
    -> (SELECT id, name, MATCH(name) AGAINST
    -> ('+apple +mango <pomegranate' IN BOOLEAN MODE)
    -> AS score FROM fruits ORDER BY score DESC)
    -> AS temp WHERE score != 0;
+----+-------------------------+----------------------+
| id | name                    | score                |
+----+-------------------------+----------------------+
|  3 | apple mango banana      | 0.062016263604164124 |
|  2 | apple mango pomegranate |  -0.7103390693664551 |
+----+-------------------------+----------------------+

查询 8:

SELECT id, name, score FROM
    -> (SELECT id, name, MATCH(name) AGAINST
    -> ('+apple <pomegranate' IN BOOLEAN MODE)
    -> AS score FROM fruits ORDER BY score DESC)
    -> AS temp WHERE score != 0;
+----+-------------------------+----------------------+
| id | name                    | score                |
+----+-------------------------+----------------------+
|  1 | apple orange watermelon | 0.031008131802082062 |
|  3 | apple mango banana      | 0.031008131802082062 |
|  6 | apple banana kiwi       | 0.031008131802082062 |
|  2 | apple mango pomegranate |  -0.7413471937179565 |
+----+-------------------------+----------------------+

查询 9:

SELECT id, name, score FROM
    -> (SELECT id, name, MATCH(name) AGAINST
    -> ('+mango <pomegranate' IN BOOLEAN MODE)
    -> AS score FROM fruits ORDER BY score DESC)
    -> AS temp WHERE score != 0;
+----+-------------------------+----------------------+
| id | name                    | score                |
+----+-------------------------+----------------------+
|  3 | apple mango banana      | 0.031008131802082062 |
|  5 | mango guava watermelon  | 0.031008131802082062 |
|  2 | apple mango pomegranate |  -0.7413471937179565 |
|  4 | mango kiwi pomegranate  |  -0.7413471937179565 |
+----+-------------------------+----------------------+

此处,查询 1、查询 2、查询 3、查询 7、查询 8、查询 9 的行为符合预期。

但从查询 4、查询 5、查询 6 可以清楚地看出 -

在值前面有 + 运算符的情况下,~ 运算符基本上会使值不可见。

另外仔细观察发现-

x ~y+x <y等价


进一步实验

查询 1:

SELECT id, name, score FROM
    -> (SELECT id, name, MATCH(name) AGAINST
    -> ('+mango apple ~pomegranate' IN BOOLEAN MODE)
    -> AS score FROM fruits ORDER BY score DESC)
    -> AS temp WHERE score != 0;
+----+-------------------------+----------------------+
| id | name                    | score                |
+----+-------------------------+----------------------+
|  3 | apple mango banana      | 0.062016263604164124 |
|  4 | mango kiwi pomegranate  | 0.031008131802082062 |
|  5 | mango guava watermelon  | 0.031008131802082062 |
|  2 | apple mango pomegranate |  -0.7103390693664551 |
+----+-------------------------+----------------------+
  • 具有 id = 3 的第 1 行获得最高分数,它是 mangoapple 的分数之和。
  • 第 2 行 id = 4 获得第二大分数,即 mango 的分数。 mango 前面的 + 运算符使搜索词组的 ~pomegranate 变得无关紧要。
  • 第3行id = 5得到与第2行相同的分数。但它比第2行低,因为当分数相同时,行按primary key的升序排列,这里 idprimary key.
  • 第 4 行 id = 2 得分最低,因此排在最后。这里因为单词 apple 存在并且在搜索短语中没有 + 运算符在 apple 之前,因此考虑了搜索短语中的 ~pomegranate,这降低了分数显着。

查询 2:

SELECT id, name, score FROM
    -> (SELECT id, name, MATCH(name) AGAINST
    -> ('+mango apple <pomegranate' IN BOOLEAN MODE)
    -> AS score FROM fruits ORDER BY score DESC)
    -> AS temp WHERE score != 0;
+----+-------------------------+----------------------+
| id | name                    | score                |
+----+-------------------------+----------------------+
|  3 | apple mango banana      | 0.062016263604164124 |
|  5 | mango guava watermelon  | 0.031008131802082062 |
|  2 | apple mango pomegranate |  -0.7103390693664551 |
|  4 | mango kiwi pomegranate  |  -0.7413471937179565 |
+----+-------------------------+----------------------+

这再次说明<运算符即使在+运算符存在的情况下也有效。

这进一步强化了我之前的观察——

如果存在 + 运算符,请使用 < 运算符而不是 ~ 运算符