MySQL query to search posts by title, content, and tag

Requirements

I'd like to do this in a single query. How can I make the query efficient?

Below are my queries, one using REGEXP and one using FULLTEXT.

REGEXP

(SELECT * FROM board
    WHERE
        title rlike 'first' AND title rlike 'second'
    ORDER BY board_id DESC LIMIT 1000)
UNION
(SELECT * FROM board
    WHERE
        content rlike 'first' AND content rlike 'second'
    ORDER BY board_id DESC LIMIT 1000)
UNION
(SELECT * FROM board
    WHERE
        tag rlike 'first' AND tag rlike 'second'
    ORDER BY board_id DESC LIMIT 1000)
LIMIT 1000;

FULLTEXT

(SELECT * FROM board
    WHERE
        match(title) AGAINST('+"first" +"second"' in boolean mode)
    ORDER BY board_id DESC LIMIT 1000)
UNION
(SELECT * FROM board
    WHERE
        match(content) AGAINST('+"first" +"second"' in boolean mode)
    ORDER BY board_id DESC LIMIT 1000)
UNION
(SELECT * FROM board
    WHERE
        match(tag) AGAINST('+"first" +"second"' in boolean mode)
    ORDER BY board_id DESC LIMIT 1000)
LIMIT 1000;

As far as I know, REGEXP does not use an index, yet it is faster than FULLTEXT. I don't understand why.


CREATE statement

CREATE TABLE `board` (
  `board_id` bigint NOT NULL AUTO_INCREMENT,
  `user_id` bigint NOT NULL,
  `nickname` varchar(255) NOT NULL,
  `category` int NOT NULL,
  `title` varchar(255) NOT NULL,
  `content` text NOT NULL,
  `likes` int NOT NULL DEFAULT '0',
  `hits` int NOT NULL DEFAULT '0',
  `tag` varchar(255) DEFAULT NULL,
  `create_date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `modify_date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY (`board_id`),
  KEY `popular` (`create_date`,`likes`),
  FULLTEXT KEY `fttitle` (`title`),
  FULLTEXT KEY `ftcontent` (`content`),
  FULLTEXT KEY `fttag` (`tag`)
) ENGINE=InnoDB AUTO_INCREMENT=60027 DEFAULT CHARSET=utf8;

Your existing FULLTEXT queries suggest separate full-text indexes on the three text columns. Building on that, I would probably start with:
(
    SELECT *, 1 AS col_sort
    FROM board
    WHERE match(title) AGAINST('+"first" +"second"' in boolean mode)
) UNION (
    SELECT *, 2 AS col_sort
    FROM board
    WHERE match(content) AGAINST('+"first" +"second"' in boolean mode)
) UNION (
    SELECT *, 3 AS col_sort
    FROM board
    WHERE match(tag) AGAINST('+"first" +"second"' in boolean mode)
)
ORDER BY col_sort ASC;

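The performance difference between RLIKE and FULLTEXT searches depends on the size of your dataset, both the size of the columns and the number of rows.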

Neither the REGEXP version nor the UNION returns rows in any particular order.

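In most formulations, it is better to fetch only the IDs in the UNION first, then JOIN back to the table to get the remaining columns for the few rows in the result.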

Note that UNION means UNION DISTINCT, which adds a deduplication pass. (That may be what you want, even though it is slower than UNION ALL.)

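Most of the ORDER BYs you have shown are useless; they do nothing except (perhaps) waste time. ORDER BY and LIMIT need to be paired in each subquery, and then applied again after the UNION is done. Related topic: http://mysql.rjweb.org/doc.php/index_cookbook_mysql#or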
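Applied to the FULLTEXT query from the question, pairing ORDER BY with LIMIT in each branch and again after the UNION might look like the sketch below. It assumes the board table, the search words 'first' and 'second', and the newest-first ordering by board_id from the question; the final ORDER BY is the piece the original query was missing.

```sql
-- Sketch: each parenthesized branch is sorted and trimmed on its own,
-- then the combined result is sorted and trimmed again.
(SELECT * FROM board
    WHERE MATCH(title) AGAINST('+"first" +"second"' IN BOOLEAN MODE)
    ORDER BY board_id DESC LIMIT 100)
UNION
(SELECT * FROM board
    WHERE MATCH(content) AGAINST('+"first" +"second"' IN BOOLEAN MODE)
    ORDER BY board_id DESC LIMIT 100)
UNION
(SELECT * FROM board
    WHERE MATCH(tag) AGAINST('+"first" +"second"' IN BOOLEAN MODE)
    ORDER BY board_id DESC LIMIT 100)
ORDER BY board_id DESC  -- re-sort the combined rows; branch order is not preserved
LIMIT 100;
```

Because every branch keeps its own 100 newest matches, the 100 newest rows overall are guaranteed to be among them, so the outer LIMIT loses nothing.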

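nnichols's answer does not deduplicate: a post that matches in both title and content gets col_sort 1 in one branch and 2 in the other, so UNION DISTINCT keeps both rows. This can be remedied with an extra (outer) SELECT that does GROUP BY on the ID (board_id) and takes MIN(col_sort) before sorting.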

More:

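SELECT c.*
    FROM (
        -- Collapse duplicates: keep each post once, with its best (lowest) col_sort
        SELECT board_id, MIN(col_sort) AS col_sort2
            FROM (
                -- Fetch only IDs; a branch with its own LIMIT must be parenthesized
                ( SELECT board_id, 1 AS col_sort
                    FROM ...
                    WHERE MATCH ...
                    LIMIT 100 )
                UNION ALL
                ( SELECT board_id, 2 AS col_sort
                    FROM ...
                    WHERE MATCH ...
                    LIMIT 100 )
                UNION ALL
                ( SELECT board_id, 3 AS col_sort
                    FROM ...
                    WHERE MATCH ...
                    LIMIT 100 )
                 ) AS a
            GROUP BY board_id
         ) AS b
    JOIN board AS c USING(board_id)  -- fetch the full rows only for the surviving IDs
    ORDER BY b.col_sort2
    LIMIT 100;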
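With the elided parts filled in from the question's schema (table board, primary key board_id, one MATCH per FULLTEXT index), the ID-first query might read as follows. This is a sketch under those assumptions; ordering each branch by board_id DESC (newest first, as in the question) is just one way to pair ORDER BY with LIMIT, and the board_id tiebreak in the final sort is likewise an illustrative choice.

```sql
-- Sketch assuming the board table and its FULLTEXT indexes from the question.
SELECT c.*
    FROM (
        -- One row per post; a post matching several columns keeps its lowest col_sort
        SELECT board_id, MIN(col_sort) AS col_sort2
            FROM (
                (SELECT board_id, 1 AS col_sort
                    FROM board
                    WHERE MATCH(title) AGAINST('+"first" +"second"' IN BOOLEAN MODE)
                    ORDER BY board_id DESC LIMIT 100)
                UNION ALL
                (SELECT board_id, 2 AS col_sort
                    FROM board
                    WHERE MATCH(content) AGAINST('+"first" +"second"' IN BOOLEAN MODE)
                    ORDER BY board_id DESC LIMIT 100)
                UNION ALL
                (SELECT board_id, 3 AS col_sort
                    FROM board
                    WHERE MATCH(tag) AGAINST('+"first" +"second"' IN BOOLEAN MODE)
                    ORDER BY board_id DESC LIMIT 100)
                 ) AS a
            GROUP BY board_id
         ) AS b
    JOIN board AS c USING(board_id)
    -- Title matches first, then content, then tag; newest first within each group
    ORDER BY b.col_sort2, c.board_id DESC
    LIMIT 100;
```

UNION ALL is enough inside the derived table because the GROUP BY already removes duplicates, so the extra deduplication pass of UNION DISTINCT would be wasted work.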

Don't use a large number like 1000 for the LIMIT. Carrying 'relevance' through the query does get rather messy.