SQL query optimization for relevance-based search
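This query returns rows ordered by relevance when searching for a species name. I use it for an autocomplete suggestion list, and the relevance calculation works fine, but the query is somewhat slow on a large table. I would appreciate any tips on how to optimize it (MySQL). My main question is:
Can I create any kind of index on the table that would help optimize this? Or am I stuck with this kind of query, which apparently uses the filesort algorithm? (Possibly the reason it is a bit slow?)
Edit: I use InnoDB as the table type, so unfortunately I cannot use a full-text index in this case (only available for MyISAM tables).
SQL Fiddle here: http://sqlfiddle.com/#!2/f03c4c/5
The SELECT query: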
SET @search = 'Boletus a';
SELECT id, genus, species, fullname,
(CASE WHEN (CONCAT(genus, ' ', species) = @search) THEN 1 ELSE 0 END) # EXACT MATCH OF WHOLE NAME
+ (CASE WHEN (CONCAT(genus, ' ', species) LIKE CONCAT(@search,'%')) THEN 1 ELSE 0 END) # MATCH BEGINNING OF WHOLE NAME
+ (CASE WHEN (CONCAT(genus, ' ', species) LIKE CONCAT('%',@search,'%')) THEN 1 ELSE 0 END) # LIKE MATCH OF WHOLE NAME
+ (CASE WHEN (genus = @search) THEN 1 ELSE 0 END) # EXACT MATCH OF genus
+ (CASE WHEN (species = @search) THEN 1 ELSE 0 END) # EXACT MATCH OF species
+ (CASE WHEN (genus LIKE CONCAT(@search,'%')) THEN 1 ELSE 0 END) # MATCH BEGINNING OF genus
+ (CASE WHEN (species LIKE CONCAT(@search,'%')) THEN 1 ELSE 0 END) # MATCH BEGINNING OF species
AS relevans
FROM species
WHERE `fullname` LIKE CONCAT('%',@search,'%')
ORDER BY relevans DESC, genus, species
LIMIT 50;
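One way to check the filesort suspicion is to put EXPLAIN in front of the query. The sketch below drops the relevance expression to keep the plan easy to read, so it is a simplified stand-in for the real query rather than an exact copy.

```sql
-- EXPLAIN shows the execution plan instead of returning the matching rows.
SET @search = 'Boletus a';

EXPLAIN
SELECT id, genus, species, fullname
FROM species
WHERE fullname LIKE CONCAT('%', @search, '%')
ORDER BY genus, species
LIMIT 50;

-- What to look for in the output:
--   type  = ALL             a full table scan; a pattern with a leading '%'
--                           cannot be resolved through a B-tree index
--   Extra = Using filesort  the rows are sorted after being read
```

If both show up, the cost grows with the table size regardless of the LIMIT, because every row has to be examined before the 50 best ones can be picked.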
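Background: a species name consists of at least two parts, the genus and the epithet (in my table the epithet column is named "species"). I have three columns in the table: genus, species and fullname. The "fullname" column can also contain the names of lower taxa (such as the varieties and forms in the sqlfiddle example). I am open to any suggestion on how to make the search more efficient. Perhaps a regular expression on the search string, matched against the "fullname" column only instead of concatenating two columns?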
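The fullname-only idea could look roughly like the sketch below. It assumes the suggestion list only needs names that start with the typed text, which is a narrower match than the original '%...%' pattern, and idx_species_fullname is a made-up index name.

```sql
-- Hypothetical index name; a plain B-tree index on the column being searched.
CREATE INDEX idx_species_fullname ON species (fullname);

SET @search = 'Boletus a';

SELECT id, genus, species, fullname,
       (fullname = @search) AS relevans   -- 1 for an exact match, otherwise 0
FROM species
WHERE fullname LIKE CONCAT(@search, '%')  -- no leading '%', so an index range scan is possible
ORDER BY relevans DESC, fullname
LIMIT 50;
```

Whether the optimizer actually uses the index is worth confirming with EXPLAIN. Sorting on the computed relevans column still needs a sort step, but only over the rows that passed the prefix filter rather than the whole table.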
Example database schema:
CREATE TABLE species
(`id` int, `genus` varchar(50), `species` varchar(50), `fullname` varchar(100))
;
INSERT INTO species
(`id`, `genus`, `species`, `fullname`)
VALUES
(360052, 'Afroboletus', 'azureotinctus', 'Afroboletus azureotinctus'),
(360053, 'Afroboletus', 'costatisporus', 'Afroboletus costatisporus'),
(464267, 'Afroboletus', 'elegans', 'Afroboletus elegans'),
(360054, 'Afroboletus', 'lepidellus', 'Afroboletus lepidellus'),
(112100, 'Afroboletus', 'luteolus', 'Afroboletus luteolus'),
(464266, 'Afroboletus', 'multijugus', 'Afroboletus multijugus'),
(112101, 'Afroboletus', 'pterosporus', 'Afroboletus pterosporus'),
(326826, 'Aureoboletus', 'auriporus', 'Aureoboletus auriporus'),
(326828, 'Aureoboletus', 'gentilis', 'Aureoboletus gentilis'),
(309389, 'Aureoboletus', 'novoguineensis', 'Aureoboletus novoguineensis'),
(326829, 'Aureoboletus', 'subacidus', 'Aureoboletus subacidus'),
(113146, 'Aureoboletus', 'thibetanus', 'Aureoboletus thibetanus'),
(118425, 'Austroboletus', 'cookei', 'Austroboletus cookei'),
(118427, 'Austroboletus', 'dictyotus', 'Austroboletus dictyotus'),
(412550, 'Austroboletus', 'lacunosus', 'Austroboletus lacunosus'),
(159051, 'Boletus', 'aereus', 'Boletus aereus'),
(171640, 'Boletus', 'appendiculatus', 'Boletus appendiculatus'),
(161237, 'Boletus', 'armeniacus', 'Boletus armeniacus'),
(563944, 'Boletus', 'australiensis', 'Boletus australiensis'),
(444094, 'Boletus', 'badius', 'Boletus badius'),
(215376, 'Boletus', 'brunneus', 'Boletus brunneus'),
(129701, 'Boletus', 'bubalinus', 'Boletus bubalinus'),
(203954, 'Boletus', 'byssinus', 'Boletus byssinus'),
(162779, 'Boletus', 'calopus', 'Boletus calopus'),
(129469, 'Boletus', 'caucasicus', 'Boletus caucasicus'),
(208740, 'Boletus', 'chrysenteron', 'Boletus chrysenteron'),
(486540, 'Boletus', 'cisalpinus', 'Boletus cisalpinus'),
(368037, 'Boletus', 'declivitatum', 'Boletus declivitatum'),
(104061, 'Boletus', 'depilatus', 'Boletus depilatus'),
(356530, 'Boletus', 'edulis', 'Boletus edulis'),
(356278, 'Boletus', 'erythropus', 'Boletus erythropus var. immutatus'),
(417068, 'Boletus', 'erythropus', 'Boletus erythropus var. erythropus'),
(563943, 'Boletus', 'eximius', 'Boletus eximius'),
(264716, 'Boletus', 'fechtneri', 'Boletus fechtneri'),
(372473, 'Boletus', 'ferrugineus', 'Boletus ferrugineus'),
(141943, 'Boletus', 'flavus', 'Boletus flavus'),
(247434, 'Boletus', 'fragrans', 'Boletus fragrans'),
(302971, 'Boletus', 'fuligineus', 'Boletus fuligineus'),
(218213, 'Boletus', 'impolitus', 'Boletus impolitus'),
(327048, 'Boletus', 'legaliae', 'Boletus legaliae'),
(327051, 'Boletus', 'leptospermi', 'Boletus leptospermi'),
(235486, 'Boletus', 'lignatilis', 'Boletus lignatilis'),
(354822, 'Boletus', 'luridiformis', 'Boletus luridiformis var. junquilleus'),
(354845, 'Boletus', 'luridiformis', 'Boletus luridiformis var. discolor'),
(430254, 'Boletus', 'luridiformis', 'Boletus luridiformis var. luridiformis'),
(132915, 'Boletus', 'luridus', 'Boletus luridus var. rubriceps'),
(417113, 'Boletus', 'luridus', 'Boletus luridus var. luridus'),
(241417, 'Boletus', 'megalosporus', 'Boletus megalosporus'),
(282394, 'Boletus', 'moravicus', 'Boletus moravicus'),
(196024, 'Boletus', 'paluster', 'Boletus paluster')
;
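My suggestion: forget this query and create a fulltext index.
Create an index covering the genus, species and fullname columns (all in one index). Then query like this: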
SELECT * FROM your_table WHERE MATCH(genus, species, fullname) AGAINST ('Boletus a');
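Creating that index could look like the sketch below. It uses the species table from the question instead of your_table, ft_species_names is a made-up index name, and it assumes either a MyISAM table or MySQL 5.6 or later, where InnoDB also supports FULLTEXT indexes.

```sql
-- Assumption: MySQL 5.6+ with InnoDB, or a MyISAM table.
-- ft_species_names is a hypothetical index name.
ALTER TABLE species
  ADD FULLTEXT INDEX ft_species_names (genus, species, fullname);

-- The column list in MATCH() has to be the same as the index's column list.
SELECT id, genus, species, fullname
FROM species
WHERE MATCH(genus, species, fullname) AGAINST ('Boletus');
```

Keep in mind that full-text search works on whole words, so 'Boletus' finds the genus Boletus but not 'Afroboletus', which the original LIKE '%...%' filter would have returned.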
You can also use MATCH(genus, species, fullname) AGAINST ('Boletus a') in other parts of the query:
SELECT MATCH(genus, species, fullname) AGAINST ('Boletus a') # relevance score (a non-negative floating-point value; 0 means no match)
FROM your_table
WHERE
MATCH(genus, species, fullname) AGAINST ('Boletus a') # keeps only the rows that match
ORDER BY MATCH(genus, species, fullname) AGAINST ('Boletus a') DESC # most relevant rows first
;
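For an autocomplete box there is one catch: in the default natural-language mode, words shorter than the minimum word length (ft_min_word_len for MyISAM, innodb_ft_min_token_size for InnoDB) are ignored, so the trailing 'a' in 'Boletus a' contributes nothing. A possible workaround, sketched here against the species table from the question and assuming a FULLTEXT index on (genus, species, fullname) already exists, is boolean mode with the * truncation operator:

```sql
-- '+' marks a word as required; 'a*' matches any word starting with 'a'.
-- A word given with the truncation operator is kept even if it is too short.
SELECT id, genus, species, fullname,
       MATCH(genus, species, fullname)
         AGAINST ('+Boletus +a*' IN BOOLEAN MODE) AS relevance
FROM species
WHERE MATCH(genus, species, fullname)
        AGAINST ('+Boletus +a*' IN BOOLEAN MODE)
ORDER BY relevance DESC, genus, species
LIMIT 50;
```

The application would have to build the '+word +prefix*' string from the user's input, which also means stripping or escaping the boolean-mode operator characters before passing it in.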