Sphinx search: querying using implicit AND not returning the result
Using Sphinx search v2.2.9, this query returns one particular record:
((@(author) fstaed) | (@(authsurname) fstaed) | (@(authori) fstaed) )
This query also returns the same record:
(@(issued) 2007)
But this query (which I believe is an implicit "AND" combination of the two queries above) returns no records:
(@(issued) 2007) ((@(author) fstaed) | (@(authsurname) fstaed) | (@(authori) fstaed) )
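Why?
Update 1:
I can reproduce the problem from the mysql command line. The session below runs each of the 3 tests above. Note that id: 187 appears in both individual result sets, but not in the combined result set.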
$ mysql -h0 -P9306
Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 1
Server version: 2.2.9-id64-release (rel22-r5006)
Copyright (c) 2000, 2015, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql> SELECT id, weight() FROM `work` WHERE MATCH('(@(issued) 2007)') AND `sphinx_deleted` = 0 LIMIT 0, 1000 OPTION ranker=proximity_bm25
-> ;
+------+----------+
| id | weight() |
+------+----------+
| 187 | 1604 |
| 200 | 1604 |
| 215 | 1604 |
..i cutoff these results as irrelevant.
+------+----------+
40 rows in set, 1 warning (0.01 sec)
mysql> SELECT id, weight() FROM `work` WHERE MATCH('((@(author) fstaed) | (@(authsurname) fstaed) | (@(authori) fstaed) )') AND `sphinx_deleted` = 0 LIMIT 0, 1000 OPTION ranker=proximity_bm25
-> ;
+------+----------+
| id | weight() |
+------+----------+
| 187 | 1560 |
| 383 | 1560 |
+------+----------+
2 rows in set, 1 warning (0.01 sec)
mysql> SELECT id, weight() FROM `work` WHERE MATCH('(@(issued) 2007) ((@(author) fstaed) | (@(authsurname) fstaed) | (@(authori) fstaed) )') AND `sphinx_deleted` = 0 LIMIT 0, 1000 OPTION ranker=proximity_bm25
-> ;
Empty set, 1 warning (0.01 sec)
mysql>
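Update 2:
I should also mention that the "work" index is a distributed index spanning multiple indexes, including "iresrevi1_core" and "iresrevi2_core". The "issued" field is null in the iresrevi2_core index (and not null in the iresrevi1_core index), and the authori field being searched is null in the opposite index, iresrevi1_core (and not null in the iresrevi2_core index).
I guess this may have something to do with it? I can confirm that if I query the 2 indexes directly, the iresrevi1_core index returns the issued search data but the iresrevi2_core index does not. Conversely, iresrevi2_core returns the author search data but iresrevi1_core does not.
I have multiple indexes behind the distributed index so that all the fields I want to search with the "non-infix" method are indexed in one index (with the "infix" fields forced to be empty there), and all the fields I want to search with the "infix" method are in another index, where all the "non-infix" fields are blanked out. The 2 sources/indexes look like this: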
source srcresrevi1 : srcresrev
{
sql_query = \
select SQL_NO_CACHE `work`.`ID` AS `ID`, '' as authori \
from work \
WHERE (`work`.`ID` BETWEEN $start AND $end) \
and `work`.`ID` <= (select max_id from sphinx_deltas where id = 1)
sql_joined_field = authsurname from ranged-query; \
SELECT SQL_NO_CACHE wa.work_id AS ID, s.surname \
from `work_authors` wa, \
`author_surnames` s \
where wa.author_surname_id = s.id \
and wa.work_id >= $start and wa.work_id <= $end \
and `wa`.`work_ID` <= (select max_id from sphinx_deltas where id = 1) \
order by wa.work_id ASC; \
select min(work_id), max(work_id) from `work_authors` \
where work_id <= (select max_id from sphinx_deltas where id = 1)
sql_joined_field = author from ranged-query; \
SELECT SQL_NO_CACHE wa.work_id AS ID, CONCAT(f.given,' ',s.surname) \
from `work_authors` wa, `author_surnames` s, `author_fnames` f \
where wa.author_surname_id = s.id \
and wa.author_fname_id = f.id \
and wa.work_id >= $start and wa.work_id <= $end \
and `wa`.`work_ID` <= (select max_id from sphinx_deltas where id = 1) \
order by wa.work_id ASC; \
select min(work_id), max(work_id) from `work_authors` \
where work_id <= (select max_id from sphinx_deltas where id = 1)
sql_joined_field = issued from ranged-query; \
SELECT SQL_NO_CACHE work_id AS ID, `year` \
from issued \
where work_id >= $start and work_id <= $end \
and work_ID <= (select max_id from sphinx_deltas where id = 1) \
order by work_id ASC; \
select min(work_id), max(work_id) from `issued` \
where work_id <= (select max_id from sphinx_deltas where id = 1)
}
source srcresrevi2 : srcresrev
{
sql_query = \
select SQL_NO_CACHE `work`.`ID` AS `ID`, '' as authsurname, '' as author, '' as issued \
from work \
WHERE (`work`.`ID` BETWEEN $start AND $end) \
and `work`.`ID` <= (select max_id from sphinx_deltas where id = 1)
sql_joined_field = authori from ranged-query; \
SELECT SQL_NO_CACHE wa.work_id, CONCAT(f.given,' ',s.surname) \
from `work_authors` wa, `author_surnames` s, `author_fnames` f \
where wa.author_surname_id = s.id \
and wa.author_fname_id = f.id \
and work_id >= $start and work_id <= $end \
and work_ID <= (select max_id from sphinx_deltas where id = 1) \
order by wa.work_id ASC; \
select min(work_id), max(work_id) from `work_authors` \
where work_id <= (select max_id from sphinx_deltas where id = 1)
}
index iresrevi1_core
{
source = srcresrevi1
path = /home/resrev/pubrevit/db/sphinx/development/iresrevi1
docinfo = extern
dict = keywords
mlock = 0
morphology = stem_en
charset_table = 0..9, A..Z->a..z, _, a..z, \
U+410..U+42F->U+430..U+44F, U+430..U+44F
min_word_len = 3
expand_keywords = 0
ngram_len = 1
ngram_chars = U+3000..U+2FA1F
html_strip = 1
html_remove_elements = style, script, head, DOCTYPE, !DOCTYPE
inplace_enable = 1
index_exact_words = 0
index_sp = 0
index_field_lengths = 1
}
index iresrevi2_core
{
source = srcresrevi2
path = /home/resrev/pubrevit/db/sphinx/development/iresrevi2
docinfo = extern
dict = keywords
mlock = 0
morphology = stem_en
charset_table = 0..9, A..Z->a..z, _, a..z, \
U+410..U+42F->U+430..U+44F, U+430..U+44F
min_word_len = 3
min_infix_len = 3
expand_keywords = 1
ngram_len = 1
ngram_chars = U+3000..U+2FA1F
html_strip = 1
html_remove_elements = style, script, head, DOCTYPE, !DOCTYPE
inplace_enable = 1
index_exact_words = 0
index_sp = 1
index_field_lengths = 1
}
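So yes, the problem is the multiple different indexes. A distributed index takes the union of the results from its local indexes; it does not join them.
In addition to the thread you found, there is also a more recent thread here that mentions possibly using @@relaxed to work around it. It may still apply to distributed indexes.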
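To make "union, not join" concrete, here is a small Python toy model of the setup in the question. It is only a sketch, not Sphinx code: the field contents for document 187, the `matches`, `search`, and `search_distributed` helpers, and the substring-based stand-in for infix matching are all made up for illustration. The only thing it takes from the question is that each local index holds its own copy of the document with the other index's fields blanked, and that each local index evaluates the whole query by itself.

```python
# Toy model of a distributed index over two local indexes.
# All field contents below are hypothetical.

# Non-infix index: "issued" and the author fields are filled, "authori" is blank.
INDEX1 = {187: {"issued": "2007", "author": "jane olfstaedt",
                "authsurname": "olfstaedt", "authori": ""}}
# Infix index: only "authori" is filled, everything else is blank.
INDEX2 = {187: {"issued": "", "author": "", "authsurname": "",
                "authori": "jane olfstaedt"}}


def matches(doc, field, term, infix):
    """Whole-word match, or substring-of-a-word match when infix is True."""
    words = doc.get(field, "").split()
    if infix:
        return any(term in word for word in words)
    return term in words


def search(index, infix, groups):
    """groups are AND-ed together; the (field, term) pairs inside a group are OR-ed."""
    return {
        doc_id
        for doc_id, doc in index.items()
        if all(any(matches(doc, f, t, infix) for f, t in group) for group in groups)
    }


def search_distributed(groups):
    # Each local index evaluates the full query alone; the results are unioned.
    return search(INDEX1, False, groups) | search(INDEX2, True, groups)


ISSUED = [("issued", "2007")]
AUTHOR = [("author", "fstaed"), ("authsurname", "fstaed"), ("authori", "fstaed")]

print(search_distributed([ISSUED]))          # {187}, matched in INDEX1 only
print(search_distributed([AUTHOR]))          # {187}, matched in INDEX2 only
print(search_distributed([ISSUED, AUTHOR]))  # set(), no single index satisfies both
```

In this model the combined query is empty because neither copy of document 187 satisfies both halves of the AND. Intersecting the two individual result sets would give `{187}`, which is the "join" behaviour the question expected, but the distributed index never does that.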