Searching on multiple fulltext fields with Sphinx
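I am trying to use sphinxsearch to search on multiple fields, essentially to work around the restriction that attributes used for search filtering must be numeric IDs (the database uses a lot of alphanumeric uniqIDs as IDs).
This is the main query used in the Sphinx config: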
sql_query = \
SELECT text_page.id, text_page.document_id, documents.startdate, documents.enddate, documents.long_title, documents.volume,text_page.images_page_id, text_page.text, \
series.name, series.id AS series_id, series.white_label_id AS white_label_id, \
documents.date_created\
FROM text_page \
INNER JOIN documents ON text_page.document_id = documents.id \
INNER JOIN series ON documents.series_id = series.id
text_page.text is the main fulltext field.
I added this line to the config to try to get this column full-text indexed as well:
sql_field_string = white_label_id
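I then tried to build a query narrowed by white_label_id through the PHP Sphinx class, by running:
"@text (search words) @white_label_id (some-uniq-id)"
As I understand it, this should mean that both @text and @white_label_id must produce a hit on a database row for a result to be returned.
However, the query never produces any results, and there are no errors or warnings.
Any suggestions as to what is going wrong here? Is it because the white_label_id and text fields are on different tables? Is there a solution that avoids restructuring the database to use numeric IDs?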
Edited:
As requested, here is the full config file.
Note that for now the code is still using the PHP Sphinx class, not SphinxQL through mysqli.
source src2
{
sql_host = localhost
sql_user = username
sql_pass = password
sql_db = databasename
sql_port = 3306 # optional, default is 3306
sql_query_pre = SET NAMES utf8
sql_query = \
SELECT text_page.id, text_page.document_id, documents.startdate, documents.enddate, documents.long_title, documents.volume,text_page.images_page_id, text_page.text, \
series.name, series.id AS series_id, series.white_label_id AS white_label_id, \
documents.date_created\
FROM text_page \
INNER JOIN documents ON text_page.document_id = documents.id \
INNER JOIN series ON documents.series_id = series.id
sql_attr_uint = startdate
sql_attr_uint = enddate
sql_attr_uint = volume
sql_attr_timestamp = date_created
sql_attr_string = long_title
sql_attr_string = name
#sql_attr_string = white_label_id #NB - does not work with nonnumeric ids
sql_attr_string = document_id
sql_attr_string = series_id
sql_field_string = white_label_id #currently appears to do nothing
sql_ranged_throttle = 0
}
source src2throttled : src2
{
sql_ranged_throttle = 100
}
index myindex11
{
source = src2
path = /var/data/mydata1
docinfo = extern
mlock = 0
morphology = none
min_word_len = 1
charset_type = utf-8
html_strip = 0
}
index myindex1stemmed : myindex1
{
path = /var/data/mydata1stemmed
morphology = stem_en
index_exact_words = 1
}
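In the end it turned out there was a better way around the 'numeric only' rule on Sphinx ID columns.
The answer is to create a numeric hash of the text-based uniq_id column, which can then be used as a sql_attr_uint to narrow down the search.
For example, the SQL query from the original post becomes: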
sql_query = \
SELECT text_page.id, text_page.document_id, documents.startdate, documents.enddate, documents.long_title, documents.volume,text_page.images_page_id, text_page.text, \
series.name, series.id AS series_id, CRC32(series.white_label_id) AS white_label_id, \
documents.date_created\
FROM text_page \
INNER JOIN documents ON (text_page.document_id = documents.id AND documents.is_active = 1) \
INNER JOIN series ON documents.series_id = series.id
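With the hashed column declared as an attribute (sql_attr_uint = white_label_id), the client has to hash the alphanumeric ID the same way before filtering on it. Below is a minimal sketch in Python, assuming the IDs are stored as UTF-8 and that MySQL's CRC32() produces the same standard CRC-32 value as zlib. The helper name white_label_filter_value is hypothetical and not part of any Sphinx API.

```python
import zlib


def white_label_filter_value(uniq_id: str) -> int:
    """Hash an alphanumeric ID into an unsigned 32-bit integer.

    Assumption: the bytes hashed here are the same bytes MySQL sees
    in series.white_label_id (UTF-8, matching SET NAMES utf8).
    """
    # Mask to 32 bits so the result is always an unsigned value
    # that fits in a sql_attr_uint attribute.
    return zlib.crc32(uniq_id.encode("utf-8")) & 0xFFFFFFFF


if __name__ == "__main__":
    # "some-uniq-id" is the placeholder ID from the question.
    print(white_label_filter_value("some-uniq-id"))
```

The resulting integer would then be used as the value of an attribute filter on white_label_id (for instance through the PHP client's SetFilter call), while the query string keeps only the fulltext part, such as "@text (search words)". CRC32 is only a 32-bit hash, so two different IDs can in principle collide; if that matters, the matched rows could be re-checked against the real ID in the database.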