How should I tune Sphinx?

Question

I built my first full-text search application, and today I finally started testing it.

'test'

... the Whig national **ticket** victorious, he ... Democrats who **thought** he was ...
... also a self-**tought** architect. He ... and he always **thought** of how ...
... HTML (Hyper **Text** Mark-up ... Сервер HTML Hyper **Text** Markup Language. ...
... ,0 2. Aiced **test** ratio (Quick ratio ... 
... on the first **Tuesday**, after the first ...

'stop'

... ],Elm); OutText(Elm); **Stop**:=False; End; '2 ... 
.. a crucial **step** in the ... is an increasingly **steep** maturity-related... 
... CHIPSET FEATURES **SETUP** или INTEGRATED ... CHIPSET FEATURES **SETUP** или ... 
... Trisetum, Anisantna, **Stipa** и ... многие виды **Stipa**, Stipagrostis), что ...

My config:

source src1
{
type = csvpipe
csvpipe_command = /usr/bin/php /var/www/html/import.php 
csvpipe_field_string = title
csvpipe_field_string = content
csvpipe_attr_string  = path
}

index test1
{
source          = src1
path            = /var/lib/sphinxsearch/data/test1
mlock           = 0
# morphology        = stem_en, stem_ru, soundex
min_word_len    = 2
html_strip      = 0
}

I commented out the morphology line and reloaded Sphinx, but the results are the same. It looks like morphology is still being applied.

Answer 1

Probably the most important part is

morphology = stem_en, stem_ru, soundex

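Morphology is a very powerful feature, because it 'morphs' words as they go into the index (and in the query too, so they can still match!), using various rules. That also means a change to this setting only takes effect once the index has been rebuilt; reloading the daemon alone is not enough.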

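In your case you enabled stemming, which 'normalizes' word endings, but also soundex, which is a 'sounds similar' algorithm. I believe it was designed for English, so I don't know how well it performs on Russian!

All of this means you will get 'similar' matches, not just exact word matches.

(test just happens to be a word that sounds similar to many others; your hyper example is a more distinctive-sounding word)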
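To see why soundex pulls in those results, here is a minimal sketch of the classic American Soundex algorithm in Python. It is an illustration only: Sphinx's own soundex preprocessor may differ in details, and the `soundex` helper below is hypothetical, not part of any Sphinx API. The word lists are taken from the highlighted matches in the question.

```python
# Classic American Soundex (illustrative; Sphinx's built-in variant may differ).
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(word):
    """Return the 4-character Soundex code of an ASCII word ('' if none)."""
    # Keep only ASCII letters; this sketch does not handle Cyrillic text.
    word = "".join(ch for ch in word.lower() if "a" <= ch <= "z")
    if not word:
        return ""
    result = word[0].upper()
    prev = CODES.get(word[0], "")
    for ch in word[1:]:
        if ch in "hw":
            continue  # h and w do not separate two letters with the same code
        code = CODES.get(ch, "")  # vowels have no code and reset the run
        if code and code != prev:
            result += code
        prev = code
    return (result + "000")[:4]  # pad with zeros, truncate to 4 characters


# Matches highlighted in the question for each query word.
samples = {
    "test": ["ticket", "thought", "tought", "text", "Tuesday"],
    "stop": ["step", "steep", "setup", "Stipa"],
}
for query, hits in samples.items():
    print(query, soundex(query), [soundex(hit) for hit in hits])
```

Under this sketch every word in the first group collapses to T230 and every word in the second to S310, which is consistent with them all being returned for 'test' and 'stop'. Dropping soundex from the morphology list and rebuilding the index should leave only the stemmed matches.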

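I also realize it may only be a test script, but you can pass multiple documents to buildExcerpts at once. So it should be more efficient to collect the documents first and call buildExcerpts once.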

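But more interestingly, since you are getting the text from a Sphinx attribute, you can use the SNIPPET() Sphinx function in the main query (in setSelect()!). That way you don't have to receive the full text only to send it straight back to Sphinx; i.e. Sphinx will fetch the text from the attribute internally. Much more efficient!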
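As a rough sketch of that idea, the select list passed to setSelect() could be built like this. The `snippet_select` helper and the `snippet` alias are hypothetical names for illustration; `content` is the string field from the config in the question, and the backslash escaping is an assumption about how string literals are quoted in the select expression.

```python
def snippet_select(text_attr, query, alias="snippet"):
    """Build a select list that asks Sphinx to compute the excerpt itself."""
    # Escape backslashes and single quotes so the query stays one string literal.
    escaped = query.replace("\\", "\\\\").replace("'", "\\'")
    return "*, SNIPPET({0}, '{1}') AS {2}".format(text_attr, escaped, alias)


select_list = snippet_select("content", "test")
print(select_list)
```

The resulting string would be handed to setSelect() before running the query, so each match comes back with its excerpt in the `snippet` column and no separate buildExcerpts call is needed.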
