Sphinx configuration to handle German umlauts

I am using this index configuration:

index humans
{
    source                = src_humans
    path                  = /usr/local/sphinx/var/data/humans
    charset_table         = 0..9, A..Z->a..z, _, a..z, U+C4->U+E4, U+D6->U+F6, U+DC->U+FC, U+DF, U+E4, U+F6, U+FC
    html_strip            = 1
    html_index_attrs      = img=src,alt; a=href,title
    morphology            = libstemmer_de
    min_infix_len         = 3
    stopwords             = /tmp/stopwords_de.txt
}

My indexer run produces this output:

Sphinx 2.3.1-id64-beta (r4926)
Copyright (c) 2001-2015, Andrew Aksyonoff
Copyright (c) 2008-2015, Sphinx Technologies Inc (http://sphinxsearch.com)

using config file '/usr/local/sphinx/etc/sphinx.conf'...
indexing index 'humans'...
WARNING: index 'humans': dict=keywords and prefixes and morphology enabled, forcing index_exact_words=1
WARNING: Attribute count is 0: switching to none docinfo
collected 2 docs, 0.0 MB
sorted 0.0 Mhits, 100.0% done
total 2 docs, 989 bytes
total 0.043 sec, 22888 bytes/sec, 46.28 docs/sec
total 3 reads, 0.000 sec, 2.0 kb/call avg, 0.0 msec/call avg
total 9 writes, 0.000 sec, 1.9 kb/call avg, 0.0 msec/call avg
rotating indices: successfully sent SIGHUP to searchd (pid=8908).

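When I search with $sc->Query('*gef*'), I find a document whose description contains "Gefährlich", but when I search with $sc->Query('*gefä*'), I find nothing.

What am I doing wrong? My whole MySQL database and every file belonging to the project are UTF-8 encoded.

Thanks in advance!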

When I use Sphinx, I have something like this in my searchd configuration:

collation_server = utf8_general_ci

And in my index configuration:

charset_type = utf-8
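
To show where these two lines would sit in sphinx.conf, here is a minimal sketch. It assumes the index is the `humans` index from the question. All other settings are left out and marked as omitted in the comments.

```
searchd
{
    # Other searchd settings (listen, log, pid_file, ...) omitted.
    collation_server = utf8_general_ci
}

index humans
{
    # Remaining index settings as shown in the question.
    charset_type     = utf-8
}
```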

Hope this helps.

I fixed this behavior by adding sql_query_pre = SET NAMES utf8 to the source configuration.
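
As a rough sketch, the fix might look like this in the `src_humans` source block. The connection settings are omitted, and the table and column names in `sql_query` are hypothetical, since the original query is not shown. The likely reason it works: without `SET NAMES utf8`, the MySQL connection can default to another client character set, so "ä" reaches the indexer as non-UTF-8 bytes and never matches U+E4 from charset_table.

```
source src_humans
{
    type          = mysql
    # Connection settings (sql_host, sql_user, sql_pass, sql_db) omitted.

    # Runs before the main query so that rows are delivered as UTF-8.
    sql_query_pre = SET NAMES utf8

    # Hypothetical main query; table and column names are assumptions.
    sql_query     = SELECT id, description FROM humans
}
```

The index has to be rebuilt after this change (for example by re-running the indexer with --rotate) before '*gefä*' can match.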