Sphinx Configuration to handle German Umlauts
I'm using this index configuration:
index humans
{
source = src_humans
path = /usr/local/sphinx/var/data/humans
charset_table = 0..9, A..Z->a..z, _, a..z, U+C4->U+E4, U+D6->U+F6, U+DC->U+FC, U+DF, U+E4, U+F6, U+FC
html_strip = 1
html_index_attrs = img=src,alt; a=href,title
morphology = libstemmer_de
min_infix_len = 3
stopwords = /tmp/stopwords_de.txt
}
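To see what the charset_table line above does, here is a small Python model of it. It is only a sketch under assumptions. It understands just the syntax used in this config (single characters, U+XXXX codes, a..b ranges and X->Y mappings), and the helper names parse_charset_table and tokenize are hypothetical, not part of Sphinx. Characters listed in the table are kept or folded (e.g. U+C4 "Ä" to U+E4 "ä"). Any other character is treated as a word separator.

```python
# Hypothetical model of Sphinx's charset_table (not Sphinx code).
TABLE = ("0..9, A..Z->a..z, _, a..z, U+C4->U+E4, U+D6->U+F6, "
         "U+DC->U+FC, U+DF, U+E4, U+F6, U+FC")


def parse_char(tok: str) -> str:
    """Turn 'a' or 'U+C4' into a single character."""
    tok = tok.strip()
    if tok.upper().startswith("U+"):
        return chr(int(tok[2:], 16))
    if len(tok) != 1:
        raise ValueError(f"unsupported char spec: {tok!r}")
    return tok


def expand(spec: str) -> list[str]:
    """Expand 'a..z' or 'U+C4' into a list of characters."""
    if ".." in spec:
        lo, hi = (parse_char(p) for p in spec.split(".."))
        return [chr(c) for c in range(ord(lo), ord(hi) + 1)]
    return [parse_char(spec)]


def parse_charset_table(table: str) -> dict[str, str]:
    """Map every valid character to the character it is folded to."""
    mapping: dict[str, str] = {}
    for entry in table.split(","):
        entry = entry.strip()
        if not entry:
            continue
        if "->" in entry:
            src, dst = entry.split("->")
            src_chars, dst_chars = expand(src), expand(dst)
            if len(src_chars) != len(dst_chars):
                raise ValueError(f"range length mismatch in {entry!r}")
            mapping.update(zip(src_chars, dst_chars))
        else:
            for ch in expand(entry):
                mapping[ch] = ch  # valid character, kept as-is
    return mapping


def tokenize(text: str, mapping: dict[str, str]) -> list[str]:
    """Split text into keywords; characters not in the table separate words."""
    tokens: list[str] = []
    current: list[str] = []
    for ch in text:
        if ch in mapping:
            current.append(mapping[ch])
        elif current:
            tokens.append("".join(current))
            current = []
    if current:
        tokens.append("".join(current))
    return tokens


if __name__ == "__main__":
    table = parse_charset_table(TABLE)
    print(tokenize("Gef\u00e4hrlich!", table))  # ['gefährlich']
    print(tokenize("\u00c4RGER", table))        # ['ärger']
    print(tokenize("Caf\u00e9", table))         # ['caf'] (é is not in the table)
```

With this table "ä" is a valid word character, so a correctly decoded "Gefährlich" should end up as a single keyword.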
The indexer completes successfully:
Sphinx 2.3.1-id64-beta (r4926)
Copyright (c) 2001-2015, Andrew Aksyonoff
Copyright (c) 2008-2015, Sphinx Technologies Inc (http://sphinxsearch.com)
using config file '/usr/local/sphinx/etc/sphinx.conf'...
indexing index 'humans'...
WARNING: index 'humans': dict=keywords and prefixes and morphology enabled, forcing index_exact_words=1
WARNING: Attribute count is 0: switching to none docinfo
collected 2 docs, 0.0 MB
sorted 0.0 Mhits, 100.0% done
total 2 docs, 989 bytes
total 0.043 sec, 22888 bytes/sec, 46.28 docs/sec
total 3 reads, 0.000 sec, 2.0 kb/call avg, 0.0 msec/call avg
total 9 writes, 0.000 sec, 1.9 kb/call avg, 0.0 msec/call avg
rotating indices: successfully sent SIGHUP to searchd (pid=8908).
When I search with $sc->Query('*gef*'), I find a document whose description contains "Gefährlich", but when I search with $sc->Query('*gefä*'), it is not found.
What am I doing wrong?
My entire MySQL DB and every file belonging to the project are UTF-8 encoded.
Thanks in advance!
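The indexer warning shows that dict=keywords is in effect, so with min_infix_len = 3 a query like '*gefä*' is expanded at search time against the stored keyword list. The Python sketch below models that expansion under simplifying assumptions: the function name expand_wildcard is hypothetical, fnmatchcase stands in for Sphinx's own matching, and the pattern is assumed to be already lowercased by charset_table. If "gefährlich" had really been stored as a keyword, both queries would find it, which points at what was indexed rather than at the query.

```python
from fnmatch import fnmatchcase


def expand_wildcard(pattern: str, dictionary: set[str],
                    min_infix_len: int = 3) -> list[str]:
    """Expand a '*term*' pattern against the keyword dictionary (a model)."""
    core = pattern.strip("*")
    if len(core) < min_infix_len:
        # Assumption of this sketch: too-short patterns are not expanded.
        return []
    return sorted(kw for kw in dictionary if fnmatchcase(kw, pattern))


if __name__ == "__main__":
    keywords = {"gef\u00e4hrlich", "hund"}  # "gefährlich" stored correctly
    print(expand_wildcard("*gef*", keywords))        # ['gefährlich']
    print(expand_wildcard("*gef\u00e4*", keywords))  # ['gefährlich']
    print(expand_wildcard("*ge*", keywords))         # [] (shorter than 3)
```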
When I used Sphinx, I had something like this in my searchd configuration:
collation_server = utf8_general_ci
and in my index configuration:
charset_type = utf-8
Hope this helps.
I fixed this behavior by adding the following to the data source configuration:
sql_query_pre = SET NAMES utf8
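One plausible explanation for why SET NAMES utf8 helps, assuming the MySQL connection previously defaulted to latin1: MySQL then sends "ä" as the single Latin-1 byte 0xE4, which is not valid UTF-8, so the indexer cannot treat it as a letter. The Python sketch below models this. How Sphinx really handles invalid bytes is an assumption here (they are modelled as U+FFFD, which acts as a word separator), and the helper names are hypothetical.

```python
import re
from fnmatch import fnmatchcase

WORD = "Gef\u00e4hrlich"  # "Gefährlich"


def bytes_over_latin1_connection(text: str) -> bytes:
    """Assumed: without SET NAMES utf8 the client receives Latin-1 bytes."""
    return text.encode("latin-1")


def index_tokens(raw: bytes) -> list[str]:
    """Model a UTF-8 indexer: invalid bytes become U+FFFD and split words."""
    text = raw.decode("utf-8", errors="replace").lower()
    return re.findall(r"\w+", text)


def matches(pattern: str, tokens: list[str]) -> bool:
    """True if any indexed keyword matches the wildcard pattern."""
    return any(fnmatchcase(t, pattern) for t in tokens)


if __name__ == "__main__":
    broken = index_tokens(bytes_over_latin1_connection(WORD))
    fixed = index_tokens(WORD.encode("utf-8"))
    print(broken, matches("*gef*", broken), matches("*gef\u00e4*", broken))
    # ['gef', 'hrlich'] True False
    print(fixed, matches("*gef*", fixed), matches("*gef\u00e4*", fixed))
    # ['gefährlich'] True True
```

In this model the latin1 path indexes "gef" and "hrlich", which reproduces the symptom from the question: '*gef*' finds the document, '*gefä*' does not.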