Using to_tsvector and to_tsquery to filter non-Roman characters
I want my application to support multilingual search.
The PostgreSQL 9.6 Search Controls documentation says I need tsvector
and tsquery
to properly parse/normalize text. This works for Roman-based languages, but not for non-Roman characters.
Consider this search snippet:
where to_tsvector(title) @@ to_tsquery('hola')
I'm searching for a title containing "hola mi amiga", and it is found. However, given:
where to_tsvector(title) @@ to_tsquery('你') -- language = Chinese, code = zh-CN
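I'm searching for a title containing 你好嗎,
and it is not found.
What should I consider so that string normalization also handles non-Roman characters?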
Make sure your configuration is correct:
default_text_search_config (string)
Selects the text search configuration that is used by those variants of the text search functions that do not have an explicit argument specifying the configuration. See Chapter 12 for further information. The built-in default is pg_catalog.simple, but initdb will initialize the configuration file with a setting that corresponds to the chosen lc_ctype locale, if a configuration matching that locale can be identified.
You can view the current value with
SHOW default_text_search_config;
or SELECT get_current_ts_config();
You can change it for the current session with SET default_text_search_config = newconfiguration;
or, alternatively, for a whole database with ALTER DATABASE <db> SET default_text_search_config = newconfiguration;
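A minimal sketch of both options, assuming the built-in pg_catalog.simple configuration as the target and a hypothetical database named mydb. Note that a per-database setting only takes effect for sessions opened after the ALTER DATABASE.

```sql
-- Change the default only for the current session
SET default_text_search_config = 'pg_catalog.simple';

-- Persist the default for new sessions in one database
-- (mydb is a hypothetical name; already-open sessions keep their old value)
ALTER DATABASE mydb SET default_text_search_config = 'pg_catalog.simple';

-- Confirm which configuration is now in effect for this session
SHOW default_text_search_config;
SELECT get_current_ts_config();
```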
From Chapter 12. Full Text Search
During installation an appropriate configuration is selected and default_text_search_config is set accordingly in postgresql.conf. If you are using the same text search configuration for the entire cluster you can use the value in postgresql.conf. To use different configurations throughout the cluster but the same configuration within any one database, use ALTER DATABASE ... SET. Otherwise, you can set default_text_search_config in each session.
Each text search function that depends on a configuration has an optional regconfig argument, so that the configuration to use can be specified explicitly. default_text_search_config is used only when this argument is omitted.
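To illustrate the regconfig argument described above, a short sketch comparing the default with two explicit built-in configurations. The english configuration should stem "Rats" to "rat" and drop the stop word "the", while simple only lowercases the words, so the same text yields different lexemes depending on the configuration passed in.

```sql
-- No configuration given: falls back to default_text_search_config
SELECT to_tsvector('The Fat Rats');

-- An explicit regconfig overrides the default for this call only
SELECT to_tsvector('english', 'The Fat Rats') AS english_lexemes,
       to_tsvector('simple',  'The Fat Rats') AS simple_lexemes;
```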
You can view the text search configurations installed on your system with \dF.
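A sketch of how to inspect the available configurations, both from psql and with a plain SQL query against the pg_ts_config system catalog (useful when you are not in psql, for example from application code).

```sql
-- psql meta-command: list installed text search configurations
\dF

-- psql meta-command: show the parser and dictionary mappings of one configuration
\dF+ simple

-- The same list in plain SQL, usable outside psql
SELECT cfgnamespace::regnamespace AS schema_name, cfgname
FROM pg_ts_config
ORDER BY cfgname;
```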
So what you want is something like this:
where to_tsvector('newconfig', title) @@ to_tsquery('newconfig', '你')
I don't know the language used in the query well enough to say which configuration would properly stem it.
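A hedged sketch for the Chinese case, assuming a UTF-8 database. As far as I know, PostgreSQL 9.6 ships no built-in Chinese configuration, and its default parser does not segment Chinese text into words, so a run such as 你好嗎 is likely kept as one single token. ts_debug shows what the parser actually produces. A prefix query (:*) is one possible workaround, but it only matches the beginning of each token (searching for 好 would still fail). The table name posts is hypothetical.

```sql
-- Inspect how the parser and dictionaries split the text
SELECT alias, token, lexemes
FROM ts_debug('simple', '你好嗎');

-- Exact lexeme match: fails if 你好嗎 was kept as one token
SELECT to_tsvector('simple', '你好嗎') @@ to_tsquery('simple', '你');

-- Prefix match: 你:* matches any lexeme that starts with 你
SELECT to_tsvector('simple', '你好嗎') @@ to_tsquery('simple', '你:*');

-- Applied to the question's filter (posts is a hypothetical table)
SELECT title
FROM posts
WHERE to_tsvector('simple', title) @@ to_tsquery('simple', '你:*');
```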