Find all records with Hebrew names
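I have a PostgreSQL database with a users table, where each user has a name (in Unicode). I want to find all users whose name contains at least one Hebrew character. I thought about using a regex, for example: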
select * from users
where name ~ '[א-ת]';
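Is there a more efficient way than a regular expression? I have a B-tree index on the name column.
Update
I compared the different indexes, using the pg_trgm module as suggested by @FuzzyTree: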
        B-tree  GiST  GIN
user    0.04    0.04  0.03
sys     0.02    0.04  0.01
total   0.06    0.08  0.04
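Regarding disk size, the GIN index is 0.2 times the size of the GiST index and 0.8 times the size of the B-tree. So we have a winner here, at least for my use case. YMMV (for example, I did not benchmark index creation or updates). Version: Postgres 9.6.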
One option is to create a boolean column, e.g. is_hebrew_name, which you can populate once using the regular expression and then create a regular index on.
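The boolean-column approach could look roughly like the following. This is a minimal sketch, assuming the table and column names from the question (users, name); the index name users_is_hebrew_name_idx is made up for illustration.

```sql
-- Add the flag column suggested above (is_hebrew_name).
ALTER TABLE users ADD COLUMN is_hebrew_name boolean;

-- Populate it once with the regular expression from the question.
-- Rows whose name is NULL end up with a NULL flag.
UPDATE users
SET is_hebrew_name = (name ~ '[א-ת]');

-- Regular B-tree index on the flag (index name is hypothetical).
CREATE INDEX users_is_hebrew_name_idx ON users (is_hebrew_name);

-- The lookup no longer evaluates the regex per row.
SELECT * FROM users
WHERE is_hebrew_name;
```

Note that the flag is only computed once here; rows inserted or renamed later would need it maintained separately, for example in application code or a trigger.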
If you don't want to add another column and you are running v9.3 or later, consider using the pg_trgm module to create a GIN or GiST index on name:
CREATE EXTENSION pg_trgm;
CREATE INDEX trgm_idx ON users USING GIST (name gist_trgm_ops);
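The statement above builds the GiST variant. For comparison, a GIN index on the same column might be created as sketched below; gin_trgm_ops is the GIN operator class shipped with pg_trgm, while the index name trgm_gin_idx is only an illustrative choice.

```sql
-- Make sure the extension is available (no-op if already installed).
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- GIN counterpart of the GiST index above (index name is hypothetical).
CREATE INDEX trgm_gin_idx ON users USING GIN (name gin_trgm_ops);
```

Which of the two performs better depends on the data and workload, so it is worth measuring both on your own table.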
The pg_trgm module provides GiST and GIN index operator classes that
allow you to create an index over a text column for the purpose of
very fast similarity searches. These index types support the
above-described similarity operators, and additionally support
trigram-based index searches for LIKE, ILIKE, ~ and ~* queries.
The index search works by extracting trigrams from the regular
expression and then looking these up in the index. The more trigrams
that can be extracted from the regular expression, the more effective
the index search is. Unlike B-tree based searches, the search string
need not be left-anchored.
For both LIKE and regular-expression searches, keep in mind that a
pattern with no extractable trigrams will degenerate to a full-index
scan.
The choice between GiST and GIN indexing depends on the relative
performance characteristics of GiST and GIN, which are discussed
elsewhere.
For details, see https://www.postgresql.org/docs/9.6/static/pgtrgm.html
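Because the quoted documentation warns that a pattern with no extractable trigrams degenerates to a full-index scan, it may be worth checking how the planner actually handles the query from the question. A sketch, assuming one of the trigram indexes above already exists:

```sql
-- Show the chosen plan together with actual timing and buffer usage.
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM users
WHERE name ~ '[א-ת]';
```

The plan output shows whether the trigram index is used at all and how much of it is read, which is a more direct signal than wall-clock timings alone.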