Find all records with Hebrew names

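I have a PostgreSQL database with a users table, where each user has a name (in Unicode). I want to find all users whose name contains at least one Hebrew character. I thought of using a regex, for example: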

select * from users
where name ~ '[א-ת]';

Is there a more efficient way than a regular expression? I have a B-tree index on the name column.

Update

Timings for the query with each index type, using the pg_trgm module as suggested by @FuzzyTree:

      B-tree GIST  GIN
user  0.04   0.04  0.03
sys   0.02   0.04  0.01
total 0.06   0.08  0.04

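As for disk size, the GIN index is 0.2 times the size of the GiST index and 0.8 times the size of the B-tree. So we have a winner here, at least for my use case. YMMV (for example, I did not benchmark index creation or updates). Version: PostgreSQL 9.6.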
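For anyone who wants to repeat the size comparison, here is a minimal sketch of one way to read index sizes. It assumes the table is named users, as in the question, and lists whatever indexes exist on it. It is not necessarily how the figures above were obtained.

```sql
-- List every index on the users table together with its on-disk size.
-- pg_stat_user_indexes and pg_relation_size are standard PostgreSQL;
-- the table name 'users' is taken from the question.
SELECT indexrelname AS index_name,
       pg_size_pretty(pg_relation_size(indexrelid)) AS size_on_disk
FROM pg_stat_user_indexes
WHERE relname = 'users'
ORDER BY pg_relation_size(indexrelid) DESC;
```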

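One option is to create a boolean column, e.g. is_hebrew_name, which you populate once using the regular expression, and then create a regular index on it.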
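A minimal sketch of the boolean-column approach, assuming the users table and name column from the question. The index name is a placeholder, and the coalesce is an added choice so that rows with a NULL name are stored as false rather than NULL.

```sql
-- Add the flag column (is_hebrew_name is the example name used above).
ALTER TABLE users ADD COLUMN is_hebrew_name boolean;

-- Populate it once with the regex from the question.
-- name ~ '...' yields NULL for a NULL name; coalesce stores false instead.
UPDATE users
SET is_hebrew_name = coalesce(name ~ '[א-ת]', false);

-- A regular B-tree index on the flag; the index name is a placeholder.
CREATE INDEX users_is_hebrew_name_idx ON users (is_hebrew_name);

-- The lookup no longer evaluates the regex at query time.
SELECT * FROM users WHERE is_hebrew_name;
```

Note that the flag is only as current as the last UPDATE: rows inserted or renamed afterwards need the column set as well, for example by the application or a trigger.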

If you don't want to add another column and you are running v9.3 or later, consider creating a GIN or GiST index on name using the pg_trgm module:

CREATE EXTENSION pg_trgm;
CREATE INDEX trgm_idx ON users USING GIST (name gist_trgm_ops);
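The GIN form of the same index differs only in the access method and the operator class. A sketch, where trgm_gin_idx is a placeholder name:

```sql
-- IF NOT EXISTS makes this safe to run when the extension is already installed.
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- GIN variant of the trigram index; gin_trgm_ops is the operator class
-- that pg_trgm provides for GIN. The index name is a placeholder.
CREATE INDEX trgm_gin_idx ON users USING GIN (name gin_trgm_ops);
```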

The pg_trgm module provides GiST and GIN index operator classes that allow you to create an index over a text column for the purpose of very fast similarity searches. These index types support the above-described similarity operators, and additionally support trigram-based index searches for LIKE, ILIKE, ~ and ~* queries.

The index search works by extracting trigrams from the regular expression and then looking these up in the index. The more trigrams that can be extracted from the regular expression, the more effective the index search is. Unlike B-tree based searches, the search string need not be left-anchored.

For both LIKE and regular-expression searches, keep in mind that a pattern with no extractable trigrams will degenerate to a full-index scan.

The choice between GiST and GIN indexing depends on the relative performance characteristics of GiST and GIN, which are discussed elsewhere.
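Because a pattern with no extractable trigrams falls back to a full-index scan, it is worth checking what the planner actually does with this particular regex on your own data. A sketch, assuming the query from the question:

```sql
-- Show the chosen plan together with actual run time and buffer usage.
-- Note that ANALYZE really executes the query.
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM users
WHERE name ~ '[א-ת]';
```

A Bitmap Index Scan on the trigram index in the output means the index is being used, while a Seq Scan means the planner decided to read the table directly. Comparing the reported execution time with and without the index tells you whether it helps for this pattern.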

For details, see https://www.postgresql.org/docs/9.6/static/pgtrgm.html