PostgreSQL performance difference between LIKE and regex

Question

Can someone explain why there is such a big performance difference between these SQL statements?

SELECT count(*) as cnt FROM table WHERE name ~ '\*{3}'; -- Total runtime 12.000 - 18.000 ms
SELECT count(*) as cnt FROM table WHERE name ~ '\*\*\*'; -- Total runtime 12.000 - 18.000 ms
SELECT count(*) as cnt FROM table WHERE name LIKE '%***%'; -- Total runtime 5.000 - 7.000 ms

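As you can see, the LIKE operator is more than twice as fast as a simple regular expression. (I thought the LIKE operator was converted to a regular expression internally, so there should be no difference.)

The table has almost 13000 rows, and the "name" column is of type "text". No index is defined on the "name" column.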

EDIT:

EXPLAIN ANALYZE for each of them:

EXPLAIN ANALYZE SELECT count(*) as cnt FROM datos WHERE nombre ~ '\*{3}';

Aggregate  (cost=894.32..894.33 rows=1 width=0) (actual time=18.279..18.280 rows=1 loops=1)
  ->  Seq Scan on datos (cost=0.00..894.31 rows=1 width=0) (actual time=0.620..18.266 rows=25 loops=1)
        Filter: (nombre ~ '\*{3}'::text)
Total runtime: 18.327 ms

EXPLAIN ANALYZE SELECT count(*) as cnt FROM datos WHERE nombre ~ '\*\*\*';
Aggregate  (cost=894.32..894.33 rows=1 width=0) (actual time=17.404..17.405 rows=1 loops=1)
  ->  Seq Scan on datos  (cost=0.00..894.31 rows=1 width=0) (actual time=0.608..17.396 rows=25 loops=1)
        Filter: (nombre ~ '\*\*\*'::text)
Total runtime: 17.451 ms

EXPLAIN ANALYZE SELECT count(*) as cnt  FROM datos WHERE nombre LIKE '%***%';
Aggregate  (cost=894.32..894.33 rows=1 width=0) (actual time=4.258..4.258 rows=1 loops=1)
  ->  Seq Scan on datos  (cost=0.00..894.31 rows=1 width=0) (actual time=0.138..4.249 rows=25 loops=1)
        Filter: (nombre ~~ '%***%'::text)
Total runtime: 4.295 ms

Answer 1

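The text LIKE text operator (~~) is implemented by dedicated C code in like_match.c. It is ad hoc code that is completely independent of regular expressions. Judging from the comments, it is clearly optimized to implement only % and _ as wildcards and to short-circuit to an exit whenever possible, whereas a regular expression engine is more complex by several orders of magnitude.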
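To give a feel for how little machinery a matcher needs when only % and _ are wildcards, here is a minimal Python sketch. It is not PostgreSQL's like_match.c: the function name like_match is made up for illustration, and escape characters, collations and multibyte handling are all ignored.

```python
def like_match(text: str, pattern: str) -> bool:
    """Simplified LIKE: '%' matches any run of characters, '_' exactly one.

    Illustrative only: no escape character, no collation handling.
    """
    t = p = 0
    resume_p = -1   # pattern position just after the most recent '%'
    resume_t = 0    # text position where that '%' currently stops matching
    while t < len(text):
        if p < len(pattern) and pattern[p] == '%':
            # Remember where to resume if the rest of the pattern fails.
            resume_p = p + 1
            resume_t = t
            p += 1
        elif p < len(pattern) and (pattern[p] == '_' or pattern[p] == text[t]):
            # Literal character or single-character wildcard matches.
            p += 1
            t += 1
        elif resume_p != -1:
            # Mismatch: let the last '%' swallow one more character and retry.
            resume_t += 1
            t = resume_t
            p = resume_p
        else:
            # Mismatch with no '%' to fall back on: exit immediately.
            return False
    # Text is exhausted; only trailing '%' may remain in the pattern.
    while p < len(pattern) and pattern[p] == '%':
        p += 1
    return p == len(pattern)
```

The whole matcher is a single loop with one backtracking point, which is very different from compiling and running a general regular expression.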

Note that in your test case, just as the regexp is suboptimal compared to LIKE, LIKE is probably suboptimal compared to strpos(name, '***') > 0.

strpos is implemented with the Boyer–Moore–Horspool algorithm, which is optimized for large substrings with few partial matches in the searched text.
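For readers unfamiliar with the algorithm, the following Python sketch shows the Horspool idea: precompute how far the search window can jump based on the character under its last position. It is a textbook version, not PostgreSQL's C implementation, and the name horspool_find is hypothetical. Unlike strpos, which is 1-based and returns 0 when nothing is found, this sketch returns a 0-based index or -1.

```python
def horspool_find(haystack: str, needle: str) -> int:
    """Return the 0-based index of the first occurrence of needle, or -1."""
    n, m = len(haystack), len(needle)
    if m == 0:
        return 0
    # For every needle character except the last, store the distance from
    # its rightmost occurrence to the end of the needle.
    shift = {ch: m - 1 - i for i, ch in enumerate(needle[:-1])}
    pos = 0
    while pos <= n - m:
        if haystack[pos:pos + m] == needle:
            return pos
        # Jump according to the haystack character aligned with the
        # needle's last position; characters not in the table skip m.
        pos += shift.get(haystack[pos + m - 1], m)
    return -1
```

With a long needle and few partial matches, most steps skip the full needle length, which is where the speedup comes from. With a short, repetitive needle such as '***' the jumps are small, so the gain over a plain scan is likely to be modest.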

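These functions are reasonably optimized internally, but when there are several ways to reach the same goal, choosing the likely best one is still the caller's job. PostgreSQL will not analyze the pattern for us and turn a regexp into a LIKE, or a LIKE into a strpos, based on that analysis.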
