RegEx. \b for Cyrillic symbols

Question

What can I use instead of \b to match whole words in Cyrillic text?

I have the text "текст" in a column of an SQLite database.

This works:

select * from myTable where text REGEXP 'текст'

This does not work:

select * from myTable where text REGEXP '\bтекст\b'

Answer 1

It turns out that your SQLite REGEXP implementation is based on PCRE.

You can use the (*UCP) PCRE verb to make \b Unicode-aware:

'(*UCP)\bтекст\b'

The pcrepattern man page has some details about this verb:

Another special sequence that may appear at the start of a pattern is (*UCP). This has the same effect as setting the PCRE_UCP option: it causes sequences such as \d and \w to use Unicode properties to determine character types, instead of recognizing only characters with codes less than 128 via a lookup table.

And later:

Note also that PCRE_UCP affects \b, and \B because they are defined in terms of \w and \W. Matching these sequences is noticeably slower when PCRE_UCP is set.

Well, it will be slower, since it now has to consult the whole Unicode table instead of a small lookup table for codes below 128.
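To see why the original query returns nothing, here is a rough sketch in Python rather than PCRE. It rests on several assumptions. Python's `re` module is a different engine and does not accept `(*UCP)`. The `re.ASCII` flag is used only to approximate PCRE's default behaviour, where `\w` covers just ASCII letters, digits and underscore. Leaving the flag off approximates what `(*UCP)` enables. The `make_regexp` and `matching_rows` helpers and the sample rows are made up for illustration.

```python
import re
import sqlite3


def make_regexp(flags=0):
    """Build a REGEXP callback. SQLite evaluates `X REGEXP Y` as regexp(Y, X)."""
    def regexp(pattern, value):
        if value is None:
            return None
        return re.search(pattern, value, flags) is not None
    return regexp


def matching_rows(flags):
    """Run the query from the question with a REGEXP using the given flags."""
    conn = sqlite3.connect(":memory:")
    conn.create_function("REGEXP", 2, make_regexp(flags))
    conn.execute("CREATE TABLE myTable (text TEXT)")
    conn.executemany(
        "INSERT INTO myTable VALUES (?)",
        [("это текст",), ("контекстный",), ("plain text",)],  # sample data
    )
    rows = conn.execute(
        "SELECT text FROM myTable WHERE text REGEXP ?", (r"\bтекст\b",)
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]


# ASCII-only \w: Cyrillic letters are non-word characters, so there is
# no word/non-word transition around "текст" and \b never matches.
ascii_rows = matching_rows(re.ASCII)

# Unicode-aware \w: Cyrillic letters count as word characters, so \b
# matches at the edges of the standalone word but not inside "контекстный".
unicode_rows = matching_rows(0)

print(ascii_rows)
print(unicode_rows)
```

With the ASCII-only rules the query finds no rows at all. With Unicode-aware rules it finds only the row where "текст" stands as a separate word, which is the behaviour `(*UCP)` is meant to give you in PCRE.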
