SQLite FTS5 punctuation marks not working in SELECT query

I am using SQLite for full-text search, and below are some examples of the SELECT queries I am running.

For example:

  1. SELECT * FROM table WHERE table MATCH 'column:father's' ORDER BY rank;

  2. SELECT * FROM table WHERE table MATCH 'column:example:' ORDER BY rank;

  3. SELECT * FROM table WHERE table MATCH 'column:month&' ORDER BY rank;

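These queries give me errors because the search text contains the characters ' : &. I have also tried putting an escape character (\, backslash) in front of the punctuation mark.

Is there any way to search for punctuation marks (, . / " ' - & etc.) in fts5 with the MATCH operator?

These characters do work with the MATCH operator: _, €, £, ¥

Thanks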

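This appears to be a duplicate of this question. Try the top answer there, which says that you should wrap the search string in both single quotes and double quotes.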

-- father's
SELECT * FROM table WHERE table MATCH 'column:"father''s"';

-- example:
SELECT * FROM table WHERE table MATCH 'column:"example:"';

-- month&
SELECT * FROM table WHERE table MATCH 'column:"month&"';
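If the query is built in application code, binding the MATCH string as a parameter sidesteps the SQL-level apostrophe doubling, so only the FTS5-level quoting remains. Below is a minimal Python sketch of that idea, assuming an SQLite build with FTS5 enabled. The `docs` table, its `body` column, and the `fts5_quote` and `search` helpers are hypothetical names made up for the illustration.

```python
import sqlite3


def fts5_quote(term: str) -> str:
    """Turn arbitrary text into an FTS5 string literal.

    FTS5 strings are enclosed in double quotes; an embedded double
    quote is escaped by doubling it.
    """
    return '"' + term.replace('"', '""') + '"'


conn = sqlite3.connect(":memory:")
# Hypothetical table with the default (unicode61) tokenizer.
conn.execute("CREATE VIRTUAL TABLE docs USING fts5(body)")
conn.executemany(
    "INSERT INTO docs (body) VALUES (?)",
    [("my father's house",), ("an example: text",), ("month& year",)],
)


def search(raw: str) -> list:
    # Column filter plus quoted string, e.g. body:"father's".
    # The whole query is bound as a parameter, so no SQL escaping is needed.
    query = "body:" + fts5_quote(raw)
    rows = conn.execute(
        "SELECT body FROM docs WHERE docs MATCH ? ORDER BY rank", (query,)
    )
    return [row[0] for row in rows]


for raw in ("father's", "example:", "month&"):
    print(raw, "->", search(raw))
```

Note that with the default tokenizer the punctuation is still discarded when the quoted string is tokenized, so the quoting only prevents the syntax error; it does not make the punctuation itself searchable.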

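I wanted to give a fuller example, because I have found it very easy to get subtle and unexpected results with fts5.

First, while wrapping the search string in quotes may give you the right answer, it may not be what you actually want. Here is an example to illustrate: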

$ sqlite3 ":memory:"
sqlite> CREATE VIRTUAL TABLE IF NOT EXISTS bad USING fts5(term, tokenize="unicode61");
sqlite>
sqlite> INSERT INTO bad (term) VALUES ('father''s');
sqlite>
sqlite> SELECT * from bad WHERE term MATCH 'father';
father's
sqlite> SELECT * from bad WHERE term MATCH '"father''s"';
father's
sqlite> SELECT * from bad WHERE term MATCH 's';
father's

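Notice how s matches father's as well? That is because father's is run through the tokenizer, and the default unicode61 tokenizer treats the apostrophe as a separator. For comparison, these are the characters FTS5 accepts by default in an unquoted query term: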

An FTS5 bareword is a string of one or more consecutive characters that are all either:

  • Non-ASCII range characters (i.e. unicode codepoints greater than 127), or
  • One of the 52 upper and lower case ASCII characters, or
  • One of the 10 decimal digit ASCII characters, or
  • The underscore character (unicode codepoint 95).
  • The substitute character (unicode codepoint 26).

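So father's gets tokenized into two separate tokens, father and s, which may or may not be what you want. For the purposes of this answer I will assume it is not.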
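One way to check what the tokenizer actually stored is to read the index back through an fts5vocab table. A small Python sketch, assuming an FTS5-enabled SQLite build; the `bad_vocab` table name is made up here:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE bad USING fts5(term, tokenize='unicode61')")
conn.execute("INSERT INTO bad (term) VALUES (?)", ("father's",))

# An fts5vocab "row" table exposes one row per distinct token in the index.
conn.execute("CREATE VIRTUAL TABLE bad_vocab USING fts5vocab(bad, row)")

tokens = [r[0] for r in conn.execute("SELECT term FROM bad_vocab ORDER BY term")]
print(tokens)  # expected: ['father', 's']
```

Inspecting the vocabulary like this is a quick sanity check whenever a MATCH query returns rows you did not expect.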

So how do you tell the tokenizer to keep father's together? By using the tokenchars option of the tokenize parameter:

tokenchars This option is used to specify additional unicode characters that should be considered token characters, even if they are white-space or punctuation characters according to Unicode 6.1. All characters in the string that this option is set to are considered token characters.

Let's look at another example, this time using tokenchars:

$ sqlite3 ":memory:"
sqlite> CREATE VIRTUAL TABLE IF NOT EXISTS good USING fts5(term, tokenize="unicode61  tokenchars '''&:'");
sqlite>
sqlite> INSERT INTO good (term) VALUES ('father''s');
sqlite> INSERT INTO good (term) VALUES ('month&');
sqlite> INSERT INTO good (term) VALUES ('example:');
sqlite>
sqlite> SELECT count(*) from good WHERE term MATCH 'father';
0
sqlite> SELECT count(*) from good WHERE term MATCH '"father''s"';
1
sqlite> SELECT count(*) from good WHERE term MATCH 'example';
0
sqlite> SELECT count(*) from good WHERE term MATCH '"example:"';
1
sqlite> SELECT count(*) from good WHERE term MATCH 'month';
0
sqlite> SELECT count(*) from good WHERE term MATCH '"month&"';
1

These results look much more like what you would expect. But what about the stray s match from the first example?

sqlite> SELECT count(*) from good WHERE term MATCH 's';
0

Great!
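One trade-off worth knowing about: once the apostrophe is a token character, a search for father on its own no longer matches father's. A prefix query can bring that match back where it is wanted. A rough Python sketch, assuming an FTS5-enabled build and the same table definition as above; the `count` helper is hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Same definition as the "good" table: ' & : are treated as token characters.
conn.execute(
    """CREATE VIRTUAL TABLE good USING fts5(term, tokenize="unicode61 tokenchars '''&:'")"""
)
conn.executemany(
    "INSERT INTO good (term) VALUES (?)",
    [("father's",), ("month&",), ("example:",)],
)


def count(query: str) -> int:
    # Bind the FTS5 query as a parameter to avoid SQL-level quoting.
    sql = "SELECT count(*) FROM good WHERE good MATCH ?"
    return conn.execute(sql, (query,)).fetchone()[0]


exact = count("father")    # whole-token query: the only similar token is father's
prefix = count("father*")  # prefix query: any token starting with "father"
print(exact, prefix)
```

Whether the whole-token or the prefix behaviour is the right default depends on how your users are likely to type their searches.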

Hopefully this helps you set up your table the way you intended.