Force Proximity Search into multiple word wordform?

I am using proximity search with Sphinx. For example, Twain NEAR/1 Mark will return

Mark Twain

Twain Mark

But suppose I have a wordform like this:

weekday > week day

How can I set up any given search to use proximity NEAR/3 (or NEAR/X) so that it finds

week day

day of week

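In this case I know there are other ways to skin the cat, but in general I am looking for a way to stop multi-word mappings from being pushed through as 'Word1 Word2', i.e. 'Week Day'. Otherwise I get documents like

'I worked for one entire day before realizing it was going to take a full week'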

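There is no easy way to do this out of the box. You could change your application so that it rewrites each 'word' in the search query as "word"~N, or, better still, does so only for the words that Sphinx actually maps through a wordform. Here is an example: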

mysql> select *, weight() from idx_min where match('weekday');
+------+-------------------------------------------------------------------------------+------+----------+
| id   | doc                                                                           | a    | weight() |
+------+-------------------------------------------------------------------------------+------+----------+
|    1 | Weekday                                                                       |    1 |     2319 |
|    2 | day of week                                                                   |    2 |     1319 |
|    3 | I worked for one entire day before realizing it was going to take a full week |    3 |     1319 |
+------+-------------------------------------------------------------------------------+------+----------+
3 rows in set (0.00 sec)

mysql> select *, weight() from idx_min where match('"weekday"');
+------+---------+------+----------+
| id   | doc     | a    | weight() |
+------+---------+------+----------+
|    1 | Weekday |    1 |     2319 |
+------+---------+------+----------+
1 row in set (0.00 sec)

mysql> select *, weight() from idx_min where match('"weekday"~2');
+------+-------------+------+----------+
| id   | doc         | a    | weight() |
+------+-------------+------+----------+
|    1 | Weekday     |    1 |     2319 |
|    2 | day of week |    2 |     1319 |
+------+-------------+------+----------+
2 rows in set (0.00 sec)

mysql> select *, weight() from idx_min where match('"entire"~2 "day"~2');
+------+-------------------------------------------------------------------------------+------+----------+
| id   | doc                                                                           | a    | weight() |
+------+-------------------------------------------------------------------------------+------+----------+
|    3 | I worked for one entire day before realizing it was going to take a full week |    3 |     1500 |
+------+-------------------------------------------------------------------------------+------+----------+
1 row in set (0.00 sec)

mysql> select *, weight() from idx_min where match('weekday full week');
+------+-------------------------------------------------------------------------------+------+----------+
| id   | doc                                                                           | a    | weight() |
+------+-------------------------------------------------------------------------------+------+----------+
|    3 | I worked for one entire day before realizing it was going to take a full week |    3 |     2439 |
+------+-------------------------------------------------------------------------------+------+----------+
1 row in set (0.01 sec)

mysql> select *, weight() from idx_min where match('"weekday"~2 full week');
Empty set (0.00 sec)

The last one is the best way to go, but you have to:

1) Parse your query, e.g. like this:

mysql> call keywords('weekday full week', 'idx_min');
+------+-----------+------------+
| qpos | tokenized | normalized |
+------+-----------+------------+
| 1    | weekday   | week       |
| 2    | weekday   | day        |
| 3    | full      | full       |
| 4    | week      | week       |
+------+-----------+------------+
4 rows in set (0.00 sec)

If the same tokenized word yields two different normalized words, that can be a signal for your app to wrap the tokenized word in "word"~N.

2) Run the query. In this case: "weekday"~2 full week
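The two steps above could be sketched on the application side roughly as follows. This is only an illustration, not something Sphinx provides: the function name `rewrite_query` and the `distance` parameter are made up for this example, and the rows are copied by hand from the CALL KEYWORDS output shown earlier. Fetching those rows from Sphinx with your MySQL client is left out.

```python
from itertools import groupby


def rewrite_query(keyword_rows, distance=2):
    """Wrap tokens that a wordform expands into several words as "token"~N.

    keyword_rows: (qpos, tokenized, normalized) tuples, as returned by
    CALL KEYWORDS. Hypothetical helper, not part of any Sphinx API.
    """
    # Keep the rows in query order.
    rows = sorted(keyword_rows, key=lambda r: int(r[0]))
    parts = []
    # Group consecutive rows that share the same tokenized word.
    # Caveat of this sketch: a token repeated back to back in the
    # query would be merged into a single group.
    for token, group in groupby(rows, key=lambda r: r[1]):
        normalized = {r[2] for r in group}
        if len(normalized) > 1:
            # One token, several normalized words: a multi-word wordform.
            parts.append('"%s"~%d' % (token, distance))
        else:
            parts.append(token)
    return " ".join(parts)


# Rows taken from the CALL KEYWORDS output for 'weekday full week'.
rows = [
    ("1", "weekday", "week"),
    ("2", "weekday", "day"),
    ("3", "full", "full"),
    ("4", "week", "week"),
]
print(rewrite_query(rows))  # "weekday"~2 full week
```

The resulting string is what you would then pass to match(). The value of N is a judgment call: it has to be large enough to cover phrasings like 'day of week' and small enough to exclude documents where the words are far apart.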