Force Proximity Search into multiple word wordform?
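I'm using proximity search with Sphinx, so that Twain NEAR/1 Mark returns both "Mark Twain" and "Twain, Mark".
But suppose I have a wordform like this:
weekday > week day
How can I make any given search use proximity NEAR/3 (or NEAR/X) so that it finds both "week day" and "day of the week"?
In this case I know there are other ways to skin the cat, but in general I'm looking for a way to keep multi-word mappings from being pushed through as 'Word1 Word2', i.e. 'Week Day', because otherwise I get documents such as:
'I worked for one entire day before realizing it was going to take a full week'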
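There is no easy way to do this out of the box. You could change your application so that it rewrites each 'word' in your search query as "word"~N or, better still, does so only for the words that Sphinx maps through wordforms. Here is an example: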
mysql> select *, weight() from idx_min where match('weekday');
+------+-------------------------------------------------------------------------------+------+----------+
| id | doc | a | weight() |
+------+-------------------------------------------------------------------------------+------+----------+
| 1 | Weekday | 1 | 2319 |
| 2 | day of week | 2 | 1319 |
| 3 | I worked for one entire day before realizing it was going to take a full week | 3 | 1319 |
+------+-------------------------------------------------------------------------------+------+----------+
3 rows in set (0.00 sec)
mysql> select *, weight() from idx_min where match('"weekday"');
+------+---------+------+----------+
| id | doc | a | weight() |
+------+---------+------+----------+
| 1 | Weekday | 1 | 2319 |
+------+---------+------+----------+
1 row in set (0.00 sec)
mysql> select *, weight() from idx_min where match('"weekday"~2');
+------+-------------+------+----------+
| id | doc | a | weight() |
+------+-------------+------+----------+
| 1 | Weekday | 1 | 2319 |
| 2 | day of week | 2 | 1319 |
+------+-------------+------+----------+
2 rows in set (0.00 sec)
mysql> select *, weight() from idx_min where match('"entire"~2 "day"~2');
+------+-------------------------------------------------------------------------------+------+----------+
| id | doc | a | weight() |
+------+-------------------------------------------------------------------------------+------+----------+
| 3 | I worked for one entire day before realizing it was going to take a full week | 3 | 1500 |
+------+-------------------------------------------------------------------------------+------+----------+
1 row in set (0.00 sec)
mysql> select *, weight() from idx_min where match('weekday full week');
+------+-------------------------------------------------------------------------------+------+----------+
| id | doc | a | weight() |
+------+-------------------------------------------------------------------------------+------+----------+
| 3 | I worked for one entire day before realizing it was going to take a full week | 3 | 2439 |
+------+-------------------------------------------------------------------------------+------+----------+
1 row in set (0.01 sec)
mysql> select *, weight() from idx_min where match('"weekday"~2 full week');
Empty set (0.00 sec)
The last one is the best way to go, but you have to:
1) Parse your query, for example like this:
mysql> call keywords('weekday full week', 'idx_min');
+------+-----------+------------+
| qpos | tokenized | normalized |
+------+-----------+------------+
| 1 | weekday | week |
| 2 | weekday | day |
| 3 | full | full |
| 4 | week | week |
+------+-----------+------------+
4 rows in set (0.00 sec)
If the same tokenized word yields two different normalized words, that may be a signal for your app to wrap the tokenized word in "word"~N.
2) Run the query. In this case: "weekday"~2 full week
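The two steps above could be sketched in application code roughly as follows. This is only an illustration under some assumptions: the function name wrap_multiword_wordforms and its distance parameter are hypothetical, the rows are assumed to be (qpos, tokenized, normalized) tuples already fetched from CALL KEYWORDS by whatever MySQL client you use, and no escaping of special query characters is attempted.

```python
from itertools import groupby


def wrap_multiword_wordforms(keyword_rows, distance=2):
    """Rebuild a query string from CALL KEYWORDS rows.

    keyword_rows: iterable of (qpos, tokenized, normalized) tuples.
    A token that expands to more than one distinct normalized word
    (a multi-word wordform) is wrapped as "token"~distance; every
    other token is passed through unchanged.
    Limitation: identical multi-word tokens typed back to back are
    merged into a single wrapped term.
    """
    # qpos may arrive as int or str depending on the client library.
    rows = sorted(keyword_rows, key=lambda row: int(row[0]))
    parts = []
    # Consecutive rows sharing a tokenized value come from one query token.
    for token, group in groupby(rows, key=lambda row: row[1]):
        normalized = [row[2] for row in group]
        if len(set(normalized)) > 1:
            parts.append('"{}"~{}'.format(token, distance))
        else:
            parts.extend([token] * len(normalized))
    return " ".join(parts)


# Rows copied from the CALL KEYWORDS output shown above.
rows = [
    (1, "weekday", "week"),
    (2, "weekday", "day"),
    (3, "full", "full"),
    (4, "week", "week"),
]
print(wrap_multiword_wordforms(rows))  # "weekday"~2 full week
```

The returned string is what you would then pass to match() in step 2. Choosing N is up to the application; 2 is used here only because it is the value in the example queries.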