How do I make a case-insensitive, partial text search engine that uses regex with MongoDB and PHP?
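I am trying to improve the search bar in my application. Right now, if the user types "Titan" in the search bar, the application retrieves the movie "Titanic" from MongoDB whenever I use the following regex query: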
require 'dbconnection.php';
$input= $_REQUEST['input'];
$query=$collection->find(['movie' => new MongoDB\BSON\Regex($input)]);
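I can also make the collection case insensitive by creating it with the following collation and index in the Mongo shell, so that if the user types "tiTAnIc" in the search bar, the application retrieves the movie "Titanic" from MongoDB: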
db.createCollection("c1", { collation: { locale: 'en_US', strength: 2 } } )
db.c1.createIndex( { movie: 1 } )
$query=$collection->find( [ 'movie' => $input] );
Note: regex searches on an indexed field affect performance, as stated in the $regex docs:
Case insensitive regular expression queries generally cannot use indexes effectively. The $regex implementation is not collation-aware and is unable to utilize case-insensitive indexes.
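The issue is that MongoDB makes the best use of an index for $regex when the pattern is a prefix expression (for example, /^acme/). From the same docs: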
For case sensitive regular expression queries, if an index exists for the field, then MongoDB matches the regular expression against the values in the index, which can be faster than a collection scan. Further optimization can occur if the regular expression is a “prefix expression”, which means that all potential matches start with the same string. This allows MongoDB to construct a “range” from that prefix and only match against those values from the index that fall within that range.
$query=$collection->find(['movie' => new MongoDB\BSON\Regex('^'.$input, 'i')]);
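One thing the query above does not handle is user input that contains regex metacharacters (a title such as "M.A.S.H", for instance). Below is a minimal sketch of how the prefix pattern could be built with the input escaped first. The helper name buildPrefixPattern is hypothetical and not part of the driver; it only relies on PHP's built-in preg_quote.

```php
<?php
// Hypothetical helper (not part of the MongoDB driver): builds an anchored
// prefix pattern from raw user input, escaping regex metacharacters so the
// input is matched literally.
function buildPrefixPattern(string $input): string
{
    return '^' . preg_quote(trim($input));
}

// Usage with the driver (assumes $collection comes from dbconnection.php):
// $regex = new MongoDB\BSON\Regex(buildPrefixPattern($_REQUEST['input']), 'i');
// $query = $collection->find(['movie' => $regex]);

echo buildPrefixPattern('Titan'), PHP_EOL;   // ^Titan
echo buildPrefixPattern('M.A.S.H'), PHP_EOL; // ^M\.A\.S\.H
```

Keeping the pattern construction in a plain function also makes it easy to check without a database connection.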
I suggest you design your collection more carefully.
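As one possible reading of that advice: since the docs quoted above say case-insensitive regex queries generally cannot use indexes effectively, you could store a lowercased copy of the title and run a case-sensitive prefix regex against it. This is only a sketch under assumptions: the field name movie_lower and both helper functions are made up for illustration, the collection is assumed to have no case-insensitive default collation, and it only covers prefix matches, not matches in the middle of a title.

```php
<?php
// Hypothetical normalization step: lowercase the title before storing it in
// an extra field (called movie_lower here; the name is an assumption).
function normalizeTitle(string $title): string
{
    $title = trim($title);
    // mb_strtolower handles non-ASCII titles when the mbstring extension is loaded.
    return function_exists('mb_strtolower')
        ? mb_strtolower($title, 'UTF-8')
        : strtolower($title);
}

// Builds a case-sensitive prefix filter against the normalized field.
function buildSearchFilter(string $input): array
{
    return ['movie_lower' => ['$regex' => '^' . preg_quote(normalizeTitle($input))]];
}

// On insert (assumes $collection comes from dbconnection.php):
// $collection->insertOne(['movie' => 'Titanic', 'movie_lower' => normalizeTitle('Titanic')]);
// On search, with an ordinary index on movie_lower:
// $query = $collection->find(buildSearchFilter($_REQUEST['input']));

echo json_encode(buildSearchFilter('tiTAn')), PHP_EOL; // {"movie_lower":{"$regex":"^titan"}}
```

The original movie field stays untouched for display, and the extra field exists only so the search can stay case sensitive on the database side.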