Mongo $text query: return docs "starting with" string before others

Question

Suppose I have a mongo collection with a text index on the itemName field, containing these 3 documents:

{
    _id: ...,
    itemName: 'Mashed carrots with big carrot pieces',
    price: 1.29
},
{
    _id: ...,
    itemName: 'Carrot juice',
    price: 0.79
},
{
    _id: ...,
    itemName: 'Apple juice',
    price: 1.49
}

Then I run a query like this:

db.items.find({ $text: { $search: 'Car' } }, { score: { $meta: "textScore" } }).sort( { score: { $meta: "textScore" } } );

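How can I force mongo to return the documents that start with "Car" (case-insensitive) before any other documents that also contain "Car" somewhere in the itemName string?

So I want to retrieve the documents in the following order: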

[
    {..., itemName: 'Carrot Juice', ...},
    {..., itemName: 'Mashed carrots with big carrot pieces', ...}
]

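Of course, this is meant for a search feature, so it makes perfect sense to show users the items that start with their search string before showing any others.

Until now I was using a standard regex, but the performance there is of course much worse! Plus, since I have to search case-insensitively, according to the docs a normal regex doesn't use any index at all?!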

Edit:

Also, $text sometimes behaves strangely. For example, I have about 10-15 items whose itemName starts with the word "Zwiebel". This query

db.items.find({ $text: { $search: "Zwiebel" }, supplier_id: 'iNTJHEf5YgBPicTrJ' }, { score: { $meta: "textScore" } }).sort( { score: { $meta: "textScore" } } );

works great and returns all of those documents, whereas this query

db.items.find({ $text: { $search: "Zwie" }, supplier_id: 'iNTJHEf5YgBPicTrJ' }, { score: { $meta: "textScore" } }).sort( { score: { $meta: "textScore" } } );

returns nothing at all! The only change is replacing "Zwiebel" with "Zwie" in $search.

I really don't understand how this is possible?!

Best, P

Answer 1

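One solution is to use the $indexOfCP operator introduced in MongoDB 3.4.

This operator returns the index at which one string first occurs inside another string, or -1 if it does not occur.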
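To make the return values concrete, here is a rough plain-JavaScript analogue of the operator. The helper name indexOfCodePoint is made up for this illustration and is not part of MongoDB; it only mimics the "first position or -1" behaviour, counting code points rather than bytes.

```javascript
// Illustration only: a plain-JavaScript stand-in for what $indexOfCP computes.
// Array.from splits a string into code points, so indexes are code point based.
function indexOfCodePoint(text, search) {
  const textPoints = Array.from(text);
  const searchPoints = Array.from(search);
  for (let i = 0; i + searchPoints.length <= textPoints.length; i++) {
    // Compare the search string against the slice starting at position i.
    if (searchPoints.every((cp, j) => textPoints[i + j] === cp)) {
      return i;
    }
  }
  return -1; // no occurrence
}

console.log(indexOfCodePoint("carrot juice", "car")); // 0
console.log(indexOfCodePoint("mashed carrots with big carrot pieces", "car")); // 7
console.log(indexOfCodePoint("apple juice", "car")); // -1
```

A name that starts with the search string gets index 0, which is what makes sorting on this value put "starts with" matches first.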

How it works:

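Filter out all documents that do not contain 'car' using a regular expression: /car/i (case-insensitive)
Create a field named index that stores the position of 'car' in itemName
Sort the documents by the index field

The query looks like this:

db.items.aggregate([
   {
      $match:{
         // keep only documents whose itemName contains "car", ignoring case
         itemName:/car/i
      }
   },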
   {
      $project:{
         index:{
            $indexOfCP:[
               {
                  $toLower:"$itemName"
               },
               "car"
            ]
         },
         price:1,
         itemName:1
      }
   },
   {
      $sort:{
         index:1
      }
   }
])

And this returns:

{ "_id" : 2, "itemName" : "Carrot juice", "price" : 0.79, "index" : 0 }
{ "_id" : 1, "itemName" : "Mashed carrots with big carrot pieces", "price" : 1.29, "index" : 7 }

Try it online: mongoplayground.net/p/FqqCUQI3D-E
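If it helps to see why this ordering falls out, the same three stages can be mimicked in plain JavaScript on the sample documents. This is only a sketch of the logic, not how MongoDB executes the pipeline; the function name rankBySubstringPosition and the _id value 3 for the apple juice document are assumptions made for the example.

```javascript
// Sample documents from the question; _id 3 is an assumed value.
const items = [
  { _id: 1, itemName: "Mashed carrots with big carrot pieces", price: 1.29 },
  { _id: 2, itemName: "Carrot juice", price: 0.79 },
  { _id: 3, itemName: "Apple juice", price: 1.49 },
];

function rankBySubstringPosition(docs, search) {
  const needle = search.toLowerCase();
  return docs
    // like $match: keep names containing the search string, ignoring case
    .filter((doc) => doc.itemName.toLowerCase().includes(needle))
    // like $project: record where the search string first occurs
    .map((doc) => ({ ...doc, index: doc.itemName.toLowerCase().indexOf(needle) }))
    // like $sort: smallest position first, so prefix matches (index 0) lead
    .sort((a, b) => a.index - b.index);
}

const ranked = rankBySubstringPosition(items, "Car");
console.log(ranked.map((doc) => `${doc.index}: ${doc.itemName}`));
```

Note that this ranks by position of the substring, not by word boundaries, so a name with the search string in the middle of a word is still matched and simply sorts later.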

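Edit:

As for the behavior of the $text index, this is completely normal.

A text index tokenizes text using delimiters (the default delimiters are whitespace and punctuation). It can only be used to search for whole words, so it does not work for parts of words.

From the MongoDB text index documentation: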

$text will tokenize the search string using whitespace and most punctuation as delimiters, and perform a logical OR of all such tokens in the search string.
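A very simplified model of that behaviour, in plain JavaScript, shows why "Zwie" finds nothing. The functions tokenize and matchesAnyToken and the item name "Zwiebel rot, 5 kg" are hypothetical, and the real $text search additionally applies language-specific stemming and stop-word removal, which this sketch leaves out.

```javascript
// Simplified model of whole-token matching (no stemming, no stop words).
function tokenize(text) {
  return text
    .toLowerCase()
    .split(/[\s\p{P}]+/u) // split on whitespace and punctuation
    .filter((token) => token.length > 0);
}

// Logical OR over the search tokens: one whole-token hit is enough.
function matchesAnyToken(fieldValue, searchString) {
  const fieldTokens = new Set(tokenize(fieldValue));
  return tokenize(searchString).some((token) => fieldTokens.has(token));
}

console.log(tokenize("Zwiebel rot, 5 kg")); // [ 'zwiebel', 'rot', '5', 'kg' ]
console.log(matchesAnyToken("Zwiebel rot, 5 kg", "Zwiebel")); // true
console.log(matchesAnyToken("Zwiebel rot, 5 kg", "Zwie")); // false
```

Because tokens are compared as whole units, "zwie" is never equal to "zwiebel", so a prefix of a word does not match.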
