MongoDB text index search

Question

I created a collection in a MongoDB database as follows:

db.articles.insert([
 { _id: 1, subject: "one", author: "abc", views: 50 },
 { _id: 2, subject: "lastone", author: "abc", views: 5 },
 { _id: 3, subject: "firstone", author: "abc", views: 90  },
 { _id: 4, subject: "everyone", author: "abc", views: 100 },
 { _id: 5, subject: "allone", author: "efg", views: 100 },
 { _id: 6, subject: "noone", author: "efg", views: 100 },
 { _id: 7, subject: "nothing", author: "abc", views: 100 }])

After that, I created a text index on the subject and author fields.

db.articles.createIndex(
    {subject: "text",
    author: "text"})

Now I am trying to search the indexed fields for the word "one". When I run the query...

db.articles.count({$text: {$search: "\"one\""}})

...the result is 1.

The problem appears when I search for the combination of the words "one" and "abc"...

db.articles.count({$text: {$search: "\"one\" \"abc\""}})

...it gives a result of 4, which includes the records whose subject is "lastone", "firstone", "everyone" and "one".

So my question is: why doesn't the first query fetch 4 records? And how can I write a query that fetches the 4 records containing the word "one"?

Answer 1

This command...

db.articles.count({$text: {$search: "\"one\""}})

... counts the documents that contain the exact phrase "one". There is only one such document, so the result is 1.

A query for the value "one" should return only one document, because only one document contains a value that is "one" or that has "one" as its stem. From the docs:

For case insensitive and diacritic insensitive text searches, the $text operator matches on the complete stemmed word. So if a document field contains the word blueberry, a search on the term blue will not match. However, blueberry or blueberries will match.

Looking at the documents in your question...

one is not everyone
one is not lastone
one is not allone
one is not firstone
one is not noone

...so none of these documents will match the value one.
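To make the whole-word point concrete, here is a minimal sketch in plain JavaScript (runnable in Node or the mongo shell). It is not MongoDB's implementation: it ignores stemming and stop words, and the `tokens` and `matchesTerm` helpers are hypothetical names introduced only for illustration. It contrasts whole-token matching with a plain substring test on the sample documents.

```javascript
// Simplified model of whole-word term matching. This is NOT how MongoDB
// is implemented: stemming and stop words are ignored on purpose.
const articles = [
  { _id: 1, subject: "one", author: "abc" },
  { _id: 2, subject: "lastone", author: "abc" },
  { _id: 3, subject: "firstone", author: "abc" },
  { _id: 4, subject: "everyone", author: "abc" },
  { _id: 5, subject: "allone", author: "efg" },
  { _id: 6, subject: "noone", author: "efg" },
  { _id: 7, subject: "nothing", author: "abc" },
];

// Hypothetical helper: split the two indexed fields into lowercase word tokens.
function tokens(doc) {
  return `${doc.subject} ${doc.author}`
    .toLowerCase()
    .split(/\W+/)
    .filter(Boolean);
}

// Hypothetical helper: a document matches only if a token equals the whole term.
function matchesTerm(doc, term) {
  return tokens(doc).includes(term.toLowerCase());
}

// Whole-word matching: only the document whose subject is exactly "one".
const wholeWordIds = articles
  .filter((doc) => matchesTerm(doc, "one"))
  .map((doc) => doc._id);

// Substring matching, for contrast: every subject that merely contains "one".
const substringIds = articles
  .filter((doc) => doc.subject.includes("one"))
  .map((doc) => doc._id);

console.log(wholeWordIds); // [ 1 ]
console.log(substringIds); // [ 1, 2, 3, 4, 5, 6 ]
```

Under this model, "lastone" is a single token that is not equal to "one", which mirrors why the single-term search counts only one document.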

Of course, you can query with multiple values. For example:

The docs suggest this is evaluated as one OR abc, and it correctly returns 5:
```
db.articles.count({$text: {$search: "one abc"}})
```
The docs suggest this is evaluated as "abc" AND ("abc" OR "one"), and it correctly returns 5:
```
db.articles.count({$text: {$search: "\"abc\" one"}})
```
The docs suggest this should be evaluated as "one" AND ("one" OR "abc"), but it somehow returns 4:
```
db.articles.count({$text: {$search: "\"one\" abc"}})
```

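In the last example, MongoDB includes the documents with subject "one", "lastone", "firstone" and "everyone" but excludes the one with subject "nothing". This suggests that it is somehow treating "one" as the stem of "lastone", "firstone" and "everyone", yet count({$text: {$search: "one"}}) returns 1, which clearly shows that "one" is not treated as the stem of "lastone", "firstone" and "everyone".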

I suspect this may be a bug, and it may be worth raising with MongoDB.

FWIW, what you really want may be a partial string search, in which case $regex may work. The following query ...

db.articles.count({ subject: { $regex: /one$/ }, author: { $regex: /abc$/ } })

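... means something like count where subject like '%one' and author like '%abc' (the trailing $ anchors each pattern to the end of the string). It returns 4, i.e. the documents whose subject is one of "one", "lastone", "firstone", "allone", "everyone", "noone" and whose author is "abc".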
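As a rough check of that count, the same two patterns can be applied to the sample documents in plain JavaScript, with no database involved. This is only a sketch: the `articles` array is copied from the question, and the variable names are illustrative.

```javascript
// Apply the same regular expressions client-side to the sample documents.
const articles = [
  { _id: 1, subject: "one", author: "abc" },
  { _id: 2, subject: "lastone", author: "abc" },
  { _id: 3, subject: "firstone", author: "abc" },
  { _id: 4, subject: "everyone", author: "abc" },
  { _id: 5, subject: "allone", author: "efg" },
  { _id: 6, subject: "noone", author: "efg" },
  { _id: 7, subject: "nothing", author: "abc" },
];

const subjectPattern = /one$/; // subject must end with "one"
const authorPattern = /abc$/; // author must end with "abc"

// Both conditions must hold, like the two fields in the $regex query.
const matchedIds = articles
  .filter((doc) => subjectPattern.test(doc.subject) && authorPattern.test(doc.author))
  .map((doc) => doc._id);

console.log(matchedIds); // [ 1, 2, 3, 4 ]
```

"allone" and "noone" satisfy the subject pattern but are excluded by the author condition, and "nothing" is excluded by the subject pattern, which leaves four documents.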

