Is there a way to ignore accents in MongoDB full-text search?

Question

Is it possible to make the new Atlas Search feature ignore accents?

I created this index:

{
 "analyzer": "lucene.standard",
 "searchAnalyzer": "lucene.standard",
 "mappings": {
   "dynamic": false,
   "fields": {
     "_id": {
       "type": "string",
       "analyzer": "lucene.keyword"
     },
     "firstName": {
       "type": "string",
       "analyzer": "lucene.french"
     },
     "lastName": {
       "type": "string",
       "analyzer": "lucene.french"
     },
     "email": {
       "type": "string",
       "analyzer": "lucene.standard"
     }
   }
 }
}

With this data:

db.testJTAFulltextSearch.insert({_id: "testFTS3", firstName: "René", lastName: "Martin", email: "rmartin@gmail.com"})
db.testJTAFulltextSearch.insert({_id: "testFTS4", firstName: "Rene", lastName: "Martin", email: "rmartin@gmail.com"})

and this search:

db.testJTAFulltextSearch.aggregate([{$searchBeta: {index: "customer", text: {query: "René", path: ["_id", "firstName", "email"]}}}])

I get:

{ "_id" : "testFTS3", "firstName" : "René", "lastName" : "Martin", "email" : "rmartin@gmail.com" }

The accents are not ignored (é should be treated the same as e). I expected:

{ "_id" : "testFTS3", "firstName" : "René", "lastName" : "Martin", "email" : "rmartin@gmail.com" }
{ "_id" : "testFTS4", "firstName" : "Rene", "lastName" : "Martin", "email" : "rmartin@gmail.com" }

Is there a way to make MongoDB Atlas Search ignore accents (diacritics)?

I think I need an ASCII folding analyzer, but I could not find one in the list of analyzers: https://docs.atlas.mongodb.com/reference/atlas-search/analyzers/#analyzers-ref
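For illustration only, here is roughly what an ASCII folding step does to a token, sketched in plain JavaScript rather than in Atlas Search. The helper names foldDiacritics and normalizeTerm are hypothetical. This Unicode normalization trick only strips combining accents; Lucene's ASCIIFoldingFilter also maps characters such as ø or ß, which this sketch leaves unchanged.

```javascript
// Hypothetical helper: approximates diacritic folding in plain JavaScript.
// NFD splits a precomposed "é" into "e" + U+0301 (combining acute accent);
// the regex then removes the combining diacritical marks U+0300 to U+036F.
function foldDiacritics(text) {
  return text.normalize("NFD").replace(/[\u0300-\u036f]/g, "");
}

// Fold, then lowercase: roughly what a lowercase + folding filter chain
// would leave in the index for a single token.
function normalizeTerm(text) {
  return foldDiacritics(text).toLowerCase();
}

console.log(normalizeTerm("René")); // "rene"
console.log(normalizeTerm("Rene")); // "rene"
console.log(normalizeTerm("René") === normalizeTerm("Rene")); // true
```

With folding applied at both index and query time, "René" and "Rene" produce the same term, which is the behavior the question is after.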

Using a collation does not seem to work either:

db.testJTAFulltextSearch.aggregate([{$searchBeta: {index: "customer", text: {query: "René", path: ["_id", "firstName", "email"]}}}], {collation: {locale: "en", strength: 1}})
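To make the intent of strength: 1 concrete: at ICU primary strength only base letters are compared, so differences in accents and case are ignored. The sketch below uses JavaScript's Intl.Collator, where sensitivity: "base" corresponds roughly to primary strength; it demonstrates the comparison rule only, not anything about Atlas Search. The collation does not appear to affect $searchBeta, whose matching is decided by the analyzers in the search index.

```javascript
// Roughly primary strength (MongoDB strength: 1): compares base letters only,
// so differences in accents and case are ignored.
const primary = new Intl.Collator("en", { sensitivity: "base" });
// Accent- and case-sensitive comparison, for contrast.
const accentSensitive = new Intl.Collator("en", { sensitivity: "variant" });

console.log(primary.compare("René", "Rene"));               // 0 (equal)
console.log(primary.compare("René", "rene"));               // 0 (case ignored too)
console.log(accentSensitive.compare("René", "Rene") === 0); // false
```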

It still returns only "René".

Answer 1

Have you tried the fuzzy option? It does not seem to be enabled by default, but fuzzy: { maxEdits: 2 } should cover this case.
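A quick check of why the fuzzy option should help: after lowercasing, "rené" and "rene" differ by a single substitution, an edit distance of 1, well within maxEdits: 2. The sketch below computes a plain Levenshtein distance (Lucene's fuzzy queries can also count transpositions, which does not change this case) and builds the question's pipeline with a fuzzy block added to the text operator. The function names are hypothetical; fuzzy and maxEdits are the text operator's options as documented for Atlas Search, but check them against the version you are running.

```javascript
// Levenshtein distance: minimum number of single-character insertions,
// deletions and substitutions needed to turn a into b (one-row DP).
function levenshtein(a, b) {
  const row = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    let diag = row[0]; // D[i-1][j-1] for j = 1
    row[0] = i;
    for (let j = 1; j <= b.length; j++) {
      const above = row[j]; // D[i-1][j], saved before overwriting
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      row[j] = Math.min(above + 1, row[j - 1] + 1, diag + cost);
      diag = above;
    }
  }
  return row[b.length];
}

// Hypothetical helper: the pipeline from the question, with the text
// operator's fuzzy option added.
function fuzzyNamePipeline(query, maxEdits = 2) {
  return [
    {
      $searchBeta: {
        index: "customer",
        text: {
          query,
          path: ["_id", "firstName", "email"],
          fuzzy: { maxEdits },
        },
      },
    },
  ];
}

// The analyzers lowercase terms, so compare the lowercase forms.
console.log(levenshtein("rené", "rene")); // 1
console.log(JSON.stringify(fuzzyNamePipeline("René")));
// In the mongo shell, pass the result to aggregate():
// db.testJTAFulltextSearch.aggregate(fuzzyNamePipeline("René"))
```

Note the trade-off: fuzzy matching is broader than accent folding, since it also matches other names within the edit budget, not only accented variants of the query.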

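I recently had a similar issue, but it turned out to be my own fault: I had set the wrong configuration (prefixLength: 1 instead of the default 0); see the thread. In my case I was using the term operator rather than text, but I am not sure how relevant that difference is.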

lucene

mongodb

mongodb-atlas

mongodb-atlas-search