ElasticSearch Suggester full-text search

I am using django_elasticsearch_dsl.

My document:

html_strip = analyzer(
    'html_strip',
    tokenizer='standard',
    filter=["lowercase", "stop", "snowball"],
    char_filter=["html_strip"]
)

class Document(django_elasticsearch_dsl.Document):
    name = TextField(
        analyzer=html_strip,
        fields={
            'raw': fields.KeywordField(),
            'suggest': fields.CompletionField(),
        }
    )
    ...

My request:

_search = Document.search().suggest("suggestions", text=query, completion={'field': 'name.suggest'}).execute()

I have indexed documents with the following names:

"This is a test"
"this is my test"
"this test"
"Test this"

Now if I search for This is my test, I receive only

"this is my test"

However, if I search for test, all I get is

"Test this"

even though I want all documents whose name contains test.

What am I missing?

The completion suggester matches prefixes only, so the best way to match text in the middle of a field is an n-gram filter.

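Alternatively, you can run several suggesters in one request: one based on a prefix and, to match in the middle of the field, one based on a regex.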
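To see why a prefix suggester returns only "Test this" while a regex suggester can return all four names, here is a minimal pure-Python simulation. It is not Elasticsearch itself: it assumes the completion field only lowercases its input (roughly what the default `simple` analyzer does) and that the regex has to match the whole analyzed input. The helper names `analyze`, `prefix_suggest` and `regex_suggest` are hypothetical.

```python
import re

NAMES = ["This is a test", "this is my test", "this test", "Test this"]


def analyze(text):
    # Simplification: only lowercase, roughly what the `simple` analyzer does.
    return text.lower()


def prefix_suggest(prefix, names=NAMES):
    """Return names whose analyzed form starts with the analyzed prefix."""
    p = analyze(prefix)
    return [n for n in names if analyze(n).startswith(p)]


def regex_suggest(pattern, names=NAMES):
    """Return names whose whole analyzed form matches the pattern."""
    rx = re.compile(pattern)
    return [n for n in names if rx.fullmatch(analyze(n))]


print(prefix_suggest("test"))             # ['Test this']
print(prefix_suggest("This is my test"))  # ['this is my test']
print(regex_suggest(".*test.*"))          # all four names
```

The prefix lookup can only ever find names that begin with the typed text, which is exactly the behaviour described in the question.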

I am not familiar with django_elasticsearch_dsl, so I have added a working example with the index mapping, data, search query, and search result.

Index mapping:

{
  "mappings": {
    "properties": {
      "name": {
        "type": "completion"
      }
    }
  }
}

Index data:

{
  "name": {
    "input": ["Test this"]
  }
}
{
  "name": {
    "input": ["this is my test"]
  }
}
{
  "name": {
    "input": ["This is a test"]
  }
}
{
  "name": {
    "input": ["this test"]
  }
}

Search query:

    {
        "suggest": {
            "suggest-exact": {
                "prefix": "test",
                "completion": {
                    "field": "name",
                    "skip_duplicates": true
                }
            },
            "suggest-regex": {
                "regex": ".*test.*",
                "completion": {
                    "field": "name",
                    "skip_duplicates": true
                }
            }
        }
    }
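If you want to build this request body from Python, a small helper along these lines may be enough. This is only a sketch: `build_suggest_body` is a hypothetical name, and it assumes the regex is matched against the lowercased, analyzed input and that the user text contains no regex metacharacters.

```python
import json


def build_suggest_body(text, field="name"):
    """Build a request body with one prefix suggester and one regex suggester."""
    completion = {"field": field, "skip_duplicates": True}
    return {
        "suggest": {
            "suggest-exact": {"prefix": text, "completion": dict(completion)},
            "suggest-regex": {
                # Assumption: the pattern is compared with lowercased input,
                # and `text` contains no regex metacharacters.
                "regex": ".*" + text.lower() + ".*",
                "completion": dict(completion),
            },
        }
    }


body = build_suggest_body("test")
print(json.dumps(body, indent=2))
```

For the mapping in the question, the field would be `name.suggest` rather than `name`.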

Search result:

"suggest": {
    "suggest-exact": [
      {
        "text": "test",
        "offset": 0,
        "length": 4,
        "options": [
          {
            "text": "Test this",
            "_index": "stof_64281341",
            "_type": "_doc",
            "_id": "4",
            "_score": 1.0,
            "_source": {
              "name": {
                "input": [
                  "Test this"
                ]
              }
            }
          }
        ]
      }
    ],
    "suggest-regex": [
      {
        "text": ".*test.*",
        "offset": 0,
        "length": 8,
        "options": [
          {
            "text": "Test this",
            "_index": "stof_64281341",
            "_type": "_doc",
            "_id": "4",
            "_score": 1.0,
            "_source": {
              "name": {
                "input": [
                  "Test this"
                ]
              }
            }
          },
          {
            "text": "This is a test",
            "_index": "stof_64281341",
            "_type": "_doc",
            "_id": "1",
            "_score": 1.0,
            "_source": {
              "name": {
                "input": [
                  "This is a test"
                ]
              }
            }
          },
          {
            "text": "this is my test",
            "_index": "stof_64281341",
            "_type": "_doc",
            "_id": "2",
            "_score": 1.0,
            "_source": {
              "name": {
                "input": [
                  "this is my test"
                ]
              }
            }
          },
          {
            "text": "this test",
            "_index": "stof_64281341",
            "_type": "_doc",
            "_id": "3",
            "_score": 1.0,
            "_source": {
              "name": {
                "input": [
                  "this test"
                ]
              }
            }
          }
        ]
      }
    ]
}

Based on the user's comment, here is another answer that uses n-grams.

Here is a working example with the index mapping, index data, search query, and search result.

Index mapping:

{
  "settings": {
    "analysis": {
      "filter": {
        "ngram_filter": {
          "type": "ngram",
          "min_gram": 4,
          "max_gram": 20
        }
      },
      "analyzer": {
        "ngram_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "ngram_filter"
          ]
        }
      }
    },
    "max_ngram_diff": 50
  },
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "analyzer": "ngram_analyzer",
        "search_analyzer": "standard"
      }
    }
  }
}

Index data:

{
  "name": [
    "Test this"
  ]
}

{
  "name": [
    "This is a test"
  ]
}

{
  "name": [
    "this is my test"
  ]
}

{
  "name": [
    "this test"
  ]
}

Analyze API:

POST /stof_64281341/_analyze

{
  "analyzer" : "ngram_analyzer",
  "text" : "this is my test"
}

The following tokens are generated:

{
  "tokens": [
    {
      "token": "this",
      "start_offset": 0,
      "end_offset": 4,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "test",
      "start_offset": 11,
      "end_offset": 15,
      "type": "<ALPHANUM>",
      "position": 3
    }
  ]
}
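The token list above can be reproduced with a rough pure-Python approximation of `ngram_analyzer`, which also shows why a `match` query for test then hits every name. This is a sketch, not Lucene: the standard tokenizer is approximated with a `\w+` regex, and the helper names `ngram_analyze` and `matches` are hypothetical.

```python
import re

MIN_GRAM, MAX_GRAM = 4, 20  # values taken from the mapping above


def ngram_analyze(text):
    """Approximate ngram_analyzer: split into words, lowercase, emit n-grams."""
    tokens = []
    for word in re.findall(r"\w+", text.lower()):
        for start in range(len(word)):
            for size in range(MIN_GRAM, MAX_GRAM + 1):
                if start + size > len(word):
                    break
                tokens.append(word[start:start + size])
    return tokens


def matches(query, name):
    """A match hits when any lowercased query word is one of the indexed tokens."""
    indexed = set(ngram_analyze(name))
    return any(term in indexed for term in re.findall(r"\w+", query.lower()))


names = ["Test this", "This is a test", "this is my test", "this test"]
print(ngram_analyze("this is my test"))          # ['this', 'test']
print([n for n in names if matches("test", n)])  # all four names
print(ngram_analyze("testing"))                  # n-grams of length 4 to 7
```

Words shorter than `min_gram` ("is", "my") produce no tokens at all, which is why only "this" and "test" appear.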

Search query:

{
    "query": {
        "match": {
           "name": "test"
        }
    }
}

Search result:

"hits": [
      {
        "_index": "stof_64281341",
        "_type": "_doc",
        "_id": "4",
        "_score": 0.2876821,
        "_source": {
          "name": [
            "Test this"
          ]
        }
      },
      {
        "_index": "stof_64281341",
        "_type": "_doc",
        "_id": "3",
        "_score": 0.2876821,
        "_source": {
          "name": [
            "this is my test"
          ]
        }
      },
      {
        "_index": "stof_64281341",
        "_type": "_doc",
        "_id": "2",
        "_score": 0.2876821,
        "_source": {
          "name": [
            "This is a test"
          ]
        }
      },
      {
        "_index": "stof_64281341",
        "_type": "_doc",
        "_id": "1",
        "_score": 0.2876821,
        "_source": {
          "name": [
            "this test"
          ]
        }
      }
    ]

For fuzzy search, you can use the following search query (tst is used here in place of test):

{
  "query": {
    "fuzzy": {
      "name": {
        "value": "tst"
      }
    }
  }
}
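As a rough illustration of why tst can still find test, the sketch below computes the edit distance between the two and compares it with the number of edits allowed for a term of that length under the default `AUTO` fuzziness. It uses plain Levenshtein distance as a simplification (Elasticsearch also counts transpositions by default), and `edit_distance` and `auto_fuzziness` are hypothetical helper names.

```python
def edit_distance(a, b):
    """Plain Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def auto_fuzziness(term):
    """Edits allowed under AUTO: 0 for 1-2 chars, 1 for 3-5 chars, 2 for longer."""
    if len(term) < 3:
        return 0
    return 1 if len(term) < 6 else 2


query = "tst"
print(edit_distance(query, "test"))                           # 1
print(edit_distance(query, "test") <= auto_fuzziness(query))  # True
```

Because `fuzzy` is a term-level query, the comparison is made against the indexed tokens, which here are the n-grams produced by `ngram_analyzer`.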