ElasticSearch Suggester 全文搜索

Question

我正在使用 django_elasticsearch_dsl。

我的文档：

html_strip = analyzer(
    'html_strip',
    tokenizer='standard',
    filter=["lowercase", "stop", "snowball"],
    char_filter=["html_strip"]
)

class Document(django_elasticsearch_dsl.Document):
    name = TextField(
        analyzer=html_strip,
        fields={
            'raw': fields.KeywordField(),
            'suggest': fields.CompletionField(),
        }
    )
    ...

我的要求：

_search = Document.search().suggest("suggestions", text=query, completion={'field': 'name.suggest'}).execute()

我已将以下文档“名称”编入索引：

"This is a test"
"this is my test"
"this test"
"Test this"

现在如果搜索This is my text如果只会收到

"this is my text"

但是，如果我搜索 test，那么我得到的只是

"Test this"

尽管我想要所有名称中包含 test 的文档。

我错过了什么？

Answer 1

The best way to the completion suggester that can match the middle of fields is n-gram filter.

您可以使用多个建议，其中一个建议基于前缀，对于字段中间的匹配，您可以使用正则表达式。

我不知道 django_elasticsearch_dsl，添加了一个包含索引映射、数据、搜索查询和搜索结果的工作示例

索引映射：

{
  "mappings": {
    "properties": {
      "name": {
        "type": "completion"
      }
    }
  }
}

索引数据：

{
  "name": {
    "input": ["Test this"]
  }
}
{
  "name": {
    "input": ["this is my test"]
  }
}
{
  "name": {
    "input": ["This is a test"]
  }
}
{
  "name": {
    "input": ["this test"]
  }
}

搜索查询：

    {
        "suggest": {
            "suggest-exact": {
                "prefix": "test",
                "completion": {
                    "field": "name",
                    "skip_duplicates": true
                }
            },
            "suggest-regex": {
                "regex": ".*test.*",
                "completion": {
                    "field": "name",
                    "skip_duplicates": true
                }
            }
        }
    }

搜索结果：

"suggest": {
    "suggest-exact": [
      {
        "text": "test",
        "offset": 0,
        "length": 4,
        "options": [
          {
            "text": "Test this",
            "_index": "stof_64281341",
            "_type": "_doc",
            "_id": "4",
            "_score": 1.0,
            "_source": {
              "name": {
                "input": [
                  "Test this"
                ]
              }
            }
          }
        ]
      }
    ],
    "suggest-regex": [
      {
        "text": ".*test.*",
        "offset": 0,
        "length": 8,
        "options": [
          {
            "text": "Test this",
            "_index": "stof_64281341",
            "_type": "_doc",
            "_id": "4",
            "_score": 1.0,
            "_source": {
              "name": {
                "input": [
                  "Test this"
                ]
              }
            }
          },
          {
            "text": "This is a test",
            "_index": "stof_64281341",
            "_type": "_doc",
            "_id": "1",
            "_score": 1.0,
            "_source": {
              "name": {
                "input": [
                  "This is a test"
                ]
              }
            }
          },
          {
            "text": "this is my test",
            "_index": "stof_64281341",
            "_type": "_doc",
            "_id": "2",
            "_score": 1.0,
            "_source": {
              "name": {
                "input": [
                  "this is my test"
                ]
              }
            }
          },
          {
            "text": "this test",
            "_index": "stof_64281341",
            "_type": "_doc",
            "_id": "3",
            "_score": 1.0,
            "_source": {
              "name": {
                "input": [
                  "this test"
                ]
              }
            }
          }
        ]
      }

Answer 2

Based on the comment given by the user, adding another answer using ngrams

添加具有索引映射、索引数据、搜索查询和搜索结果的工作示例

索引映射：

{
  "settings": {
    "analysis": {
      "filter": {
        "ngram_filter": {
          "type": "ngram",
          "min_gram": 4,
          "max_gram": 20
        }
      },
      "analyzer": {
        "ngram_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "ngram_filter"
          ]
        }
      }
    },
    "max_ngram_diff": 50
  },
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "analyzer": "ngram_analyzer",
        "search_analyzer": "standard"
      }
    }
  }
}

索引数据：

{
  "name": [
    "Test this"
  ]
}

{
  "name": [
    "This is a test"
  ]
}

{
  "name": [
    "this is my test"
  ]
}

{
  "name": [
    "this test"
  ]
}

分析API:

POST/_analyze

{
  "analyzer" : "ngram_analyzer",
  "text" : "this is my test"
}

生成了以下令牌：

{
  "tokens": [
    {
      "token": "this",
      "start_offset": 0,
      "end_offset": 4,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "test",
      "start_offset": 11,
      "end_offset": 15,
      "type": "<ALPHANUM>",
      "position": 3
    }
  ]
}

搜索查询：

{
    "query": {
        "match": {
           "name": "test"
        }
    }
}

搜索结果：

"hits": [
      {
        "_index": "stof_64281341",
        "_type": "_doc",
        "_id": "4",
        "_score": 0.2876821,
        "_source": {
          "name": [
            "Test this"
          ]
        }
      },
      {
        "_index": "stof_64281341",
        "_type": "_doc",
        "_id": "3",
        "_score": 0.2876821,
        "_source": {
          "name": [
            "this is my test"
          ]
        }
      },
      {
        "_index": "stof_64281341",
        "_type": "_doc",
        "_id": "2",
        "_score": 0.2876821,
        "_source": {
          "name": [
            "This is a test"
          ]
        }
      },
      {
        "_index": "stof_64281341",
        "_type": "_doc",
        "_id": "1",
        "_score": 0.2876821,
        "_source": {
          "name": [
            "this test"
          ]
        }
      }
    ]

对于模糊搜索，您可以使用以下搜索查询：

{
  "query": {
    "fuzzy": {
      "name": {
        "value": "tst"    <-- used tst in place of test
      }
    }
  }
}

ElasticSearch Suggester 全文搜索

ElasticSearch Suggester full-text-search

python

django

elasticsearch

elasticsearch-dsl