Elasticsearch match phrase_prefix and others

Question

Hi, I have a question about Elasticsearch. I have some results like these:

modul'ion

test lithium file

When I run my query and type 'mod', I get no results. I added "type": "phrase_prefix" to my query, and now I find the result

modul'ion

But now, when I type lithium a, this result is no longer found:

test lithium file

My request:

    $query ['match'] ['_all'] ["query"] = strtolower ( $keyword );
    $query ['match'] ['_all'] ["type"] = "phrase_prefix";
    $query ['match'] ['_all'] ["analyzer"] = "synonym";

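I also use a synonym analyzer that contains "lithium => Rechargeable Lithium". My problem is that if I don't use the analyzer, or if I remove

    $query ['match'] ['_all'] ["type"] = "phrase_prefix";

I find the result, but the problem with 'mod' comes back.
So I would like to get results in both cases. Can you help me?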

I set up the analyzer with this request:

{
  "analysis": {
    "analyzer": {
      "synonym": {
        "tokenizer": "whitespace",
        "filter": ["synonym"]
      }
    },
    "filter": {
      "synonym": {
        "type": "synonym",
        "synonyms_path": "synonym.txt",
        "ignore_case": true
      }
    }
  }
}

Answer 1

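The problem is not the query type but the synonyms. A synonym filter is generally used to replace one word with another word, not one word with a whole phrase, because the phrase will not be tokenized afterwards.

You have to know that analysis is applied twice: once at index time and once at search time. Suppose your documents are analyzed with the standard analyzer (the default analyzer):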

#1: input "modul'ion" → 1 indexed term: "modul'ion"
#2: input "test lithium file" → 3 indexed terms: "test", "lithium", "file"

If you also search with the standard analyzer (no synonyms) and phrase_prefix:

input "mod" → found in #1
input "lithium" → found in #2
input "test lithium" → found in #2

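If you search with your custom analyzer (with synonyms):

input "mod" → found in #1
input "lithium" → 1 search term prefix, "Rechargeable Lithium" → not found
input "test lithium" → 2 search term prefixes, "test" and "Rechargeable Lithium" → not found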

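If you index in lowercase (the index-time analysis chain contains a lowercase filter), you should also be careful about case and not try to search in uppercase (here the search-time analysis chain produces "Lithium" instead of "lithium").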
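To make the mismatch concrete, here is a rough toy model in Python of the explanation above. It is not how Elasticsearch is implemented: standard_analyze, synonym_analyze, phrase_prefix_match and search are hypothetical helpers, and the synonym step simply follows the description above (the replacement phrase stays a single term and keeps its case), which is an assumption.

```python
def standard_analyze(text):
    """Toy stand-in for the standard analyzer: lowercase, split on whitespace."""
    return text.lower().split()


def synonym_analyze(text, synonyms):
    """Toy stand-in for the question's analyzer: whitespace tokenizer plus a
    synonym filter. Assumption (taken from the explanation above): the
    replacement phrase is kept as one term and its case is left untouched."""
    return [synonyms.get(token.lower(), token) for token in text.split()]


def phrase_prefix_match(indexed_terms, search_terms):
    """True if the search terms occur consecutively in indexed_terms,
    with the last search term treated as a prefix."""
    if not search_terms:
        return False
    *exact, prefix = search_terms
    size = len(search_terms)
    for start in range(len(indexed_terms) - size + 1):
        window = indexed_terms[start:start + size]
        if window[:-1] == exact and window[-1].startswith(prefix):
            return True
    return False


# Index-time analysis of the two documents from the question.
DOCS = {
    1: standard_analyze("modul'ion"),
    2: standard_analyze("test lithium file"),
}
SYNONYMS = {"lithium": "Rechargeable Lithium"}


def with_synonyms(query):
    """Search-time analysis using the synonym rule from the question."""
    return synonym_analyze(query, SYNONYMS)


def search(query, analyze):
    """Return the ids of documents whose indexed terms match the analyzed query."""
    terms = analyze(query)
    return [doc_id for doc_id, indexed in DOCS.items()
            if phrase_prefix_match(indexed, terms)]
```

With this model, search("lithium", standard_analyze) returns [2], while search("lithium", with_synonyms) returns an empty list, because the search term "Rechargeable Lithium" is not a prefix of any indexed term.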

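If you are new to Elasticsearch, I suggest that you:

Start with the same analysis settings for indexing and searching. You already know how to configure an analyzer; you only need the Put Mapping API to apply it at index time.
Use the Analyze API to check which terms an analyzer actually produces.

For example: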

PUT the_index/_mapping/the_type
{
  "properties": {
    "the_field": {
      "type": "string",
      "analyzer": "the_analyzer"
    }
  }
}

GET the_index/_analyze?analyzer=synonym&text=modul'ion
GET the_index/_analyze?analyzer=synonym&text=test lithium

Answer 2

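First of all, I don't see anything wrong with your mappings; they work fine on the backend. Your problem is that you are querying the _all field, which has to be configured separately. If you don't configure it, it uses its default parameters (see the _all field documentation). To change that, I used these settings and mappings: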

PUT /test
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "whitespace",
          "char_filter": ["my_mapping"],
          "filter": [
            "lowercase",
            "my_synonym"
          ]
        }
      },
      "filter": {
        "my_synonym": {
          "type": "synonym",
          "ignore_case": true,
          "synonyms": [
            "rechargeable lithium => lithium"
          ]
        }
      },
      "char_filter": {
        "my_mapping": {
          "type": "mapping",
          "mappings": [
            "'=>"
          ]
        }
      }
    }
  },
  "mappings": {
    "test": {
      "_all": {
        "enabled": true,
        "analyzer": "my_analyzer"
      }
    }
  }
}

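These settings split your tokens on whitespace, remove apostrophes from them and lowercase them, so that:

modul'ion is indexed as modulion, and the user will find it by typing either of these forms.
rechargeable lithium is replaced by its synonym lithium.
Thanks to the lowercase filter, your search is case-insensitive.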
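As a rough illustration of what this analysis chain does to a string, here is a toy model in Python. It is only a sketch of the char filter, whitespace tokenizer, lowercase filter and synonym rule from the settings above, not Elasticsearch's actual implementation; CHAR_MAPPINGS, SYNONYMS and my_analyzer are hypothetical names.

```python
# Toy equivalents of the settings above (names are illustrative only).
CHAR_MAPPINGS = {"'": ""}                               # char_filter: "'=>"
SYNONYMS = {("rechargeable", "lithium"): ["lithium"]}   # rechargeable lithium => lithium


def my_analyzer(text):
    """Apply the char mapping, split on whitespace, lowercase, then
    replace any token sequence that matches a synonym rule."""
    for source, target in CHAR_MAPPINGS.items():
        text = text.replace(source, target)
    tokens = [token.lower() for token in text.split()]

    output, i = [], 0
    while i < len(tokens):
        for phrase, replacement in SYNONYMS.items():
            if tuple(tokens[i:i + len(phrase)]) == phrase:
                output.extend(replacement)
                i += len(phrase)
                break
        else:
            # No synonym rule starts at this position: keep the token as-is.
            output.append(tokens[i])
            i += 1
    return output
```

In this model, my_analyzer("modul'ion") gives ["modulion"] and my_analyzer("Rechargeable Lithium") gives ["lithium"], so the same terms come out at index time and at search time.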

With these mappings, I added your data to the index:

PUT /test/test/1
{
  "text": "modul'ion"
}

PUT /test/test/2
{
  "text": "test lithium file"
}

Now, running this query:

POST /test/test/_search
{
  "query": {
    "match": {
      "_all": {
        "query": "rechargeable lithium",
        "type": "phrase_prefix"
      }
    }
  }
}

returns this document:

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.15342641,
    "hits": [
      {
        "_index": "test",
        "_type": "test",
        "_id": "2",
        "_score": 0.15342641,
        "_source": {
          "text": "test lithium file"
        }
      }
    ]
  }
}

And the following two queries:

POST /test/test/_search
{
  "query": {
    "match": {
      "_all": {
        "query": "mod",
        "type": "phrase_prefix"
      }
    }
  }
}

POST /test/test/_search
{
  "query": {
    "match": {
      "_all": {
        "query": "modulion",
        "type": "phrase_prefix"
      }
    }
  }
}

both return this:

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.30685282,
    "hits": [
      {
        "_index": "test",
        "_type": "test",
        "_id": "1",
        "_score": 0.30685282,
        "_source": {
          "text": "modul'ion"
        }
      }
    ]
  }
}

These are just raw JSON queries, but I guess you can handle them in PHP without trouble.

php

elasticsearch

elastica