Elasticsearch PHP does not return search results when the space is omitted

I have added 15k records to the Elasticsearch index products_idx1, with type product.

The records contain product names like apple iphone 6, but when I search for iphone6 it returns no results.

Here is my code using the Elasticsearch PHP client:

<?php

use Elasticsearch\ClientBuilder;

require 'vendor/autoload.php';

$client = ClientBuilder::create()->build();
$values = ['name', 'name.prefix', 'name.suffix', 'sku'];
$params = [
    'client' => ['verify' => 1, 'connect_timeout' => 5],
    'from'   => 0,
    'size'   => 25,
    'body'   => [
        'query' => [
            'bool' => [
                'should' => [
                    [
                        'multi_match' => [
                            'query'    => 'iphone6',
                            'type'     => 'cross_fields',
                            'fields'   => $values,
                            'operator' => 'OR',
                        ],
                    ],
                    [
                        'match' => [
                            'all' => ['query' => 'iphone6', 'operator' => 'OR', 'fuzziness' => 'AUTO'],
                        ],
                    ],
                ],
            ],
        ],
        'sort' => ['_score' => ['order' => 'desc']],
    ],
    'index' => 'products_idx1',
];

$response = $client->search($params);
echo "<pre>";
print_r($response);

Using the shingle and pattern_replace token filters, you can get results for all 3 search terms mentioned in the question and comments, i.e. iphone, iphone6 and appleiphone. Below is a complete example.

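As mentioned in the comments, the search-time tokens generated from your search term must match the index-time tokens generated from the indexed documents for you to get search results. I achieved this by creating a custom analyzer.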
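To make the token-matching idea concrete, here is a rough pure-PHP approximation of the custom analyzer defined below: split on non-alphanumeric characters (standing in for the standard tokenizer), add two-word shingles, lowercase, then strip the spaces. The function names simulateTextAnalyzer and wouldMatch are hypothetical, and this is only a sketch of the idea, not how Lucene actually tokenizes; the real tokens should always be checked with the _analyze API.

```php
<?php
// Rough approximation (NOT the real Lucene code) of "text_analyzer":
// standard tokenizer -> shingle (2-word shingles plus unigrams)
// -> lowercase -> space_filter (pattern_replace " " => "").
function simulateTextAnalyzer(string $text): array
{
    // Approximate the standard tokenizer: split on anything that is not a letter or digit.
    $words = preg_split('/[^\p{L}\p{N}]+/u', $text, -1, PREG_SPLIT_NO_EMPTY);

    $tokens = [];
    foreach ($words as $i => $word) {
        // Unigram, kept because the shingle filter outputs unigrams by default.
        $tokens[] = $word;
        // Two-word shingle joined with the default separator " ".
        if (isset($words[$i + 1])) {
            $tokens[] = $word . ' ' . $words[$i + 1];
        }
    }

    // lowercase, then remove the spaces inside the shingles.
    return array_map(
        static function (string $t): string {
            return str_replace(' ', '', strtolower($t));
        },
        $tokens
    );
}

// Simplified match semantics (default OR operator): the document matches
// if at least one search-time token is also an index-time token.
function wouldMatch(string $document, string $query): bool
{
    $indexTokens  = simulateTextAnalyzer($document);
    $searchTokens = simulateTextAnalyzer($query);
    return count(array_intersect($searchTokens, $indexTokens)) > 0;
}

print_r(simulateTextAnalyzer('apple iphone 6'));
foreach (['iphone', 'iphone6', 'appleiphone'] as $q) {
    printf("%-12s %s\n", $q, wouldMatch('apple iphone 6', $q) ? 'match' : 'no match');
}
```

The approximation yields apple, appleiphone, iphone, iphone6 and 6 for the sample title, which is the same token list the _analyze API reports for this analyzer.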

Index mapping

{
  "settings": {
    "analysis": {
      "analyzer": {
        "text_analyzer": {
          "tokenizer": "standard",
          "filter": [
            "shingle",
            "lowercase",
            "space_filter"
          ]
        }
      },
      "filter": {
        "space_filter": {
          "type": "pattern_replace",
          "pattern": " ",
          "replacement": "",
          "preserve_original": true
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "text_analyzer"
      }
    }
  }
}
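Since the question uses the PHP client, here is a sketch of the same settings and mapping as a PHP array, in the request shape that elasticsearch-php's $client->indices()->create() takes. The helper name buildIndexParams and the index name products_idx2 are hypothetical; the array simply mirrors the JSON mapping above.

```php
<?php
// Hypothetical helper: request array for $client->indices()->create(),
// mirroring the JSON settings and mapping above.
function buildIndexParams(string $index): array
{
    return [
        'index' => $index,
        'body'  => [
            'settings' => [
                'analysis' => [
                    'analyzer' => [
                        'text_analyzer' => [
                            'tokenizer' => 'standard',
                            'filter'    => ['shingle', 'lowercase', 'space_filter'],
                        ],
                    ],
                    'filter' => [
                        // Removes the space that the shingle filter puts between words.
                        'space_filter' => [
                            'type'              => 'pattern_replace',
                            'pattern'           => ' ',
                            'replacement'       => '',
                            'preserve_original' => true,
                        ],
                    ],
                ],
            ],
            'mappings' => [
                'properties' => [
                    'title' => ['type' => 'text', 'analyzer' => 'text_analyzer'],
                ],
            ],
        ],
    ];
}
```

With the client from the question, this would be passed as $client->indices()->create(buildIndexParams('products_idx2')). A new index is assumed here because the analyzer of an existing field cannot be changed in place, so the existing products would have to be reindexed, and the question's name field would need the same analyzer as title.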

Index your sample document

{
  "title" : "apple iphone 6" 
}
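For the PHP client, indexing this document is a call to $client->index() with the document under body. The helper name buildDocumentParams, the index name and the id "1" below are illustrative assumptions.

```php
<?php
// Hypothetical helper: request array for $client->index(),
// storing one product title under an explicit document id.
function buildDocumentParams(string $index, string $id, string $title): array
{
    return [
        'index' => $index,
        'id'    => $id,
        'body'  => ['title' => $title],
    ];
}
```

Usage would be $client->index(buildDocumentParams('products_idx2', '1', 'apple iphone 6')).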

Search query for appleiphone

{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "title": "appleiphone"
          }
        }
      ]
    }
  }
}
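The three search queries in this answer differ only in the search term, so in the question's PHP array style they can come from one function. buildTitleSearchParams is a hypothetical name, and from/size are carried over from the question's code; the body it produces is the same match query as the JSON above.

```php
<?php
// Hypothetical helper: $client->search() params for a simple match query
// on the "title" field, using the array layout from the question.
function buildTitleSearchParams(string $index, string $term, int $size = 25): array
{
    return [
        'index' => $index,
        'from'  => 0,
        'size'  => $size,
        'body'  => [
            'query' => [
                'bool' => [
                    'should' => [
                        ['match' => ['title' => $term]],
                    ],
                ],
            ],
        ],
    ];
}

// Print the JSON body for each of the three terms discussed in this answer.
foreach (['appleiphone', 'iphone6', 'iphone'] as $term) {
    echo json_encode(buildTitleSearchParams('products_idx2', $term)['body']), "\n";
}
```

Each array would be sent with $client->search(...) exactly as $params is in the question.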

Result

"hits": [
      {
        "_index": "ana",
        "_type": "_doc",
        "_id": "1",
        "_score": 0.3439677,
        "_source": {
          "title": "apple iphone 6",
          "title_normal": "apple iphone 6"
        }
      }
    ]

Search query for iphone6

{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "title": "iphone6"
          }
        }
      ]
    }
  }
}

Result

 "hits": [
      {
        "_index": "ana",
        "_type": "_doc",
        "_id": "1",
        "_score": 0.3439677,
        "_source": {
          "title": "apple iphone 6",
          "title_normal": "apple iphone 6"
        }
      }
    ]

Last but not least, the search query for iphone

{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "title": "iphone"
          }
        }
      ]
    }
  }
}

Result

"hits": [
      {
        "_index": "ana",
        "_type": "_doc",
        "_id": "1",
        "_score": 0.3439677,
        "_source": {
          "title": "apple iphone 6",
          "title_normal": "apple iphone 6"
        }
      }
    ]

Since my answer is already long, I'm adding the analyze API information in another answer, for readability and for folks who are not very familiar with analyzers in Elasticsearch and how they work.

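In the comments on my previous answer, @Niraj mentioned that other documents were working but he had a problem with the iphone6 query. The analyze API is useful for debugging exactly this kind of issue.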

First, check the index-time tokens of the document you expect to match your search query, in this case apple iphone 6:

POST http://{{hostname}}:{{port}}/{{index}}/_analyze

{
  "text" : "apple iphone 6",
  "analyzer" : "text_analyzer"
}
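The same _analyze request can be made from the PHP client through $client->indices()->analyze(). The helpers below are a hypothetical sketch: buildAnalyzeParams builds the request, and tokenStrings pulls just the token strings out of a response array like the token list in this answer, assuming the 7.x-style client from the question, which returns plain arrays.

```php
<?php
// Hypothetical helper: params for $client->indices()->analyze(),
// equivalent to the _analyze request above.
function buildAnalyzeParams(string $index, string $text, string $analyzer = 'text_analyzer'): array
{
    return [
        'index' => $index,
        'body'  => ['text' => $text, 'analyzer' => $analyzer],
    ];
}

// Hypothetical helper: keep only the "token" values from an _analyze response,
// which makes index-time and search-time tokens easy to compare.
function tokenStrings(array $analyzeResponse): array
{
    return array_column($analyzeResponse['tokens'] ?? [], 'token');
}
```

For example, tokenStrings($client->indices()->analyze(buildAnalyzeParams('products_idx2', 'apple iphone 6'))) would list the index-time tokens, and passing 'iphone6' instead would list the search-time tokens.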

which generates these tokens:

{
  "tokens": [
    {
      "token": "apple",
      "start_offset": 0,
      "end_offset": 5,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "appleiphone",
      "start_offset": 0,
      "end_offset": 12,
      "type": "shingle",
      "position": 0,
      "positionLength": 2
    },
    {
      "token": "iphone",
      "start_offset": 6,
      "end_offset": 12,
      "type": "<ALPHANUM>",
      "position": 1
    },
    {
      "token": "iphone6",
      "start_offset": 6,
      "end_offset": 14,
      "type": "shingle",
      "position": 1,
      "positionLength": 2
    },
    {
      "token": "6",
      "start_offset": 13,
      "end_offset": 14,
      "type": "<NUM>",
      "position": 2
    }
  ]
}

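Note carefully that the analyzer also creates iphone6 as a token (a shingle of iphone and 6). Now check the search-time tokens with the same _analyze endpoint: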

{
  "text" : "iphone6",
  "analyzer" : "text_analyzer"
}

and the tokens:

{
    "tokens": [
        {
            "token": "iphone6",
            "start_offset": 0,
            "end_offset": 7,
            "type": "<ALPHANUM>",
            "position": 0
        }
    ]
}

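Now you can see that the search-time analysis also produces iphone6 as a token, and that token is also present among the index-time tokens. That is why the document matches the search query, as shown in the complete example in my first answer.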