Elasticsearch query also matches terms that have dashes in them

I have a query similar to the one below:

{
    "size": 15,
    "from": 0,
    "query": {
        "filtered": {
            "filter": {
                "bool": {
                    "must": [
                        {
                            "match_phrase": {
                                "category": "men_fashion"
                            }
                        },
                        {
                            "match_phrase": {
                                "category": "western_clothing"
                            }
                        },
                        {
                            "match_phrase": {
                                "category": "shirts"
                            }
                        }
                    ]
                }
            }
        }
    }
}

The problem is that it also returns products in the category "t-shirts". How can I restrict it to exact matches only?

Update: this is the mapping I am using

{
    "mappings": {
        "products": {
            "properties": {
                "variations": {
                    "type": "nested"
                }
            }
        }
    }
}

Here is an actual sample product

{
    "title": "100% Cotton Unstitched Suit For Men",
    "slug": "100-cotton-unstitched-suit-for-men",
    "price": 200,
    "sale_price": 0,
    "vendor_id": 32,
    "featured": 0,
    "viewed": 20,
    "stock": 4,
    "sku": "XXX-B",
    "rating": 0,
    "active": 1,
    "vendor_name": "vendor_name",
    "category": [
        "men_fashion",
        "traditional_clothing",
        "unstitched_fabric"
    ],
    "image": "imagename.jpg",
    "variations": [
        {
            "variation_id": "34",
            "stock": 5,
            "price": 200,
            "variation_image": "",
            "sku": "XXX-C",
            "size": "m",
            "color": "red"
        },
        {
            "variation_id": "35",
            "stock": 5,
            "price": 200,
            "variation_image": "",
            "sku": "XXX-D",
            "size": "l",
            "color": "red"
        }
    ]
}

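You have not provided any information about the mapping of category, so I assume the standard analyzer is applied to the category field. Judging by your query (the filtered syntax), I also assume you are using an ES version lower than 5.0.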

With the standard analyzer, indexing a document whose category is t-shirt creates the following terms for the category field:

http://127.0.0.1:9200/_analyze?analyzer=standard&text=t-shirt
{
    "tokens": [
        {
            "token": "t",
            "start_offset": 0,
            "end_offset": 1,
            "type": "<ALPHANUM>",
            "position": 0
        },
        {
            "token": "shirt",
            "start_offset": 2,
            "end_offset": 7,
            "type": "<ALPHANUM>",
            "position": 1
        }
    ]
}

The same happens to t-shirts, which is split into t and shirts. So when you search for shirts, you also get the t-shirts documents.

If the category field does not need to be analyzed in your use case (you do not need full-text search on it), simply mark the category field as not_analyzed.

{
    "mappings": {
        "data": {
            "properties": {
                "category": {
                    "type":     "string",
                    "index":    "not_analyzed"
                }
            }
        }
    }
}
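
With category mapped as not_analyzed, the value is indexed as a single term, so exact matching can be done with term filters, which do not analyze the search value. The following is only a sketch, assuming the same pre-5.0 filtered syntax as in the question and reusing the category values from it:

```json
{
    "size": 15,
    "from": 0,
    "query": {
        "filtered": {
            "filter": {
                "bool": {
                    "must": [
                        { "term": { "category": "men_fashion" } },
                        { "term": { "category": "western_clothing" } },
                        { "term": { "category": "shirts" } }
                    ]
                }
            }
        }
    }
}
```

Here shirts is compared against the whole stored value, so a document whose category is t-shirts should not match.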

If you need to keep the ability to analyze the content of category, you can use the Whitespace analyzer (dashes are not treated as word separators):

{
    "mappings": {
        "data": {
            "properties": {
                "category": {
                    "type": "string",
                    "analyzer": "whitespace"
                }
            }
        }
    }
}

Another solution is to use the Keyword analyzer, but it behaves much like the not_analyzed option.

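Which one to choose depends on your needs, but every solution requires changing the index mapping, and therefore reindexing the existing documents. You can check how each analyzer behaves with: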

http://127.0.0.1:9200/_analyze?analyzer=whitespace&text=t-shirt
http://127.0.0.1:9200/_analyze?analyzer=keyword&text=t-shirt
http://127.0.0.1:9200/_analyze?analyzer=standard&text=t-shirt

Additional information

You are searching on the category field, so the fact that variations is nested does not matter here. A category field of type string can hold an array of values, so that is not a problem either.

With this mapping (note "analyzer": "whitespace"):

PUT http://localhost:9200/test
{
    "mappings": {
        "products": {
            "properties": {
                "variations": {
                    "type": "nested",
                    "properties": {
                        "size":    { "type": "string" },
                        "color":   { "type": "string" },
                        ... // other nested fields
                    }
                },
                "category":    { 
                    "type": "string",
                    "analyzer": "whitespace"
                },
                ... // other fields
            }
        }
    }
}

I indexed two documents

Document 1:

{
    "category": [
        "men_fashion",
        "traditional_clothing",
        "unstitched_fabric",
        "shirts"
    ],
    "image": "imagename.jpg",
    "variations": [
        {
            "variation_id": "34",
            "stock": 5,
            "price": 200,
            "variation_image": "",
            "sku": "XXX-C",
            "size": "m",
            "color": "red"
        }
    ]
}

Document 2:

{
    "category": [
        "men_fashion",
        "traditional_clothing",
        "unstitched_fabric",
        "t-shirts"
    ],
    "image": "imagename.jpg",
    "variations": [
        {
            "variation_id": "34",
            "stock": 5,
            "price": 200,
            "variation_image": "",
            "sku": "XXX-C",
            "size": "m",
            "color": "red"
        },
        {
            "variation_id": "35",
            "stock": 5,
            "price": 200,
            "variation_image": "",
            "sku": "XXX-D",
            "size": "l",
            "color": "red"
        }
    ]
}

Now when I search with:

{
    "size": 15,
    "from": 0,
    "query": {
        "filtered": {
            "filter": {
                "bool": {
                    "must": [
                        {
                            "match_phrase": {
                                "category": "men_fashion"
                            }
                        },
                        {
                            "match_phrase": {
                                "category": "shirts"
                            }
                        }
                    ]
                }
            }
        }
    }
}

I get only Document 1.

If needed, you can add "analyzer": "whitespace" in the same way to nested fields such as variations.color (but the search query must then also be changed to search the nested documents).
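
A minimal sketch of such a nested search, assuming the pre-5.0 query DSL and the variations.color field from the sample product (the value red is only an illustration):

```json
{
    "query": {
        "nested": {
            "path": "variations",
            "query": {
                "match": {
                    "variations.color": "red"
                }
            }
        }
    }
}
```

The path must name the nested object, and fields inside it are referenced by their full path (variations.color).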