Elasticsearch query also matches terms that have dashes in them
I have a query similar to the one below:
{
"size": 15,
"from": 0,
"query": {
"filtered": {
"filter": {
"bool": {
"must": [
{
"match_phrase": {
"category": "men_fashion"
}
},
{
"match_phrase": {
"category": "western_clothing"
}
},
{
"match_phrase": {
"category": "shirts"
}
}
]
}
}
}
}
}
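The problem is that it also fetches products in the category "t-shirts". How can I restrict it to exact matches only?
Update: this is the code I use for the mapping: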
{
"mappings": {
"products": {
"properties": {
"variations": {
"type": "nested"
}
}
}
}
}
This is an actual sample product:
{
"title": "100% Cotton Unstitched Suit For Men",
"slug": "100-cotton-unstitched-suit-for-men",
"price": 200,
"sale_price": 0,
"vendor_id": 32,
"featured": 0,
"viewed": 20,
"stock": 4,
"sku": "XXX-B",
"rating": 0,
"active": 1,
"vendor_name": "vendor_name",
"category": [
"men_fashion",
"traditional_clothing",
"unstitched_fabric"
],
"image": "imagename.jpg",
"variations": [
{
"variation_id": "34",
"stock": 5,
"price": 200,
"variation_image": "",
"sku": "XXX-C",
"size": "m",
"color": "red"
},
{
"variation_id": "35",
"stock": 5,
"price": 200,
"variation_image": "",
"sku": "XXX-D",
"size": "l",
"color": "red"
}
]
}
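You did not provide any information about the mapping of category, so I assume the standard analyzer is applied to that field. Looking at your query (the filtered syntax), I also assume you are using an ES version lower than 5.0.
With the standard analyzer, the following terms are created for the category field when a t-shirt document is indexed: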
http://127.0.0.1:9200/_analyze?analyzer=standard&text=t-shirt
{
"tokens": [
{
"token": "t",
"start_offset": 0,
"end_offset": 1,
"type": "<ALPHANUM>",
"position": 0
},
{
"token": "shirt",
"start_offset": 2,
"end_offset": 7,
"type": "<ALPHANUM>",
"position": 1
}
]
}
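So when you search for shirts, you also get the t-shirts documents, because the dash is treated as a word separator and the part after it is indexed as a term of its own.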
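To make the effect concrete, here is a rough Python sketch of what happens to the category values. It is only an approximation: the real standard analyzer follows Unicode text segmentation rules, and `standard_like` is a hypothetical helper of mine, not an Elasticsearch API.

```python
import re


def standard_like(text):
    """Rough stand-in for the standard analyzer (an assumption, not ES code):
    lowercase, then split on anything that is not a letter, digit or underscore."""
    return re.findall(r"\w+", text.lower())


indexed = standard_like("t-shirts")   # terms stored for the document
searched = standard_like("shirts")    # terms produced from the query text

print(indexed)                                      # ['t', 'shirts']
print(searched)                                     # ['shirts']
print(all(term in indexed for term in searched))    # True
```

The underscore does not split a value in this sketch, which is why values such as men_fashion stay in one piece while t-shirts does not.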
If the category field does not need to be analyzed in your use case (you don't need full-text search on it), simply mark the category field as not_analyzed:
{
"mappings": {
"data": {
"properties": {
"category": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
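With a not_analyzed field you could also switch from match_phrase to term filters, which compare the given value against the indexed terms verbatim. This is my own suggestion rather than something the question requires. A minimal Python sketch that builds such a request body, assuming the pre-5.0 filtered syntax from the question (`category_filter` is a hypothetical helper name):

```python
import json


def category_filter(categories, size=15, offset=0):
    """Build a pre-5.0 'filtered' query body that requires every given
    category as an exact term (assumes category is not_analyzed)."""
    return {
        "size": size,
        "from": offset,
        "query": {
            "filtered": {
                "filter": {
                    "bool": {
                        "must": [{"term": {"category": c}} for c in categories]
                    }
                }
            }
        },
    }


body = category_filter(["men_fashion", "western_clothing", "shirts"])
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the search request body in the same way as the original query.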
If you need to keep the ability to analyze the content of category, you can use the Whitespace analyzer instead (the dash is not treated as a word separator):
{
"mappings": {
"data": {
"properties": {
"category": {
"type": "string",
"analyzer": "whitespace"
}
}
}
}
}
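Another solution is to use the Keyword analyzer, but it behaves much like the not_analyzed option.
It all depends on your needs, but every one of these solutions requires changing the mapping of the index (and reindexing the existing documents). You can check how each analyzer behaves with: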
http://127.0.0.1:9200/_analyze?analyzer=whitespace&text=t-shirt
http://127.0.0.1:9200/_analyze?analyzer=keyword&text=t-shirt
http://127.0.0.1:9200/_analyze?analyzer=standard&text=t-shirt
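If you want to generate these check URLs for other analyzers or sample values, a small Python helper is enough. This is only a sketch: `analyze_url` is a hypothetical name, the default host is copied from the URLs above, and the query-string form of _analyze is assumed to be available because the ES version is assumed to be lower than 5.0.

```python
from urllib.parse import urlencode


def analyze_url(analyzer, text, host="http://127.0.0.1:9200"):
    """Return an _analyze URL for the given analyzer and sample text."""
    query = urlencode({"analyzer": analyzer, "text": text})
    return f"{host}/_analyze?{query}"


for name in ("whitespace", "keyword", "standard"):
    print(analyze_url(name, "t-shirt"))
```

urlencode also takes care of escaping when the sample text contains spaces or other special characters.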
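Additional information
You are searching on the category field, so the fact that variations is nested does not matter here. A field of type string can hold an array of values, so that is not a problem here either.
With this mapping (note "analyzer": "whitespace"):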
PUT http://localhost:9200/test
{
"mappings": {
"products": {
"properties": {
"variations": {
"type": "nested",
"properties": {
"size": { "type": "string" },
"color": { "type": "string" },
... // other nested fields
}
},
"category": {
"type": "string",
"analyzer": "whitespace"
},
... // other fields
}
}
}
}
I indexed two documents.
Document 1:
{
"category": [
"men_fashion",
"traditional_clothing",
"unstitched_fabric",
"shirts"
],
"image": "imagename.jpg",
"variations": [
{
"variation_id": "34",
"stock": 5,
"price": 200,
"variation_image": "",
"sku": "XXX-C",
"size": "m",
"color": "red"
}
]
}
Document 2:
{
"category": [
"men_fashion",
"traditional_clothing",
"unstitched_fabric",
"t-shirts"
],
"image": "imagename.jpg",
"variations": [
{
"variation_id": "34",
"stock": 5,
"price": 200,
"variation_image": "",
"sku": "XXX-C",
"size": "m",
"color": "red"
},
{
"variation_id": "35",
"stock": 5,
"price": 200,
"variation_image": "",
"sku": "XXX-D",
"size": "l",
"color": "red"
}
]
}
Now when I search with:
{
"size": 15,
"from": 0,
"query": {
"filtered": {
"filter": {
"bool": {
"must": [
{
"match_phrase": {
"category": "men_fashion"
}
},
{
"match_phrase": {
"category": "shirts"
}
}
]
}
}
}
}
}
I get only document 1.
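The difference between the two analyzers on these two documents can be imitated in a few lines of Python. This is a simplified model, not how Elasticsearch is implemented: `standard_like`, `whitespace_like` and `matches` are hypothetical helpers, and phrase positions are ignored, which is harmless here because every searched value is a single token.

```python
import re


def standard_like(text):
    """Rough stand-in for the standard analyzer (splits on the dash)."""
    return re.findall(r"\w+", text.lower())


def whitespace_like(text):
    """Rough stand-in for the whitespace analyzer (keeps the dash)."""
    return text.split()


def matches(doc_categories, wanted, analyze):
    """True if every term of every wanted value was indexed for the document."""
    indexed = {term for value in doc_categories for term in analyze(value)}
    return all(term in indexed for value in wanted for term in analyze(value))


docs = {
    "doc1": ["men_fashion", "traditional_clothing", "unstitched_fabric", "shirts"],
    "doc2": ["men_fashion", "traditional_clothing", "unstitched_fabric", "t-shirts"],
}
wanted = ["men_fashion", "shirts"]

for name, analyze in (("standard", standard_like), ("whitespace", whitespace_like)):
    hits = [doc_id for doc_id, cats in docs.items() if matches(cats, wanted, analyze)]
    print(name, hits)
# standard ['doc1', 'doc2']
# whitespace ['doc1']
```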
If needed, you can add "analyzer": "whitespace" to nested fields such as variations.color in the same way (but the search query must then be changed to search the nested documents).