Elasticsearch match query over an array of objects
Suppose I have 3 documents:
doc_1 = {
"citedIn": [
"Bar Councils Act, 1926 - Section 15",
"Contract Act, 1872 - Section 23"
]
}
doc_2 = {
"citedIn":[
"15 C. B 400",
"Contract Act, 1872 - Section 55"
]
}
doc_3 = {
"citedIn":[
"15 C. B 400",
"Contract Act, 1872 - Section 15"
]
}
Here the citedIn field is an array. Now I want to run a standard match query:
{
"query":
{
"match": {"citedIn":{"query": "Contract act 15" , "operator":"and" }}
}
}
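The query above returns all 3 documents, but it is supposed to return only doc_3, because only doc_3 contains Contract, act, and 15 together in a single array element.
How can I achieve this?
Any suggestion or solution would be appreciated.
Update with nested data type:
I tried a nested field.
This is my mapping: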
{
"mappings": {
"properties": {
"citedIn": {
"type": "nested",
"include_in_parent": true,
"properties": {
"someFiled": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
}
}
}
This is my data:
doc_1 = {
"citedIn": [
{"someFiled" : "Bar Councils Act, 1926 - Section 15"},
{"someFiled" : "Contract Act, 1872 - Section 23"}
]
}
doc_2 = {
"citedIn":[
{"someFiled" : "15 C. B 400"},
{"someFiled" : "Contract Act, 1872 - Section 55"}
]
}
doc_3 = {
"citedIn":[
{"someFiled" : "15 C. B 400"},
{"someFiled" : "Contract Act, 1872 - Section 15"}
]
}
This is my query:
{
"query":
{
"match": {"citedIn.someFiled":{"query": "Contract act 15" , "operator":"and" }}
}
}
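But I still get the same result.
You can't achieve this with your current data, because you are indexing an array of strings in the citedIn field, and every Elasticsearch field is multi-valued by default. This comes from Lucene, the search library Elasticsearch is built on.
Please read arrays in Elasticsearch for more information, especially the last important note there.
As that note explains, all the strings in the array belong to the same field, so ES cannot tell whether your search terms occur in the same string of the array. That is why your search returns all the documents.
The exception is to index these strings as part of another field, e.g. a nested field. For that you need to provide a field name: each array element becomes a map whose key is your field name and whose value is the string. Without a field name to query, you won't be able to achieve your use case.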
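To make the flattening behaviour concrete, here is a minimal Python sketch that imitates it outside Elasticsearch. It is only a model: analyze is a rough stand-in for the standard analyzer (lowercased alphanumeric tokens), and match_flattened and match_per_element are hypothetical helper names, not Elasticsearch APIs.

```python
import re

# The three documents from the question, reduced to their citedIn arrays.
DOCS = {
    "doc_1": ["Bar Councils Act, 1926 - Section 15", "Contract Act, 1872 - Section 23"],
    "doc_2": ["15 C. B 400", "Contract Act, 1872 - Section 55"],
    "doc_3": ["15 C. B 400", "Contract Act, 1872 - Section 15"],
}


def analyze(text):
    """Rough approximation of the standard analyzer (assumption): lowercase tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def match_flattened(values, query):
    """AND match on a multi-valued field: tokens of all array elements are pooled."""
    pooled = {token for value in values for token in analyze(value)}
    return all(term in pooled for term in analyze(query))


def match_per_element(values, query):
    """AND match where all terms must occur inside one single array element."""
    terms = analyze(query)
    return any(all(term in set(analyze(value)) for term in terms) for value in values)


query = "Contract act 15"
flattened_hits = [name for name, values in DOCS.items() if match_flattened(values, query)]
per_element_hits = [name for name, values in DOCS.items() if match_per_element(values, query)]

print(flattened_hits)    # every document matches once the elements are pooled
print(per_element_hits)  # only the document with all terms in one element
```

The pooled version mirrors what happens with a plain array of strings, while the per-element version mirrors what a nested field makes possible.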
Adding a working example with index data, mapping, search query, and search result.
You need to use a nested query to search nested fields.
Index mapping:
{
"mappings": {
"properties": {
"citedIn": {
"type": "nested"
}
}
}
}
Index data:
{
"citedIn": [
{
"someFiled": "Bar Councils Act, 1926 - Section 15"
},
{
"someFiled": "Contract Act, 1872 - Section 23"
}
]
}
{
"citedIn": [
{
"someFiled": "15 C. B 400"
},
{
"someFiled": "Contract Act, 1872 - Section 55"
}
]
}
{
"citedIn": [
{
"someFiled": "15 C. B 400"
},
{
"someFiled": "Contract Act, 1872 - Section 15"
}
]
}
Search query:
{
"query": {
"nested": {
"path": "citedIn",
"query": {
"bool": {
"must": [
{
"match": {
"citedIn.someFiled": "contract"
}
},
{
"match": {
"citedIn.someFiled": "act"
}
},
{
"match": {
"citedIn.someFiled": 15
}
}
]
}
},
"inner_hits": {}
}
}
}
Search result:
"inner_hits": {
"citedIn": {
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 1.620718,
"hits": [
{
"_index": "stof_64170705",
"_type": "_doc",
"_id": "3",
"_nested": {
"field": "citedIn",
"offset": 1
},
"_score": 1.620718,
"_source": {
"someFiled": "Contract Act, 1872 - Section 15"
}
}
]
}
}
}
}
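As a possible simplification, the three must clauses can likely be folded into a single match with "operator": "and" inside the nested query, because each nested object is matched on its own. Note also that the plain match query from the question does not go through the nested path. With include_in_parent set to true it runs against the flattened copy on the parent document, which would explain why it still matched everything. The sketch below only builds the request body as a Python dict; build_nested_and_query is a hypothetical helper name, not part of any Elasticsearch client.

```python
import json


def build_nested_and_query(path, field, text):
    """Build a nested query body (hypothetical helper, for illustration only)."""
    return {
        "query": {
            "nested": {
                "path": path,
                "query": {
                    "match": {
                        # All terms must match within the same nested object.
                        f"{path}.{field}": {"query": text, "operator": "and"}
                    }
                },
                # Ask Elasticsearch to report which nested object matched.
                "inner_hits": {},
            }
        }
    }


body = build_nested_and_query("citedIn", "someFiled", "Contract act 15")
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of a search request against the index that uses the nested mapping above.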