Empty inner_hits in compound Elasticsearch filter

I'm seeing unexpected behavior in the inner_hits results of a nested bool query.

Test data (abbreviated for brevity):

# MAPPING
PUT unit_testing
{
    "mappings": {
        "document": {
            "properties": {
                "display_name": {"type": "text"},
                "metadata": {
                    "properties": {
                        "NAME": {"type": "text"}
                    }
                }
            }
        },
        "paragraph": {
            "_parent": {"type": "document"},
            "_routing": {"required": true},
            "properties": {
                "checksum": {"type": "text"},
                "sentences": {
                    "type": "nested",
                    "properties": {
                        "text": {"type": "text"}
                    }
                }
            }
        }
    }
}

# DOCUMENT X 2 (d0, d1)
PUT unit_testing/document/doc_id_d0
{
    "display_name": "Test Document d0",
    "paragraphs": [
        "para_id_d0p0",
        "para_id_d0p1"
    ],
    "metadata": {"NAME": "Test Document d0 Metadata"}
}

# PARAGRAPH X 2 (d0p0, d1p0)
PUT unit_testing/paragraph/para_id_d0p0?parent=doc_id_d0
{
    "checksum": "para_checksum_d0p0",
    "sentences": [
        {"text": "Test sentence d0p0s0"},
        {"text": "Test sentence d0p0s1 ODD"},
        {"text": "Test sentence d0p0s2 EVEN"},
        {"text": "Test sentence d0p0s3 ODD"},
        {"text": "Test sentence d0p0s4 EVEN"}
    ]
}

This initial query behaves as I expect (I know the metadata filter isn't actually needed in this example):

GET unit_testing/paragraph/_search
{
    "_source": "false", 
    "query": {
        "bool": {
            "must": [
                {
                    "has_parent": {
                        "query": {
                            "match_phrase": {
                                "metadata.NAME": "Test Document d0 Metadata"
                            }
                        }, 
                        "type": "document"
                    }
                }, 
                {
                    "nested": {
                        "inner_hits": {}, 
                        "path": "sentences", 
                        "query": {
                            "match": {
                                "sentences.text": "d0p0s0"
                            }
                        }
                    }
                }
            ]
        }
    }
}

It produces an inner_hits object containing the one sentence that matches the predicate (some fields removed for clarity):

{
  "hits": {
    "hits": [
      {
        "_source": {},
        "inner_hits": {
          "sentences": {
            "hits": {
              "hits": [
                {
                  "_source": {
                    "text": "Test sentence d0p0s0"
                  }
                }
              ]
            }
          }
        }
      }
    ]
  }
}

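The following query attempts to embed the query above in a parent "should" clause, creating a logical OR between the initial query and an additional query that matches a single sentence: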

GET unit_testing/paragraph/_search
{
    "_source": "false", 
    "query": {
        "bool": {
            "should": [
                {
                    "bool": {
                        "must": [
                            {
                                "has_parent": {
                                    "query": {
                                        "match_phrase": {
                                            "metadata.NAME": "Test Document d0 Metadata"
                                        }
                                    }, 
                                    "type": "document"
                                }
                            }, 
                            {
                                "nested": {
                                    "inner_hits": {}, 
                                    "path": "sentences", 
                                    "query": {
                                        "match": {
                                            "sentences.text": "d0p0s0"
                                        }
                                    }
                                }
                            }
                        ]
                    }
                }, 
                {
                    "nested": {
                        "inner_hits": {}, 
                        "path": "sentences", 
                        "query": {
                            "match": {
                                "sentences.text": "d1p0s0"
                            }
                        }
                    }
                }
            ]
        }
    }
}

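While the "d1" query outputs the result one would expect, an inner_hits object containing the matching sentence, the original "d0" query now produces an empty inner_hits object: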

{
  "hits": {
    "hits": [
      {
        "_source": {},
        "inner_hits": {
          "sentences": {
            "hits": {
              "total": 0,
              "hits": []
            }
          }
        }
      },
      {
        "_source": {},
        "inner_hits": {
          "sentences": {
            "hits": {
              "hits": [
                {
                  "_source": {
                    "text": "Test sentence d1p0s0"
                  }
                }
              ]
            }
          }
        }
      }
    ]
  }
}

I'm using the elasticsearch_dsl Python library to build and combine these queries, and I'm new to the query DSL, but the query format looks sound to me.

What am I missing?

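I think what's missing is the inner_hits name parameter. You have two inner_hits clauses in two different nested queries, and since neither sets a name, both default to the nested path ("sentences") and end up with the same name. Try giving each inner_hits its own name parameter (0).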

0 - https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-inner-hits.html#_options
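
To make the suggestion concrete, here is a minimal sketch of the second query with a distinct name on each inner_hits clause. It builds the request body as plain Python dicts rather than through elasticsearch_dsl, and I haven't run it against a live cluster. The helper nested_sentence_match and the names d0_sentences and d1_sentences are made up for illustration; any two different names should do.

```python
import json


def nested_sentence_match(term, inner_hits_name):
    """Build a nested query on `sentences` with an explicitly named inner_hits.

    Hypothetical helper for illustration; not part of any library.
    """
    return {
        "nested": {
            "path": "sentences",
            "query": {"match": {"sentences.text": term}},
            # Without "name", inner_hits is keyed by the nested path
            # ("sentences"), so two clauses on the same path share one key.
            "inner_hits": {"name": inner_hits_name},
        }
    }


# First "should" branch: parent filter AND sentence match for d0.
d0_branch = {
    "bool": {
        "must": [
            {
                "has_parent": {
                    "type": "document",
                    "query": {
                        "match_phrase": {
                            "metadata.NAME": "Test Document d0 Metadata"
                        }
                    },
                }
            },
            nested_sentence_match("d0p0s0", "d0_sentences"),  # assumed name
        ]
    }
}

# Second "should" branch: sentence match for d1 only.
d1_branch = nested_sentence_match("d1p0s0", "d1_sentences")  # assumed name

body = {
    "_source": "false",  # kept exactly as in the question
    "query": {"bool": {"should": [d0_branch, d1_branch]}},
}

# Print the JSON body to send to unit_testing/paragraph/_search.
print(json.dumps(body, indent=4))
```

With names set this way, each hit's inner_hits should be reported under d0_sentences and d1_sentences instead of a single shared "sentences" key, so it is also easier to tell which clause produced which sentences.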