Elasticsearch query scores all documents 1.0. Why?

I'm using Elasticsearch 2.4.1. When I execute the following query, every document gets a score of 1.0. Why?

If I remove the "bool" and match on just one field, I get the same behavior.

Query:

{
    "query" : {
        "bool": {
            "must" : [
                {"match" : {"last" : {"query" : "SMITH", "fuzziness": 2.0}}}
            ],
            "should" : [
                {"match" : {"first" : {"query" : "JOE", "fuzziness": 1.0, "boost": 99.0}}}
            ]
        }
    }
}

The explain output for one match gives me:

1.0 = sum of:
  1.0 = ConstantScore(+(last:1mith^0.8 last:1smith^0.8 last:4mith^0.8 last:amith^0.8 last:asmith^0.8 last:bsmith^0.8 last:csmith^0.8 last:dsmith^0.8 last:emith^0.8 last:esmith^0.8 last:fsmith^0.8 last:hmith^0.8 last:hsmith^0.8 last:imith^0.8 last:ismith^0.8 last:jmith^0.8 last:jsmith^0.8 last:ksmith^0.8 last:lsmith^0.8 last:msith^0.8 last:msmith^0.8 last:nsmith^0.8 last:omith^0.8 last:osmith^0.8 last:psmith^0.8 last:qsmith^0.8 last:rsmith^0.8 last:saith^0.8 last:samith^0.8 last:scmith^0.8 last:seith^0.8 last:shith^0.8 last:simith^0.8 last:simth^0.8 last:skith^0.8 last:slith^0.8 last:smaith^0.8 last:smath^0.8 last:smdith^0.8 last:smeth^0.8 last:smfith^0.8 last:smich^0.8 last:smidh^0.8 last:smidth^0.8 last:smieth^0.8 last:smigh^0.8 last:smiht^0.8 last:smiih^0.8 last:smiith^0.8 last:smith) (first:aoe^0.6666666 first:bjoe^0.6666666 first:boe^0.6666666 first:coe^0.6666666 first:djoe^0.6666666 first:doe^0.6666666 first:eoe^0.6666666 first:foe^0.6666666 first:goe^0.6666666 first:hoe^0.6666666 first:ioe^0.6666666 first:j0e^0.6666666 first:jae^0.6666666 first:jbe^0.6666666 first:jce^0.6666666 first:jee^0.6666666 first:jeo^0.6666666 first:jge^0.6666666 first:jhe^0.6666666 first:jhoe^0.6666666 first:jie^0.6666666 first:jioe^0.6666666 first:jke^0.6666666 first:jle^0.6666666 first:jme^0.6666666 first:jne^0.6666666 first:jnoe^0.6666666 first:joa^0.6666666 first:joae^0.6666666 first:job^0.6666666 first:jobe^0.6666666 first:joc^0.6666666 first:joce^0.6666666 first:jod^0.6666666 first:jode^0.6666666 first:joe first:joea^0.6666666 first:joeb^0.6666666 first:joec^0.6666666 first:joed^0.6666666 first:joee^0.6666666 first:joef^0.6666666 first:joeg^0.6666666 first:joeh^0.6666666 first:joei^0.6666666 first:joej^0.6666666 first:joek^0.6666666 first:joel^0.6666666 first:joem^0.6666666 first:joen^0.6666666)^99.0), product of:
    1.0 = boost
    1.0 = queryNorm
  0.0 = match on required clause, product of:
0.0 = # clause
0.0 = weight(_type:mytype in 327) [], result of:
  0.0 = score(doc=327,freq=1.0), with freq of:
    1.0 = termFreq=1.0

Type mapping:

{
  "ourindex1": {
    "mappings": {
      "people": {
        "properties": {        
          "city": {
            "type": "string"
          },
          "first": {
            "type": "string"
          },
          "last": {
            "type": "string"
          },
          "middle": {
            "type": "string"
          },
          "state": {
            "type": "string"
          },
          "street": {
            "type": "string"
          },
          "suffix": {
            "type": "string"
          },
          "suite": {
            "type": "string"
          },
          "territory": {
            "type": "string"
          },
          "zip5": {
            "type": "string"
          }
        }
      }
    }
  }
}

Edit: simplified repro:

  1. Download a clean copy of Elasticsearch 2.4.1 and start it
  2. Create a new index:

    POST /newindex/people

    {"first":"JOE","last":"SMITH","street":"1 FIRST STREET","city":"LOS ANGELES", "state" : "CA", "middle" : ""}

  3. Issue the following query:

    { "query" : {"match" : { "last" : { "query" : "SMITHX", "fuzziness": 1.0} } }}

When I do this, the returned document has a score of 1.0, and the explain output mentions ConstantScore.
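To rule out the client library, it can help to send the query body over plain HTTP. Here is a minimal Python sketch of that, assuming a node at http://localhost:9200 (the URL and both helper names are assumptions for illustration, not part of the original setup). It builds the request with "explain" turned on and only sends it if you call run_search:

```python
import json
import urllib.request

# Assumed local node address; adjust for your environment.
ES_URL = "http://localhost:9200"

def build_search_request(index, query, explain=True):
    """Build (but do not send) a POST request for the _search endpoint."""
    body = {"query": query, "explain": explain}
    return urllib.request.Request(
        url=f"{ES_URL}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

def run_search(request):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)

# Same fuzzy match as in step 3 of the repro.
fuzzy_query = {"match": {"last": {"query": "SMITHX", "fuzziness": 1}}}
request = build_search_request("newindex", fuzzy_query)

# Inspect exactly what would go over the wire.
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
# Call run_search(request) against a running node to see the real scores.
```

Printing the body before sending makes it easy to compare against whatever the client library actually emits.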

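Edit 2: It turns out my repro steps contained an unintentional lie

The library my application uses to talk to Elasticsearch (elastic4s) appears to mangle the query, turning it into:

{"query" : { "query" : {"match" : { "last" : { "query" : "SMITHX", "fuzziness": 1.0} } }}}

(Note the extra "query". This broken query returns the results I expect, but with score = 1.0.) I thought I had tried executing the query directly with curl, but evidently I hadn't.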

This is because of the doubled query keyword. Basically, it works like this: the inner query selects the hits and produces something like this:

{
    "took": 7,
    "timed_out": false,
    "_shards": {
        "total": 5,
        "successful": 5,
        "failed": 0
    },
    "hits": {
        "total": 2,
        "max_score": 0.30685285,
        "hits": [
            {
                "_index": "my_index",
                "_type": "people",
                "_id": "2",
                "_score": 0.30685285,
                "_source": {
                    "first": "JOHN",
                    "last": "SMITHS",
                    "street": "2 SECOND STREET",
                    "city": "LA",
                    "state": "CA",
                    "middle": ""
                }
            },
            {
                "_index": "my_index",
                "_type": "people",
                "_id": "1",
                "_score": 0.30685282,
                "_source": {
                    "first": "JOE",
                    "last": "SMITH",
                    "street": "1 FIRST STREET",
                    "city": "LOS ANGELES",
                    "state": "CA",
                    "middle": ""
                }
            }
        ]
    }
}

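This is a perfectly correct response with proper scores, but then the second (outer) query is applied. It does not change the result set; it just "eats" the scores and replaces them with 1.0. So you need to fix your usage of elastic4s.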
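If you can log or intercept the request body before it is sent, a small guard can catch this kind of wrapping early. This is only a sketch in Python: unwrap_double_query is a hypothetical helper, not part of elastic4s or Elasticsearch, and it assumes that no legitimate query in your application is itself a single "query" key.

```python
def unwrap_double_query(body):
    """Return a copy of body with redundant nested "query" wrappers removed."""
    query = body.get("query")
    # Peel off layers that consist of nothing but another "query" key.
    while isinstance(query, dict) and set(query) == {"query"}:
        query = query["query"]
    fixed = dict(body)
    if query is not None:
        fixed["query"] = query
    return fixed

mangled = {"query": {"query": {"match": {"last": {"query": "SMITHX", "fuzziness": 1}}}}}
clean = unwrap_double_query(mangled)
print(clean)
```

A body that is already well-formed passes through unchanged, so the check is safe to run on every request while you track down where the extra wrapper comes from.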