Is there a character limit on an individual word within a match_phrase query in Elasticsearch?

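I'm pretty new to Elasticsearch, so bear with me. I've run into a problem: if I search for a document using a word of 20 characters or fewer, the document shows up, but as soon as the same word in the query has more characters than that, I get no results.

This is the query I'm trying to use: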

{
    "match_phrase": {
        "genericNames.name": {
            "query": "phenoxymethylpenicillin",
            "slop": 15,
            "zero_terms_query": "NONE",
            "boost": 1.0
        }
    }
}

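Here is the full query: https://pastebin.com/DEJvP2uS

As I said, I'm pretty new to this and may be looking in the wrong place.

So my question is: which areas could be causing this, and why?

Thanks!

Edit: Below is an excerpt from one of the documents in the sample data. I can't show much because a lot of it is sensitive, but fortunately I can share the names from the sample data. This is the data I'm trying to search: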

"genericNames":[
{
    "nameType":1,
    "name":"Phenoxymethylpenicillin 250mg tablets",
    "nameChangeCode":"0000",
    "nameBasisCode":"0001",
    "nameTypeDescription":"Name",
    "startDate":"1948-01-01T00:00:00.000000+0000",
    "endDate":"3456-02-01T00:00:00.000000+0000"
},
{
    "nameType":5,
    "name":"Penicillin V 250mg tablets",
    "nameTypeDescription":"Alternative Name 3",
    "startDate":"1948-01-01T00:00:00.000000+0000",
    "endDate":"3456-02-01T00:00:00.000000+0000"
}
],

I've also included the index mapping, since it may provide additional information:

{
    "amp": {
        "mappings": {
            "properties": {
                "_class": {
                    "type": "text",
                    "fields": {
                        "keyword": {
                            "type": "keyword",
                            "ignore_above": 256
                        }
                    }
                },
                "ampId": {
                    "type": "long"
                },
                "amppId": {
                    "type": "long"
                },
                "attributes": {
                    "type": "nested",
                    "properties": {
                        "attributeQualifier": {
                            "type": "keyword"
                        },
                        "attributeType": {
                            "type": "integer"
                        },
                        "attributeTypeDescription": {
                            "type": "keyword"
                        },
                        "attributeValue": {
                            "type": "text",
                            "fields": {
                                "raw": {
                                    "type": "keyword"
                                }
                            }
                        },
                        "countryId": {
                            "type": "long"
                        },
                        "decodedValue": {
                            "type": "text",
                            "fields": {
                                "raw": {
                                    "type": "keyword"
                                }
                            }
                        },
                        "endDate": {
                            "type": "date",
                            "format": "uuuu-MM-dd'T'HH:mm:ss.SSSSSSZ"
                        },
                        "startDate": {
                            "type": "date",
                            "format": "uuuu-MM-dd'T'HH:mm:ss.SSSSSSZ"
                        }
                    }
                },
                "dictionaries": {
                    "type": "nested",
                    "properties": {
                        "abbreviation": {
                            "type": "text",
                            "fields": {
                                "raw": {
                                    "type": "keyword"
                                }
                            }
                        },
                        "description": {
                            "type": "text",
                            "fields": {
                                "raw": {
                                    "type": "keyword"
                                }
                            }
                        },
                        "dictId": {
                            "type": "integer"
                        },
                        "endDate": {
                            "type": "date",
                            "format": "uuuu-MM-dd'T'HH:mm:ss.SSSSSSZ"
                        },
                        "startDate": {
                            "type": "date",
                            "format": "uuuu-MM-dd'T'HH:mm:ss.SSSSSSZ"
                        }
                    }
                },
                "endDate": {
                    "type": "date",
                    "format": "uuuu-MM-dd'T'HH:mm:ss.SSSSSSZ"
                },
                "excipients": {
                    "type": "nested",
                    "properties": {
                        "basisOfStrengthCode": {
                            "type": "keyword"
                        },
                        "bossId": {
                            "type": "long"
                        },
                        "endDate": {
                            "type": "date",
                            "format": "uuuu-MM-dd'T'HH:mm:ss.SSSSSSZ"
                        },
                        "id": {
                            "type": "long"
                        },
                        "ingredientNames": {
                            "properties": {
                                "endDate": {
                                    "type": "date"
                                },
                                "name": {
                                    "type": "text",
                                    "fields": {
                                        "keyword": {
                                            "type": "keyword",
                                            "ignore_above": 256
                                        }
                                    }
                                },
                                "startDate": {
                                    "type": "date"
                                }
                            }
                        },
                        "startDate": {
                            "type": "date",
                            "format": "uuuu-MM-dd'T'HH:mm:ss.SSSSSSZ"
                        },
                        "strengthDenominatorUnitOfMeasureCode": {
                            "type": "keyword"
                        },
                        "strengthDenominatorValue": {
                            "type": "keyword"
                        },
                        "strengthNumeratorUnitOfMeasureCode": {
                            "type": "keyword"
                        },
                        "strengthNumeratorValue": {
                            "type": "keyword"
                        },
                        "strengthVal": {
                            "type": "keyword"
                        },
                        "unitOfMeasure": {
                            "type": "keyword"
                        }
                    }
                },
                "extractableEntry": {
                    "type": "boolean"
                },
                "genericNames": {
                    "type": "nested",
                    "properties": {
                        "endDate": {
                            "type": "date",
                            "format": "uuuu-MM-dd'T'HH:mm:ss.SSSSSSZ"
                        },
                        "name": {
                            "type": "text",
                            "ignore_above": 256,
                            "fields": {
                                "raw": {
                                    "type": "keyword"
                                }
                            },
                            "analyzer": "autocomplete_index",
                            "search_analyzer": "autocomplete_search"
                        },
                        "nameBasisCode": {
                            "type": "keyword"
                        },
                        "nameChangeCode": {
                            "type": "keyword"
                        },
                        "nameType": {
                            "type": "integer"
                        },
                        "nameTypeDescription": {
                            "type": "text",
                            "fields": {
                                "raw": {
                                    "type": "keyword"
                                }
                            }
                        },
                        "startDate": {
                            "type": "date",
                            "format": "uuuu-MM-dd'T'HH:mm:ss.SSSSSSZ"
                        }
                    }
                },
                "id": {
                    "type": "keyword"
                },
                "ingredients": {
                    "type": "nested",
                    "properties": {
                        "basisOfStrengthCode": {
                            "type": "keyword"
                        },
                        "bossId": {
                            "type": "long"
                        },
                        "endDate": {
                            "type": "date",
                            "format": "uuuu-MM-dd'T'HH:mm:ss.SSSSSSZ"
                        },
                        "id": {
                            "type": "long"
                        },
                        "ingredientNames": {
                            "properties": {
                                "endDate": {
                                    "type": "date"
                                },
                                "name": {
                                    "type": "text",
                                    "fields": {
                                        "keyword": {
                                            "type": "keyword",
                                            "ignore_above": 256
                                        }
                                    }
                                },
                                "startDate": {
                                    "type": "date"
                                }
                            }
                        },
                        "startDate": {
                            "type": "date",
                            "format": "uuuu-MM-dd'T'HH:mm:ss.SSSSSSZ"
                        },
                        "strengthDenominatorUnitOfMeasureCode": {
                            "type": "keyword"
                        },
                        "strengthDenominatorValue": {
                            "type": "keyword"
                        },
                        "strengthNumeratorUnitOfMeasureCode": {
                            "type": "keyword"
                        },
                        "strengthNumeratorValue": {
                            "type": "keyword"
                        },
                        "strengthVal": {
                            "type": "keyword"
                        },
                        "unitOfMeasure": {
                            "type": "keyword"
                        }
                    }
                },
                "invalidEntry": {
                    "type": "boolean"
                },
                "pitId": {
                    "type": "integer"
                },
                "ppaCodes": {
                    "type": "nested",
                    "properties": {
                        "code": {
                            "type": "text",
                            "fields": {
                                "raw": {
                                    "type": "keyword"
                                }
                            }
                        },
                        "endDate": {
                            "type": "date",
                            "format": "uuuu-MM-dd'T'HH:mm:ss.SSSSSSZ"
                        },
                        "startDate": {
                            "type": "date",
                            "format": "uuuu-MM-dd'T'HH:mm:ss.SSSSSSZ"
                        }
                    }
                },
                "proprietaryNames": {
                    "type": "nested",
                    "properties": {
                        "endDate": {
                            "type": "date",
                            "format": "uuuu-MM-dd'T'HH:mm:ss.SSSSSSZ"
                        },
                        "name": {
                            "type": "text",
                            "ignore_above": 256,
                            "fields": {
                                "raw": {
                                    "type": "keyword"
                                }
                            },
                            "analyzer": "autocomplete_index",
                            "search_analyzer": "autocomplete_search"
                        },
                        "nameBasisCode": {
                            "type": "keyword"
                        },
                        "nameChangeCode": {
                            "type": "keyword"
                        },
                        "nameType": {
                            "type": "integer"
                        },
                        "nameTypeDescription": {
                            "type": "text",
                            "fields": {
                                "raw": {
                                    "type": "keyword"
                                }
                            }
                        },
                        "startDate": {
                            "type": "date",
                            "format": "uuuu-MM-dd'T'HH:mm:ss.SSSSSSZ"
                        }
                    }
                },
                "qpuUomCde": {
                    "type": "keyword"
                },
                "qpuVal": {
                    "type": "keyword"
                },
                "qtyUomCde": {
                    "type": "keyword"
                },
                "qtyVal": {
                    "type": "keyword"
                },
                "snomedCodes": {
                    "type": "nested",
                    "properties": {
                        "endDate": {
                            "type": "date",
                            "format": "uuuu-MM-dd'T'HH:mm:ss.SSSSSSZ"
                        },
                        "ppaNextNo": {
                            "type": "text",
                            "fields": {
                                "raw": {
                                    "type": "keyword"
                                }
                            }
                        },
                        "snomed": {
                            "type": "text",
                            "fields": {
                                "raw": {
                                    "type": "keyword"
                                }
                            }
                        },
                        "startDate": {
                            "type": "date",
                            "format": "uuuu-MM-dd'T'HH:mm:ss.SSSSSSZ"
                        }
                    }
                },
                "snomedDescriptions": {
                    "type": "nested",
                    "properties": {
                        "endDate": {
                            "type": "date",
                            "format": "uuuu-MM-dd'T'HH:mm:ss.SSSSSSZ"
                        },
                        "ppaNextNo": {
                            "type": "text",
                            "fields": {
                                "raw": {
                                    "type": "keyword"
                                }
                            }
                        },
                        "snomed": {
                            "type": "text",
                            "fields": {
                                "raw": {
                                    "type": "keyword"
                                }
                            }
                        },
                        "startDate": {
                            "type": "date",
                            "format": "uuuu-MM-dd'T'HH:mm:ss.SSSSSSZ"
                        }
                    }
                },
                "startDate": {
                    "type": "date",
                    "format": "uuuu-MM-dd'T'HH:mm:ss.SSSSSSZ"
                },
                "suppliers": {
                    "type": "nested",
                    "properties": {
                        "endDate": {
                            "type": "date",
                            "format": "uuuu-MM-dd'T'HH:mm:ss.SSSSSSZ"
                        },
                        "id": {
                            "type": "long"
                        },
                        "names": {
                            "type": "nested",
                            "properties": {
                                "endDate": {
                                    "type": "date",
                                    "format": "uuuu-MM-dd'T'HH:mm:ss.SSSSSSZ"
                                },
                                "name": {
                                    "type": "text",
                                    "fields": {
                                        "raw": {
                                            "type": "keyword"
                                        }
                                    },
                                    "analyzer": "autocomplete_index",
                                    "search_analyzer": "autocomplete_search"
                                },
                                "nameBasisCode": {
                                    "type": "keyword"
                                },
                                "nameChangeCode": {
                                    "type": "keyword"
                                },
                                "nameType": {
                                    "type": "integer"
                                },
                                "nameTypeDescription": {
                                    "type": "text",
                                    "fields": {
                                        "raw": {
                                            "type": "keyword"
                                        }
                                    }
                                },
                                "startDate": {
                                    "type": "date",
                                    "format": "uuuu-MM-dd'T'HH:mm:ss.SSSSSSZ"
                                }
                            }
                        },
                        "startDate": {
                            "type": "date",
                            "format": "uuuu-MM-dd'T'HH:mm:ss.SSSSSSZ"
                        }
                    }
                },
                "udfs": {
                    "type": "nested",
                    "properties": {
                        "ddIndicator": {
                            "type": "integer"
                        },
                        "endDate": {
                            "type": "date",
                            "format": "uuuu-MM-dd'T'HH:mm:ss.SSSSSSZ"
                        },
                        "startDate": {
                            "type": "date",
                            "format": "uuuu-MM-dd'T'HH:mm:ss.SSSSSSZ"
                        },
                        "udfsUomCode": {
                            "type": "keyword"
                        },
                        "udfsValue": {
                            "type": "keyword"
                        },
                        "vmpUomCode": {
                            "type": "keyword"
                        }
                    }
                },
                "vmpId": {
                    "type": "long"
                },
                "vmppId": {
                    "type": "long"
                },
                "vtms": {
                    "type": "nested",
                    "properties": {
                        "endDate": {
                            "type": "date",
                            "format": "uuuu-MM-dd'T'HH:mm:ss.SSSSSSZ"
                        },
                        "id": {
                            "type": "long"
                        },
                        "startDate": {
                            "type": "date",
                            "format": "uuuu-MM-dd'T'HH:mm:ss.SSSSSSZ"
                        }
                    }
                }
            }
        }
    }
}

Edit: Added a link to the full query - https://pastebin.com/DEJvP2uS

Edit: Index settings:

{
    "index": {
        "max_ngram_diff": "20",
        "analysis": {
            "filter": {
                "autocomplete_suffix_filter": {
                    "type": "ngram",
                    "min_gram": "1",
                    "max_gram": "20"
                },
                "autocomplete_filter": {
                    "type": "edge_ngram",
                    "min_gram": "1",
                    "max_gram": "20"
                }
            },
            "analyzer": {
                "autocomplete_index": {
                    "filter": [
                        "lowercase",
                        "autocomplete_filter",
                        "autocomplete_suffix_filter"
                    ],
                    "type": "custom",
                    "tokenizer": "standard"
                },
                "autocomplete_search": {
                    "filter": [
                        "lowercase"
                    ],
                    "type": "custom",
                    "tokenizer": "standard"
                }
            }
        },
        "number_of_replicas": "1"
    }
}

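In the index mapping provided above, genericNames is of type nested, so you need to use a nested query.

Here is a working example using the same index data provided above, along with the search query and the search result.

Search query: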

{
  "query": {
    "nested": {
      "path": "genericNames",
      "query": {
        "bool": {
          "must": [
            {
              "match": {
                "genericNames.name": "phenoxymethylpenicillin"
              }
            }
          ]
        }
      },
      "inner_hits":{}
    }
  }
}

Search result:

"hits": [
    {
        "_index": "64817981",
        "_type": "_doc",
        "_id": "1",
        "_nested": {
            "field": "genericNames",
            "offset": 0
        },
        "_score": 0.7361701,
        "_source": {
            "nameType": 1,
            "name": "Phenoxymethylpenicillin 250mg tablets",
            "nameChangeCode": "0000",
            "nameBasisCode": "0001",
            "nameTypeDescription": "Name",
            "startDate": "1948-01-01T00:00:00.000000+0000",
            "endDate": "3456-02-01T00:00:00.000000+0000"
        }
    }
]

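This must be caused by the custom analyzers on your genericNames.name field. You use different analyzers at index time (autocomplete_index) and at search time (autocomplete_search), but the question does not include their definitions, only the mapping.

Please provide the output of the _settings API for your index; see https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-get-settings.html for more information.

You need to use the analyze API with both the autocomplete_index and autocomplete_search analyzers to check the tokens generated for phenoxymethylpenicillin. You will notice the difference.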
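
The difference can also be approximated offline. The following is a rough Python sketch, not Elasticsearch itself: it assumes the analyzer definitions from the index settings added to the question (lowercase, then edge_ngram, then ngram, each with min_gram 1 and max_gram 20, and no preserve_original), and it stands in for the standard tokenizer with a simple regular expression. The helper names are made up for illustration.

```python
import re

# Values taken from the index settings in the question.
MIN_GRAM, MAX_GRAM = 1, 20


def standard_tokenize(text):
    # Rough stand-in for the standard tokenizer (assumption): runs of word characters.
    return re.findall(r"\w+", text)


def edge_ngrams(token, min_gram=MIN_GRAM, max_gram=MAX_GRAM):
    # Prefixes of the token, from min_gram up to max_gram characters long.
    return [token[:n] for n in range(min_gram, min(max_gram, len(token)) + 1)]


def ngrams(token, min_gram=MIN_GRAM, max_gram=MAX_GRAM):
    # Every substring of the token whose length is between min_gram and max_gram.
    grams = []
    for start in range(len(token)):
        for n in range(min_gram, max_gram + 1):
            if start + n <= len(token):
                grams.append(token[start:start + n])
    return grams


def autocomplete_index(text):
    # Index-time chain: lowercase -> edge_ngram -> ngram.
    terms = set()
    for token in standard_tokenize(text):
        for prefix in edge_ngrams(token.lower()):
            terms.update(ngrams(prefix))
    return terms


def autocomplete_search(text):
    # Search-time chain: lowercase only, so each word stays whole.
    return [token.lower() for token in standard_tokenize(text)]


indexed = autocomplete_index("Phenoxymethylpenicillin 250mg tablets")
for query in ("phenoxymethylpenicil", "phenoxymethylpenicillin"):
    terms = autocomplete_search(query)
    print(query, len(query), all(term in indexed for term in terms))
# prints:
# phenoxymethylpenicil 20 True
# phenoxymethylpenicillin 23 False
```

Under these assumptions no indexed term is longer than 20 characters, while the search analyzer emits the full 23-character word as a single term, which is consistent with the behaviour described in the question. The analyze API on the real index remains the authoritative check.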