模糊查询不适用于文本类型,但适用于关键字类型
Fuzzy query doesn't work on text type, but works on keyword type
我有一个仅适用于关键字类型的查询,但我不明白为什么。
但是,如果我使用匹配查询加上模糊参数,我可以让它与文本类型一起工作。
为什么会这样?
请参阅下面的查询
(工作查询应该 return Eddie 的文档。)
1) 模糊查询文本类型 -> 不工作
GET kibana_sample_data_ecommerce/_search
{
"query": {
"fuzzy": {
"customer_first_name": {
"value": "Eddi",
"fuzziness": "AUTO"
}
}
}
}
2) 模糊查询关键字类型 - 有效
GET kibana_sample_data_ecommerce/_search
{
"query": {
"fuzzy": {
"customer_first_name.keyword": {
"value": "Eddi",
"fuzziness": "AUTO"
}
}
}
}
3) 匹配查询 + 模糊性 -> 工作
GET kibana_sample_data_ecommerce/_search
{
"query": {
"match": {
"customer_first_name.keyword": {
"query": "Eddi",
"fuzziness": "Auto"
}
}
}
}
索引设置
{
"kibana_sample_data_ecommerce" : {
"aliases" : { },
"mappings" : {
"properties" : {
"category" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword"
}
}
},
"currency" : {
"type" : "keyword"
},
"customer_birth_date" : {
"type" : "date"
},
"customer_first_name" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"customer_full_name" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"customer_gender" : {
"type" : "keyword"
},
"customer_id" : {
"type" : "keyword"
},
"customer_last_name" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"customer_phone" : {
"type" : "keyword"
},
"day_of_week" : {
"type" : "keyword"
},
"day_of_week_i" : {
"type" : "integer"
},
"email" : {
"type" : "keyword"
},
"geoip" : {
"properties" : {
"city_name" : {
"type" : "keyword"
},
"continent_name" : {
"type" : "keyword"
},
"country_iso_code" : {
"type" : "keyword"
},
"location" : {
"type" : "geo_point"
},
"region_name" : {
"type" : "keyword"
}
}
},
"manufacturer" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword"
}
}
},
"order_date" : {
"type" : "date"
},
"order_id" : {
"type" : "keyword"
},
"products" : {
"properties" : {
"_id" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"base_price" : {
"type" : "half_float"
},
"base_unit_price" : {
"type" : "half_float"
},
"category" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword"
}
}
},
"created_on" : {
"type" : "date"
},
"discount_amount" : {
"type" : "half_float"
},
"discount_percentage" : {
"type" : "half_float"
},
"manufacturer" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword"
}
}
},
"min_price" : {
"type" : "half_float"
},
"price" : {
"type" : "half_float"
},
"product_id" : {
"type" : "long"
},
"product_name" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword"
}
},
"analyzer" : "english"
},
"quantity" : {
"type" : "integer"
},
"sku" : {
"type" : "keyword"
},
"tax_amount" : {
"type" : "half_float"
},
"taxful_price" : {
"type" : "half_float"
},
"taxless_price" : {
"type" : "half_float"
},
"unit_discount_amount" : {
"type" : "half_float"
}
}
},
"sku" : {
"type" : "keyword"
},
"taxful_total_price" : {
"type" : "half_float"
},
"taxless_total_price" : {
"type" : "half_float"
},
"total_quantity" : {
"type" : "integer"
},
"total_unique_products" : {
"type" : "integer"
},
"type" : {
"type" : "keyword"
},
"user" : {
"type" : "keyword"
}
}
},
"settings" : {
"index" : {
"number_of_shards" : "1",
"auto_expand_replicas" : "0-1",
"provided_name" : "kibana_sample_data_ecommerce",
"creation_date" : "1579684918696",
"number_of_replicas" : "0",
"uuid" : "Ga3UfyyAQjGpa5JDbJB7Sw",
"version" : {
"created" : "7050299"
}
}
}
}
}
模糊查询是词级查询。这意味着不分析查询。
为什么查询 1) 失败
如果您查询 "Eddi",它将与分析的文本进行比较,在本例中为 "eddie"。
从 'Eddi' 到 'eddie' 是 2 个编辑距离。
因此查询不会成功,因为长度在 3 到 5 之间的术语的最大编辑距离为 1(使用 "fuzziness: AUTO" 配置)
为什么查询 2) 成功
另一方面,如果您使用关键字,关键字会在不经过分析的情况下存储。因此。 Eddi 与 Eddie 的编辑距离为 1。
我有一个仅适用于关键字类型的查询,但我不明白为什么。
但是,如果我使用匹配查询加上模糊参数,我可以让它与文本类型一起工作。
为什么会这样?
请参阅下面的查询
(工作查询应该 return Eddie 的文档。)
1) 模糊查询文本类型 -> 不工作
GET kibana_sample_data_ecommerce/_search
{
"query": {
"fuzzy": {
"customer_first_name": {
"value": "Eddi",
"fuzziness": "AUTO"
}
}
}
}
2) 模糊查询关键字类型 - 有效
GET kibana_sample_data_ecommerce/_search
{
"query": {
"fuzzy": {
"customer_first_name.keyword": {
"value": "Eddi",
"fuzziness": "AUTO"
}
}
}
}
3) 匹配查询 + 模糊性 -> 工作
GET kibana_sample_data_ecommerce/_search
{
"query": {
"match": {
"customer_first_name.keyword": {
"query": "Eddi",
"fuzziness": "Auto"
}
}
}
}
索引设置
{
"kibana_sample_data_ecommerce" : {
"aliases" : { },
"mappings" : {
"properties" : {
"category" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword"
}
}
},
"currency" : {
"type" : "keyword"
},
"customer_birth_date" : {
"type" : "date"
},
"customer_first_name" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"customer_full_name" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"customer_gender" : {
"type" : "keyword"
},
"customer_id" : {
"type" : "keyword"
},
"customer_last_name" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"customer_phone" : {
"type" : "keyword"
},
"day_of_week" : {
"type" : "keyword"
},
"day_of_week_i" : {
"type" : "integer"
},
"email" : {
"type" : "keyword"
},
"geoip" : {
"properties" : {
"city_name" : {
"type" : "keyword"
},
"continent_name" : {
"type" : "keyword"
},
"country_iso_code" : {
"type" : "keyword"
},
"location" : {
"type" : "geo_point"
},
"region_name" : {
"type" : "keyword"
}
}
},
"manufacturer" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword"
}
}
},
"order_date" : {
"type" : "date"
},
"order_id" : {
"type" : "keyword"
},
"products" : {
"properties" : {
"_id" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
},
"base_price" : {
"type" : "half_float"
},
"base_unit_price" : {
"type" : "half_float"
},
"category" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword"
}
}
},
"created_on" : {
"type" : "date"
},
"discount_amount" : {
"type" : "half_float"
},
"discount_percentage" : {
"type" : "half_float"
},
"manufacturer" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword"
}
}
},
"min_price" : {
"type" : "half_float"
},
"price" : {
"type" : "half_float"
},
"product_id" : {
"type" : "long"
},
"product_name" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword"
}
},
"analyzer" : "english"
},
"quantity" : {
"type" : "integer"
},
"sku" : {
"type" : "keyword"
},
"tax_amount" : {
"type" : "half_float"
},
"taxful_price" : {
"type" : "half_float"
},
"taxless_price" : {
"type" : "half_float"
},
"unit_discount_amount" : {
"type" : "half_float"
}
}
},
"sku" : {
"type" : "keyword"
},
"taxful_total_price" : {
"type" : "half_float"
},
"taxless_total_price" : {
"type" : "half_float"
},
"total_quantity" : {
"type" : "integer"
},
"total_unique_products" : {
"type" : "integer"
},
"type" : {
"type" : "keyword"
},
"user" : {
"type" : "keyword"
}
}
},
"settings" : {
"index" : {
"number_of_shards" : "1",
"auto_expand_replicas" : "0-1",
"provided_name" : "kibana_sample_data_ecommerce",
"creation_date" : "1579684918696",
"number_of_replicas" : "0",
"uuid" : "Ga3UfyyAQjGpa5JDbJB7Sw",
"version" : {
"created" : "7050299"
}
}
}
}
}
模糊查询是词级查询。这意味着不分析查询。
为什么查询 1) 失败
如果您查询 "Eddi",它将与分析的文本进行比较,在本例中为 "eddie"。
从 'Eddi' 到 'eddie' 是 2 个编辑距离。
因此查询不会成功,因为长度在 3 到 5 之间的术语的最大编辑距离为 1(使用 "fuzziness: AUTO" 配置)
为什么查询 2) 成功
另一方面,如果您使用关键字,关键字会在不经过分析的情况下存储。因此。 Eddi 与 Eddie 的编辑距离为 1。