Building an effective Elasticsearch query for cross_fields with fuzziness
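I understand that Elasticsearch does not support fuzziness with the cross_fields type in a multi_match query. I find the Elasticsearch API hard to work with, so I am struggling to build an equivalent query that searches across multiple document fields using fuzzy string matching.
I have an index called papers with various fields such as Title, Author.FirstName, Author.LastName, PublicationDate, Journal, etc. I want to be able to query it with a string like "John Doe paper title 2015 journal name". cross_fields is the perfect multi_match type, but it does not support fuzziness, which is critical for my application.
Can anyone suggest a reasonable way to approach this? I have spent hours browsing SO and the Elasticsearch forums for solutions, with little success.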
You can make use of the copy_to mapping parameter for this scenario. It copies the values of several fields into one new field (my_search_field in the details below), and on that field you can run a fuzzy search with a simple match query and its fuzziness parameter.
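To make the mechanism concrete, here is a toy Python model of the idea. It is not Elasticsearch code: the function names (edit_distance, combined_tokens, fuzzy_hits) are hypothetical, tokenization is a plain whitespace split rather than a real analyzer, and the distance is plain Levenshtein, whereas Elasticsearch by default also counts a transposition as a single edit. It only shows why pooling the fields into one bag of tokens lets every query term be fuzzy-matched regardless of which field it came from.

```python
def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete a character from a
                            curr[j - 1] + 1,      # insert a character into a
                            prev[j - 1] + cost))  # substitute (or keep) a character
        prev = curr
    return prev[-1]


def combined_tokens(doc: dict) -> set:
    """Mimic copy_to: pool every copied value into one set of lowercase tokens."""
    values = [doc["Title"], doc["PublicationDate"], doc["Journal"]]
    for author in doc["Author"]:
        values += [author["FirstName"], author["LastName"]]
    return {tok.lower() for value in values for tok in value.split()}


def fuzzy_hits(query: str, doc: dict, max_edits: int = 2) -> dict:
    """For each query term, report whether some pooled token is within max_edits."""
    tokens = combined_tokens(doc)
    return {
        term: any(edit_distance(term.lower(), tok) <= max_edits for tok in tokens)
        for term in query.split()
    }


doc = {
    "Title": "Fountainhead",
    "Author": [{"FirstName": "Ayn", "LastName": "Rand"}],
    "PublicationDate": "2015",
    "Journal": "journal",
}
hits = fuzzy_hits("Aynnn Ranaad Fountainhead 2015 journal", doc)
print(hits)
```

With two edits allowed, every term in that query finds a token in the pooled field; with max_edits=1 the misspelled names no longer match. Keep in mind that a real match query uses the or operator by default, so a document is returned as soon as any single term matches.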
Below are a sample mapping, a sample document, and a query. In the mapping, note the new my_search_field field and the copy_to setting on every source field.
Mapping:
PUT my_fuzzy_index
{
  "mappings": {
    "properties": {
      "my_search_field": {
        "type": "text"
      },
      "Title": {
        "type": "text",
        "copy_to": "my_search_field"
      },
      "Author": {
        "type": "nested",
        "properties": {
          "FirstName": {
            "type": "text",
            "copy_to": "my_search_field"
          },
          "LastName": {
            "type": "text",
            "copy_to": "my_search_field"
          }
        }
      },
      "PublicationDate": {
        "type": "date",
        "copy_to": "my_search_field"
      },
      "Journal": {
        "type": "text",
        "copy_to": "my_search_field"
      }
    }
  }
}
Sample document:
POST my_fuzzy_index/_doc/1
{
  "Title": "Fountainhead",
  "Author": [
    {
      "FirstName": "Ayn",
      "LastName": "Rand"
    }
  ],
  "PublicationDate": "2015",
  "Journal": "journal"
}
Query request, run against my_search_field. The fuzziness parameter accepts 0, 1, 2, or AUTO, so 2 is the largest edit distance available:
POST my_fuzzy_index/_search
{
  "query": {
    "match": {
      "my_search_field": {
        "query": "Aynnn Ranaad Fountainhead 2015 journal",
        "fuzziness": 2
      }
    }
  }
}
Response:
{
  "took" : 15,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.1027813,
    "hits" : [
      {
        "_index" : "my_fuzzy_index",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.1027813,
        "_source" : {
          "Title" : "Fountainhead",
          "Author" : [
            {
              "FirstName" : "Ayn",
              "LastName" : "Rand"
            }
          ],
          "PublicationDate" : "2015",
          "Journal" : "journal"
        }
      }
    ]
  }
}
So instead of trying to apply a fuzzy query across multiple fields, use this approach; it keeps your query simple.
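If you build the request from application code, a small helper along these lines may be convenient. This is only a sketch: build_fuzzy_query is a hypothetical name, the function just assembles the request body as a Python dict and does not talk to Elasticsearch, and the defaults are my assumptions. It uses two standard match query options: fuzziness set to AUTO, which picks the allowed edit distance from the term length (no edits for terms of 1 or 2 characters, one edit for 3 to 5 characters, two edits for longer terms), and operator, which can be switched from the default or to and so that every term in the search string must match.

```python
import json


def build_fuzzy_query(text, field="my_search_field", fuzziness="AUTO", operator="or"):
    """Return the body of a fuzzy match query on the combined copy_to field."""
    return {
        "query": {
            "match": {
                field: {
                    "query": text,           # free-text search string from the user
                    "fuzziness": fuzziness,  # 0, 1, 2, or "AUTO"
                    "operator": operator,    # "or" (default) or "and"
                }
            }
        }
    }


# Hypothetical usage with the search string from the question.
body = build_fuzzy_query("John Doe paper title 2015 journal name", operator="and")
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of POST my_fuzzy_index/_search. Note that AUTO is stricter than a fixed distance of 2 for short terms, so a five-letter misspelling of a three-letter name that needs two edits would not match under AUTO.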
Let me know if this helps!