Elasticsearch: matching multiple values without an analyzer
Pardon my limited knowledge of Elasticsearch. I have an Elasticsearch index containing the following documents:
{
"date": "2013-12-30T00:00:00.000Z",
"value": 2,
"dimensions": {
"region": "Coimbra District"
}
}
{
"date": "2013-12-30T00:00:00.000Z",
"value": 1,
"dimensions": {
"region": "Federal District"
}
}
{
"date": "2013-12-30T00:00:00.000Z",
"value": 1,
"dimensions": {
"region": "Masovian Voivodeship"
}
}
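These 3 JSON documents have been indexed on the ES server. I did not specify any analyzer (and I don't know how to provide one :)).
I am using Spring Data Elasticsearch and running the following query to find documents whose region is 'Masovian Voivodeship' or 'Federal District':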
{
"query_string" : {
"query" : "Masovian Voivodeship OR Federal District",
"fields" : [ "dimensions.region" ]
}
}
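I expect it to return 2 hits. However, it returns all 3 documents (probably because the unwanted document, 'Coimbra District', also contains the word 'District'). How can I modify the query so that it performs an exact match and returns only those 2 documents? I am using the following method: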
QueryBuilders.queryString(<OR string>).field("dimensions.region")
I have tried QueryBuilders.termsQuery, QueryBuilders.inQuery and QueryBuilders.matchQuery (with an array), but with no luck.
Can anyone help? Thanks in advance.
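There are a couple of things you can do here.
First, I set up an index without any explicit mapping or analysis settings, which means the standard analyzer is used. This matters because it determines how the text field can be queried: the standard analyzer splits each value into lowercased terms, so "Federal District" is indexed as the two terms federal and district rather than as one value.
So I started with: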
DELETE /test_index
PUT /test_index
{
"settings": {
"number_of_shards": 1,
"number_of_replicas": 0
}
}
PUT /test_index/doc/1
{
"date": "2013-12-30T00:00:00.000Z",
"value": 2,
"dimensions": {
"region": "Coimbra District"
}
}
PUT /test_index/doc/2
{
"date": "2013-12-30T00:00:00.000Z",
"value": 1,
"dimensions": {
"region": "Federal District"
}
}
PUT /test_index/doc/3
{
"date": "2013-12-30T00:00:00.000Z",
"value": 1,
"dimensions": {
"region": "Masovian Voivodeship"
}
}
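To make the effect of the standard analyzer concrete, here is a rough Python sketch of what happens to each region value at index time. It is only an approximation under stated assumptions: the real standard analyzer follows Unicode text segmentation rules, whereas the hypothetical analyze helper below just lowercases and splits on non-word characters, which should be close enough for these three values.

```python
import re

def analyze(text):
    """Rough stand-in for the standard analyzer (assumption, not the real
    implementation): lowercase the text and split on non-word characters."""
    return [tok for tok in re.split(r"\W+", text.lower()) if tok]

# The three region values from the documents indexed above, keyed by id.
regions = {
    "1": "Coimbra District",
    "2": "Federal District",
    "3": "Masovian Voivodeship",
}

# Build a tiny inverted index: term -> set of ids of documents containing it.
inverted_index = {}
for doc_id, region in regions.items():
    for term in analyze(region):
        inverted_index.setdefault(term, set()).add(doc_id)

for term in sorted(inverted_index):
    print(term, sorted(inverted_index[term]))
```

In this sketch the term district points at both document 1 and document 2, which is why a query on that single word cannot tell the two regions apart.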
Then I tried your query and got no results. I don't understand why you have "dimensions.ga:region" in your fields parameter, but when I changed it to "dimensions.region" I got some results:
POST /test_index/doc/_search
{
"query": {
"query_string": {
"query": "Masovian Voivodeship OR Federal District",
"fields": [
"dimensions.region"
]
}
}
}
...
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 3,
"max_score": 0.46911472,
"hits": [
{
"_index": "test_index",
"_type": "doc",
"_id": "3",
"_score": 0.46911472,
"_source": {
"date": "2013-12-30T00:00:00.000Z",
"value": 1,
"dimensions": {
"region": "Masovian Voivodeship"
}
}
},
{
"_index": "test_index",
"_type": "doc",
"_id": "2",
"_score": 0.3533006,
"_source": {
"date": "2013-12-30T00:00:00.000Z",
"value": 1,
"dimensions": {
"region": "Federal District"
}
}
},
{
"_index": "test_index",
"_type": "doc",
"_id": "1",
"_score": 0.05937162,
"_source": {
"date": "2013-12-30T00:00:00.000Z",
"value": 2,
"dimensions": {
"region": "Coimbra District"
}
}
}
]
}
}
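Why all three documents come back can be illustrated with a small local simulation. It assumes query_string is left at its default operator (OR), so the unquoted words in "Masovian Voivodeship OR Federal District" behave as four independent optional terms. The analyze and matches_any helpers are hypothetical stand-ins written for this sketch, not Elasticsearch APIs, and scoring is ignored.

```python
import re

def analyze(text):
    """Rough stand-in for the standard analyzer: lowercase, split on non-word chars."""
    return [tok for tok in re.split(r"\W+", text.lower()) if tok]

docs = {
    "1": "Coimbra District",
    "2": "Federal District",
    "3": "Masovian Voivodeship",
}

def matches_any(query_terms, field_value):
    """A document matches if it contains at least one of the optional terms."""
    return bool(set(query_terms) & set(analyze(field_value)))

query = "Masovian Voivodeship OR Federal District"

# Drop the OR keyword; with a default operator of OR, every remaining word
# is simply another optional term.
terms = [tok for word in query.split() if word != "OR" for tok in analyze(word)]

hits = sorted(doc_id for doc_id, region in docs.items() if matches_any(terms, region))
print(terms)  # ['masovian', 'voivodeship', 'federal', 'district']
print(hits)   # ['1', '2', '3']
```

Document 1 is matched purely through the term district, which also explains its much lower score in the response above.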
However, that includes a result you don't want. One way to fix this is to group the terms of each region explicitly with AND:
POST /test_index/doc/_search
{
"query": {
"query_string": {
"query": "(Masovian AND Voivodeship) OR (Federal AND District)",
"fields": [
"dimensions.region"
]
}
}
}
...
{
"took": 3,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 2,
"max_score": 0.46911472,
"hits": [
{
"_index": "test_index",
"_type": "doc",
"_id": "3",
"_score": 0.46911472,
"_source": {
"date": "2013-12-30T00:00:00.000Z",
"value": 1,
"dimensions": {
"region": "Masovian Voivodeship"
}
}
},
{
"_index": "test_index",
"_type": "doc",
"_id": "2",
"_score": 0.3533006,
"_source": {
"date": "2013-12-30T00:00:00.000Z",
"value": 1,
"dimensions": {
"region": "Federal District"
}
}
}
]
}
}
Another way that gives the same results (and the one I prefer) is to combine match queries with a bool should clause:
POST /test_index/doc/_search
{
"query": {
"bool": {
"should": [
{
"match": {
"dimensions.region": {
"query": "Masovian Voivodeship",
"operator": "and"
}
}
},
{
"match": {
"dimensions.region": {
"query": "Federal District",
"operator": "and"
}
}
}
]
}
}
}
Here is the code I used:
http://sense.qbox.io/gist/bb5062a635c4f9519a411fdd3c8540eae8bdfd51
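Since the question is about matching several values, it may help to see how the bool/should body can be generated from a list of regions. The Python sketch below builds the same request body as the match query above and checks it against a local simulation. The names build_region_query, simulate and analyze are hypothetical helpers invented for this illustration. Nothing here talks to a real cluster, and the simulation only approximates the standard analyzer.

```python
import json
import re

FIELD = "dimensions.region"

def analyze(text):
    """Rough stand-in for the standard analyzer: lowercase, split on non-word chars."""
    return [tok for tok in re.split(r"\W+", text.lower()) if tok]

def build_region_query(regions, field=FIELD):
    """Build a bool/should body with one match clause (operator "and") per value."""
    return {
        "query": {
            "bool": {
                "should": [
                    {"match": {field: {"query": region, "operator": "and"}}}
                    for region in regions
                ]
            }
        }
    }

def simulate(body, docs, field=FIELD):
    """Local approximation of the search: a document is a hit if, for at least
    one should clause, every analyzed query term occurs in the document."""
    hits = []
    for doc_id, value in docs.items():
        doc_terms = set(analyze(value))
        for clause in body["query"]["bool"]["should"]:
            wanted = analyze(clause["match"][field]["query"])
            if all(term in doc_terms for term in wanted):
                hits.append(doc_id)
                break
    return sorted(hits)

docs = {
    "1": "Coimbra District",
    "2": "Federal District",
    "3": "Masovian Voivodeship",
}

body = build_region_query(["Masovian Voivodeship", "Federal District"])
print(json.dumps(body, indent=2))
print(simulate(body, docs))  # ['2', '3']
```

Note that operator "and" only requires every term of a value to be present in the field. It is not a strict exact match, so a hypothetical region such as "Federal District North" would also be returned.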