Performing searches on JSON data in Elasticsearch
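I have loaded JSON data into Elasticsearch via Logstash. The import works, and I can see the data in Elasticsearch-Head.
My problem is querying the data. I can run a search on a field, but it returns the entire type within the index as a single search result. I have tried a few variations without any luck.
Here is the Logstash shipper file: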
input {
exec {
type => "recom_db"
command => "curl -s -X GET http://www.test.com/api/edselastic/recom_db.json"
interval => 86400
codec => "json"
}
exec {
type => "recom_ki"
command => "curl -s -X GET http://www.test.com/api/edselastic/recom_ki.json"
interval => 86400
codec => "json"
}
exec {
type => "recom_un"
command => "curl -s -X GET http://www.test.com/api/edselastic/recom_un.json"
interval => 86400
codec => "json"
}
}
output {
elasticsearch {
host => localhost
index => "lib-recommender-%{+yyyy.MM.dd}"
template_name => "recommender-template"
}
}
And the document indexed in Elasticsearch has the following form:
{
"_index": "lib-recommender-2015.06.11",
"_type": "recom_un",
"_id": "qoZE4aF-SkS--tq_8MhH4A",
"_version": 1,
"_score": 1,
"_source": {
"item": [{
"name": "AAM219 -- reading lists",
"link": "http://www.test.com/modules/aam219.html",
"description": "AAM219 -- reading lists",
"terms": {
"term": ["AAM219"]
}
},
{
"name": "AAR410 -- reading lists",
"link": "http://www.test.com/modules/aar410.html",
"description": "AAR410 -- reading lists",
"terms": {
"term": ["AAR410"]
}
}
...
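I have tried querying the data in the various ways I have seen in the Elasticsearch documentation, but I cannot get the result I want. Here is one of the many queries I have tried: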
curl -XPOST "http://localhost:9200/lib-recommender/recom_un/_search" -d'
{
"fields": ["item.name", "item.link"],
"query":{
"term": {
"item.terms.term": "AAM219"
}
}
}'
But it returns the entire type within the index (the right fields are selected, but they are disjointed and every item is included):
{
"took": 13,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 0.006780553,
"hits": [{
"_index": "lib-recommender-2015.06.11",
"_type": "recom_un",
"_id": "qoZE4aF-SkS--tq_8MhH4A",
"_score": 0.006780553,
"fields": {
"item.link": ["http://www.test.com/modules/aam219.html",
"http://www.test.com/modules/aar410.html",
"http://www.test.com/modules/ac1201.html",
"http://www.test.com/modules/aca401.html",
I am looking for the following result:
{
"took": 5,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 0.006780553,
"hits": [{
"_index": "lib-recommender-2015.06.11",
"_type": "recom_un",
"_id": "qoZE4aF-SkS--tq_8MhH4A",
"_score": 0.006780553,
"_source": {
"item": [{
"name": "AAM219 -- reading lists",
"link": "http://www.test.com/modules/aam219.html",
"description": "AAM219 -- reading lists",
"terms": {
"term": ["AAM219"]
}
}
]
}
}
]
}
}
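What am I missing? Is the index mapping wrong for this kind of search (so should I manually create a mapping file for Elasticsearch before importing the data)? Is there a parameter I am missing in the query? I have been looking for an answer but feel like I am going round in circles now. I am guessing it is something simple that I am overlooking, but I am not sure.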
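Yes, for this kind of use case to work, you need to create a custom mapping and make sure your item structure is of type nested. Otherwise the fields of all the objects in item are collapsed together, as you can see in the results you showed.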
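To make the collapsing concrete, here is a rough Python sketch of what happens to an array of plain (non-nested) objects at index time. It is only a conceptual model, not Elasticsearch code: the flatten function is a made-up helper, and the sample document is trimmed from the one in the question.

```python
# Conceptual sketch only: mimics how an array of plain objects ends up as
# flat, multi-valued fields when the parent field is NOT mapped as nested.

def flatten(doc, prefix=""):
    """Collapse nested dicts/lists into dotted field names with value lists."""
    flat = {}
    for key, value in doc.items():
        path = f"{prefix}{key}"
        values = value if isinstance(value, list) else [value]
        for v in values:
            if isinstance(v, dict):
                # Recurse into the object and merge its fields into ours.
                for sub_path, sub_values in flatten(v, path + ".").items():
                    flat.setdefault(sub_path, []).extend(sub_values)
            else:
                flat.setdefault(path, []).append(v)
    return flat


# Trimmed copy of the document from the question.
source = {
    "item": [
        {
            "name": "AAM219 -- reading lists",
            "link": "http://www.test.com/modules/aam219.html",
            "terms": {"term": ["AAM219"]},
        },
        {
            "name": "AAR410 -- reading lists",
            "link": "http://www.test.com/modules/aar410.html",
            "terms": {"term": ["AAR410"]},
        },
    ]
}

flat = flatten(source)
for field, values in sorted(flat.items()):
    print(field, values)
```

After flattening there is no record of which link belonged to which term, so a match on item.terms.term can only return the whole document with every link attached.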
So the mapping needs to look like this:
{
"recom_un": {
"properties": {
"item": {
"type": "nested",
"properties": {
"name": {
"type": "string"
},
"link": {
"type": "string"
},
"description": {
"type": "string"
},
"terms": {
"properties": {
"term": {
"type": "string"
}
}
}
}
}
}
}
}
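Since the Logstash output creates a new index every day, one way to get this mapping applied is an index template. The following is a minimal sketch under a few assumptions: it uses the legacy template format with a "template" pattern key (the format that goes with the "string" field type used here), the pattern lib-recommender-* and the name recommender-template are taken from the Logstash output section in the question, and build_template and put_template are hypothetical helper names. Whether Logstash manages a template under that name itself depends on settings not shown in the question.

```python
import json
import urllib.request

# The mapping shown above, for the recom_un type.
RECOM_UN_MAPPING = {
    "recom_un": {
        "properties": {
            "item": {
                "type": "nested",
                "properties": {
                    "name": {"type": "string"},
                    "link": {"type": "string"},
                    "description": {"type": "string"},
                    "terms": {"properties": {"term": {"type": "string"}}},
                },
            }
        }
    }
}


def build_template(pattern="lib-recommender-*"):
    """Wrap the mapping in a legacy-style index template body."""
    return {"template": pattern, "mappings": RECOM_UN_MAPPING}


def put_template(base_url, name, body):
    """PUT the body to <base_url>/_template/<name>. Not called in this sketch."""
    request = urllib.request.Request(
        f"{base_url}/_template/{name}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


template = build_template()
print(json.dumps(template, indent=2))
# To apply it (assumes a node listening on localhost:9200):
# put_template("http://localhost:9200", "recommender-template", template)
```

A template only affects indices created after it is in place, and an existing object field cannot be switched to nested in an existing index, so data that is already indexed would need to be reindexed.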
Then you can modify your query slightly to use a nested query instead, like this. Note that I am also including inner_hits so that your results contain only the matching nested documents:
curl -XPOST "http://localhost:9200/lib-recommender/recom_un/_search" -d'
{
"fields": [
"item.name",
"item.link"
],
"query": {
"nested": {
"path": "item",
"query": {
"term": {
"item.terms.term": "AAM219"
}
},
"inner_hits": {}
}
}
}'
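For intuition, the effect of the nested query together with inner_hits can be modelled in a few lines of Python. This is a conceptual sketch, not what Elasticsearch runs internally: nested_inner_hits and has_term are made-up helpers, and the sample document is trimmed from the question.

```python
# Conceptual sketch only: each object under the nested path is tested on its
# own, and only the objects that match are returned (the role of inner_hits).

def nested_inner_hits(doc, path, predicate):
    """Return only the objects under `path` that satisfy `predicate` individually."""
    return [obj for obj in doc.get(path, []) if predicate(obj)]


def has_term(value):
    """Build a predicate that checks one item's terms.term list for `value`."""
    return lambda item: value in item.get("terms", {}).get("term", [])


# Trimmed copy of the document from the question.
source = {
    "item": [
        {
            "name": "AAM219 -- reading lists",
            "link": "http://www.test.com/modules/aam219.html",
            "terms": {"term": ["AAM219"]},
        },
        {
            "name": "AAR410 -- reading lists",
            "link": "http://www.test.com/modules/aar410.html",
            "terms": {"term": ["AAR410"]},
        },
    ]
}

matches = nested_inner_hits(source, "item", has_term("AAM219"))
print(matches)
```

Because every item is evaluated separately, the name and link that come back are the ones belonging to the item whose term matched, not a merged list from the whole document.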
Props to Val for the answer above. It is mostly that, but with another level of nesting.
Here is the mapping:
{
"recom_un": {
"properties": {
"item": {
"type": "nested",
"properties": {
"name": {
"type": "string"
},
"link": {
"type": "string"
},
"description": {
"type": "string"
},
"terms": {
"type": "nested",
"properties": {
"term": {
"type": "string"
}
}
}
}
}
}
}
}
And this is the search query I used to get what I wanted:
curl -XPOST "http://localhost:9200/lib-recommender/recom_un/_search" -d'
{
"_source": false,
"query": {
"filtered": {
"filter": {
"nested": {
"path": "item",
"query": {
"nested": {
"path": "item.terms",
"query": {
"match": {
"term": "AAM219"
}
}
}
},
"inner_hits": { }
}
}
}
}
}'
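With _source switched off at the top level, the matching items have to be read from the inner_hits part of each hit. Here is a small sketch of pulling them out in Python. It assumes the response keeps inner hits under hits.hits[].inner_hits.item.hits.hits[]._source, which is worth checking against a real response from your version; extract_inner_hits is a hypothetical helper, and sample_response is hand-written and trimmed purely to exercise it, not real output.

```python
def extract_inner_hits(response, name="item"):
    """Collect the _source of every inner hit stored under `name`."""
    items = []
    for hit in response.get("hits", {}).get("hits", []):
        inner = hit.get("inner_hits", {}).get(name, {})
        for inner_hit in inner.get("hits", {}).get("hits", []):
            items.append(inner_hit.get("_source", {}))
    return items


# Hand-written, trimmed response used only for illustration (assumed layout).
sample_response = {
    "hits": {
        "total": 1,
        "hits": [
            {
                "_index": "lib-recommender-2015.06.11",
                "_type": "recom_un",
                "_id": "qoZE4aF-SkS--tq_8MhH4A",
                "inner_hits": {
                    "item": {
                        "hits": {
                            "total": 1,
                            "hits": [
                                {
                                    "_source": {
                                        "name": "AAM219 -- reading lists",
                                        "link": "http://www.test.com/modules/aam219.html",
                                        "description": "AAM219 -- reading lists",
                                        "terms": {"term": ["AAM219"]},
                                    }
                                }
                            ],
                        }
                    }
                },
            }
        ],
    }
}

for item in extract_inner_hits(sample_response):
    print(item["name"], item["link"])
```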