Elasticsearch data model
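I'm currently parsing the text of my company's internal resumes. The goal is to index everything in Elasticsearch so that it can be searched.
Every colleague has a list of projects, each with a client name.
For now I have the following JSON documents, with no mapping defined: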
{
"name": "Jean Wisser",
"position": "Junior Developer",
"projects": [
{
"client": "SutrixMedia",
"missions": [
"Responsible for the quality on time and within budget",
"Writing specs, testing,..."
],
"technologies": "JIRA/Mantis/Adobe CQ5 (AEM)"
},
{
"client": "Société Générale",
"missions": [
" Writing test cases and scenarios",
" UAT"
],
"technologies": "HP QTP/QC"
}
]
}
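The 2 main questions we would like to answer are:
- Which colleagues have already worked for this client company?
- Which clients use this technology?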
The first question is really easy to answer. For example, Projects.client="SutrixMedia" returns the right resume.
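But how can I answer the second one?
I would like to run a query like this: Projects.technologies="HP QTP/QC"
The answer should be only the client name ("Société Générale" in this case), not the whole document.
Is it possible to get this answer by defining a mapping with the nested type?
Or should I go for a parent/child mapping?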
Yes, indeed, this is possible with ES 1.5.* if you map projects as a nested type and then retrieve the nested inner_hits.
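To see why the nested type matters here, it may help to look at what happens without it: an array of objects that is not mapped as nested is flattened into multi-valued fields, so the link between a client and its technologies is lost. The following is only a rough simulation of that behavior in plain Python, not Elasticsearch code; the helper names flatten_object_arrays, flat_match and nested_match are made up for this illustration.

```python
from typing import Any, Dict, List


def flatten_object_arrays(doc: Dict[str, Any], field: str) -> Dict[str, List[Any]]:
    """Mimic a non-nested array of objects: one multi-valued field per key,
    with no record of which values came from the same object."""
    flat: Dict[str, List[Any]] = {}
    for obj in doc[field]:
        for key, value in obj.items():
            values = value if isinstance(value, list) else [value]
            flat.setdefault(f"{field}.{key}", []).extend(values)
    return flat


def flat_match(flat: Dict[str, List[Any]], client: str, technology: str) -> bool:
    """Match against the flattened fields (client and technology checked independently)."""
    return (client in flat["projects.client"]
            and technology in flat["projects.technologies"])


def nested_match(doc: Dict[str, Any], client: str, technology: str) -> bool:
    """Match only when both values belong to the same project object."""
    return any(p["client"] == client and p["technologies"] == technology
               for p in doc["projects"])


resume = {
    "name": "Jean Wisser",
    "projects": [
        {"client": "SutrixMedia", "technologies": "JIRA/Mantis/Adobe CQ5 (AEM)"},
        {"client": "Société Générale", "technologies": "HP QTP/QC"},
    ],
}

flat = flatten_object_arrays(resume, "projects")
# SutrixMedia never used HP QTP/QC, yet the flattened view cannot tell.
print(flat_match(flat, "SutrixMedia", "HP QTP/QC"))      # True (false positive)
print(nested_match(resume, "SutrixMedia", "HP QTP/QC"))  # False
```

With the nested type, each project is kept as its own hidden document, which is what makes it possible to return just the matching project later on.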
The mapping for the above sample document would look like this:
curl -XPUT localhost:9200/resumes -d '
{
"mappings": {
"resume": {
"properties": {
"name": {
"type": "string"
},
"position": {
"type": "string"
},
"projects": {
"type": "nested", <--- declare "projects" as nested type
"properties": {
"client": {
"type": "string",
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
},
"missions": {
"type": "string"
},
"technologies": {
"type": "string",
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
}
}
}
}'
Then you can index your sample document from above:
curl -XPUT localhost:9200/resumes/resume/1 -d '{...}'
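One side effect worth keeping in mind: once projects is mapped as nested, the first question (which colleagues worked for a given client) also has to go through a nested query, since nested fields are not reachable from a plain top-level query. Below is a small Python sketch that only builds the request body, which could then be POSTed to the same _search endpoint. The function name colleagues_for_client_query and the choice to return only name and position are assumptions for illustration.

```python
import json
from typing import Any, Dict


def colleagues_for_client_query(client: str) -> Dict[str, Any]:
    """Build a search body matching resumes that have at least one project for `client`."""
    return {
        # Assumption: only these two top-level fields are wanted back.
        "_source": ["name", "position"],
        "query": {
            "nested": {
                "path": "projects",
                "query": {
                    # Exact match on the not_analyzed sub-field from the mapping.
                    "term": {"projects.client.raw": client}
                },
            }
        },
    }


body = colleagues_for_client_query("SutrixMedia")
print(json.dumps(body, indent=2, ensure_ascii=False))
```

The term query targets projects.client.raw because the analyzed client field would be tokenized and lowercased, so an exact term lookup against it would likely miss.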
Finally, with the following query, which only retrieves the nested inner_hits, you get back only the nested object that matches Projects.technologies="HP QTP/QC":
curl -XPOST localhost:9200/resumes/resume/_search -d '
{
"_source": false,
"query": {
"nested": {
"path": "projects",
"query": {
"term": {
"projects.technologies.raw": "HP QTP/QC"
}
},
"inner_hits": { <----- only retrieve the matching nested document
"_source": "client" <----- and only the "client" field
}
}
}
}'
This yields only the client name instead of the whole matching document:
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 1.4054651,
"hits" : [ {
"_index" : "resumes",
"_type" : "resume",
"_id" : "1",
"_score" : 1.4054651,
"inner_hits" : {
"projects" : {
"hits" : {
"total" : 1,
"max_score" : 1.4054651,
"hits" : [ {
"_index" : "resumes",
"_type" : "resume",
"_id" : "1",
"_nested" : {
"field" : "projects",
"offset" : 1
},
"_score" : 1.4054651,
"_source":{"client":"Société Générale"} <--- here is the client name
} ]
}
}
}
} ]
}
}
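On the client side, the names still have to be pulled out of the inner_hits structure. A minimal Python sketch, assuming the response has already been parsed into a dict shaped like the one above; client_names is a hypothetical helper and sample_response is a trimmed copy of that response.

```python
from typing import Any, Dict, List


def client_names(response: Dict[str, Any], path: str = "projects") -> List[str]:
    """Collect the distinct `client` values from the nested inner hits, in order."""
    names: List[str] = []
    for hit in response["hits"]["hits"]:
        inner = (hit.get("inner_hits", {})
                    .get(path, {})
                    .get("hits", {})
                    .get("hits", []))
        for nested_hit in inner:
            client = nested_hit.get("_source", {}).get("client")
            if client is not None and client not in names:
                names.append(client)
    return names


# Trimmed copy of the response shown above.
sample_response = {
    "hits": {
        "total": 1,
        "hits": [
            {
                "_id": "1",
                "inner_hits": {
                    "projects": {
                        "hits": {
                            "total": 1,
                            "hits": [
                                {
                                    "_nested": {"field": "projects", "offset": 1},
                                    "_source": {"client": "Société Générale"},
                                }
                            ],
                        }
                    }
                },
            }
        ],
    }
}

print(client_names(sample_response))  # ['Société Générale']
```

Deduplicating is a design choice here: several resumes can mention the same client for the same technology, and the question only asks which clients use it.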