Elasticsearch data model

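I am currently parsing the text of my company's internal CVs. The goal is to index everything in Elasticsearch so that we can run searches on it.

For the moment I have the following JSON document, with no mapping defined:

Each colleague has a list of projects, each with a client name.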

{
    "name": "Jean Wisser",
    "position": "Junior Developer",
    "projects": [
        {
            "client": "SutrixMedia",
            "missions": [
                "Responsible for the quality on time and within budget",
                "Writing specs, testing,..."
            ],
            "technologies": "JIRA/Mantis/Adobe CQ5 (AEM)"
        },
        {
            "client": "Société Générale",
            "missions": [
                " Writing test cases and scenarios",
                " UAT"
            ],
            "technologies": "HP QTP/QC"
        }
    ]
}

The 2 main questions we would like to answer are:

  1. Which colleague has already worked for this company?
  2. Which client uses this technology?

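The first question is really easy to answer. For example, Projects.client="SutrixMedia" returns me the right resumes.

But how can I answer the second one?

I would like to run a query like Projects.technologies="HP QTP/QC" and get back only the client name ("Société Générale" in this case), not the whole document.

Is it possible to get this answer by defining a mapping with a nested type? Or should I go for a parent/child mapping?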

Yes, indeed, that's possible with ES 1.5.* if you map projects as a nested type and then retrieve the nested inner_hits.
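To see why the nested type matters here: without it, Elasticsearch indexes an array of objects as flattened multi-valued fields, so the link between a client and its technologies is lost. The following is only a rough Python sketch of that behaviour, not Elasticsearch code; flatten_projects and nested_projects are hypothetical helpers made up for illustration, and the document is a trimmed copy of the one in the question.

```python
# Trimmed copy of the resume from the question.
resume = {
    "name": "Jean Wisser",
    "projects": [
        {"client": "SutrixMedia", "technologies": "JIRA/Mantis/Adobe CQ5 (AEM)"},
        {"client": "Société Générale", "technologies": "HP QTP/QC"},
    ],
}


def flatten_projects(doc):
    """Hypothetical helper: mimic the default (non-nested) object flattening."""
    flat = {}
    for project in doc["projects"]:
        for key, value in project.items():
            # Every leaf key becomes one multi-valued field on the root document.
            flat.setdefault("projects." + key, []).append(value)
    return flat


def nested_projects(doc):
    """Hypothetical helper: with a nested mapping each project stays separate."""
    return [{"projects." + k: v for k, v in p.items()} for p in doc["projects"]]


flat = flatten_projects(resume)
# flat["projects.client"] holds both clients and flat["projects.technologies"]
# holds both technology strings; nothing records which pair belongs together.

# With separate nested objects the pairing survives, so a match on the
# technology can be traced back to exactly one client.
matching = [
    p["projects.client"]
    for p in nested_projects(resume)
    if p["projects.technologies"] == "HP QTP/QC"
]
print(matching)
```

This is the reason the mapping has to declare projects as nested before inner_hits can return the single matching project.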

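The mapping for the sample document from the question would look like this, with projects declared as a nested type: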

curl -XPUT localhost:9200/resumes -d '
{
  "mappings": {
    "resume": {
      "properties": {
        "name": {
          "type": "string"
        },
        "position": {
          "type": "string"
        },
        "projects": {
          "type": "nested",
          "properties": {
            "client": {
              "type": "string",
              "fields": {
                "raw": {
                  "type": "string",
                  "index": "not_analyzed"
                }
              }
            },
            "missions": {
              "type": "string"
            },
            "technologies": {
              "type": "string",
              "fields": {
                "raw": {
                  "type": "string",
                  "index": "not_analyzed"
                }
              }
            }
          }
        }
      }
    }
  }
}'

Then, you can index your sample document from above:

curl -XPUT localhost:9200/resumes/resume/1 -d '{...}'

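Finally, with the following query, which retrieves only the nested inner_hits (restricted to the client field), you get back just the nested object that matches Projects.technologies="HP QTP/QC":
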
curl -XPOST localhost:9200/resumes/resume/_search -d '
{
  "_source": false,
  "query": {
    "nested": {
      "path": "projects",
      "query": {
        "term": {
          "projects.technologies.raw": "HP QTP/QC"
        }
      },
      "inner_hits": {
        "_source": "client"
      }
      }
    }
  }
}'
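If the same search has to be run for several technologies, the request body can be generated instead of written by hand. This is a minimal sketch in Python that only builds the JSON body shown above; clients_using_query is a hypothetical helper name, and sending the body to the cluster is left out.

```python
import json


def clients_using_query(technology):
    """Hypothetical helper: build the search body for one exact technology string."""
    return {
        # Skip the root document source; only the inner hits are of interest.
        "_source": False,
        "query": {
            "nested": {
                "path": "projects",
                "query": {
                    # term query against the not_analyzed sub-field from the mapping
                    "term": {"projects.technologies.raw": technology}
                },
                # Return only the matching nested project, limited to "client".
                "inner_hits": {"_source": "client"},
            }
        },
    }


body = clients_using_query("HP QTP/QC")
print(json.dumps(body, indent=2, ensure_ascii=False))
```

Because the term query runs against the not_analyzed raw field, the value has to equal the whole stored string: "HP QTP/QC" matches, while "HP QTP" alone would not.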

This yields only the client name instead of the whole matching document:

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 1.4054651,
    "hits" : [ {
      "_index" : "resumes",
      "_type" : "resume",
      "_id" : "1",
      "_score" : 1.4054651,
      "inner_hits" : {
        "projects" : {
          "hits" : {
            "total" : 1,
            "max_score" : 1.4054651,
            "hits" : [ {
              "_index" : "resumes",
              "_type" : "resume",
              "_id" : "1",
              "_nested" : {
                "field" : "projects",
                "offset" : 1
              },
              "_score" : 1.4054651,
              "_source":{"client":"Société Générale"}  <--- here is the client name
            } ]
          }
        }
      }
    } ]
  }
}
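On the client side, the names still have to be pulled out of the inner_hits section of that response. A small sketch of how that could be done in Python, assuming the response has already been parsed into a dict; client_names is a hypothetical helper, and the sample response is a trimmed copy of the output above.

```python
def client_names(response):
    """Hypothetical helper: collect distinct client names from nested inner hits."""
    names = []
    for hit in response["hits"]["hits"]:
        # "projects" is the nested path used in the query.
        inner = hit.get("inner_hits", {}).get("projects", {})
        for inner_hit in inner.get("hits", {}).get("hits", []):
            name = inner_hit["_source"]["client"]
            if name not in names:
                names.append(name)
    return names


# Trimmed copy of the response shown above.
response = {
    "hits": {
        "total": 1,
        "hits": [
            {
                "_index": "resumes",
                "_type": "resume",
                "_id": "1",
                "inner_hits": {
                    "projects": {
                        "hits": {
                            "total": 1,
                            "hits": [
                                {
                                    "_nested": {"field": "projects", "offset": 1},
                                    "_source": {"client": "Société Générale"},
                                }
                            ],
                        }
                    }
                },
            }
        ],
    }
}

print(client_names(response))  # ['Société Générale']
```

The names are de-duplicated because the same client can show up in several resumes.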