Elasticsearch - Find records that have at least x elements in common with a given array
I have a mapping like this:
"properties": {
"id": {"type": "long", "index": "not_analyzed"},
"name": {"type": "string", "index": "not_analyzed"},
"skills": {"type": "string", "index": "not_analyzed"}
}
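I want to store students' profiles in Elasticsearch using the given mapping. skills
is the list of computer skills they specified in their profile (python, javascript, ...).
Given a skill set like ['html', 'css', 'sass', 'javascript', 'django', 'bootstrap', 'angularjs', 'backbone'],
I want to find all profiles that have at least 3 of the skills in that set. I'm not interested in which skills they share with the wanted list, only in the count. Is there a way to do this in Elasticsearch?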
There may be a better way that I haven't thought of, but you can do it with a script filter.
I set up a simplified version of the index, with a few docs:
PUT /test_index
{
"settings": {
"number_of_shards": 1
},
"mappings": {
"doc": {
"properties": {
"skills": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
POST /test_index/doc/_bulk
{"index":{"_id":1}}
{"skills":["html","css","javascript"]}
{"index":{"_id":2}}
{"skills":["bootstrap", "angularjs", "backbone"]}
{"index":{"_id":3}}
{"skills":["python", "javascript", "ruby","java"]}
Then I ran this query:
POST /test_index/_search
{
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"script": {
"script": "count=0; for(s: doc['skills'].values){ for(x: skills){ if(s == x){ count +=1 } } } count >= 3",
"params": {
"skills": ["html", "css", "sass", "javascript", "django", "bootstrap", "angularjs", "backbone"]
}
}
}
}
}
}
And got the results I expected:
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 2,
"max_score": 1,
"hits": [
{
"_index": "test_index",
"_type": "doc",
"_id": "1",
"_score": 1,
"_source": {
"skills": [
"html",
"css",
"javascript"
]
}
},
{
"_index": "test_index",
"_type": "doc",
"_id": "2",
"_score": 1,
"_source": {
"skills": [
"bootstrap",
"angularjs",
"backbone"
]
}
}
]
}
}
Here is all the code:
http://sense.qbox.io/gist/1018a01f1df29cb793ea15661f22bc8b25ed3476
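For readers who want to check the counting logic outside Elasticsearch, here is a minimal Python sketch of what the script filter computes: for each document, count how many of its skills appear in the wanted list and keep the document if the count is at least 3. The names count_common and matching_ids are made up for this illustration, and the sketch assumes neither list contains duplicates; it is not how Elasticsearch evaluates the script internally.

```python
# Sample documents from the bulk request above, keyed by _id.
docs = {
    1: ["html", "css", "javascript"],
    2: ["bootstrap", "angularjs", "backbone"],
    3: ["python", "javascript", "ruby", "java"],
}

# The same list that is passed to the script as the "skills" param.
wanted = ["html", "css", "sass", "javascript", "django",
          "bootstrap", "angularjs", "backbone"]


def count_common(doc_skills, wanted_skills):
    """Count how many distinct skills appear in both lists (hypothetical helper)."""
    return len(set(doc_skills) & set(wanted_skills))


def matching_ids(docs, wanted_skills, minimum=3):
    """Return the ids of docs sharing at least `minimum` skills with the wanted list."""
    return sorted(
        doc_id
        for doc_id, skills in docs.items()
        if count_common(skills, wanted_skills) >= minimum
    )


if __name__ == "__main__":
    print(matching_ids(docs, wanted))
```

With the three sample documents, docs 1 and 2 each share three skills with the wanted list and doc 3 shares only javascript, which matches the two hits returned above.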
You can also do this with a query string and the minimum_should_match option.
Example:
POST <index>/_search
{
"query": {
"filtered": {
"filter": {
"query": {
"query_string": {
"default_field": "skills",
"query": "html css sass javascript django bootstrap angularjs backbone \"ruby on rails\" ",
"minimum_should_match" : "3"
}
}
}
}
}
}
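If the skill list comes from application code, the request body above could be assembled along these lines. This is only a sketch: build_query is a hypothetical helper, it mirrors the filtered/query_string structure shown in this answer, and it only wraps multi-word skills in double quotes. It does not escape characters that are reserved in query string syntax, so it assumes plain skill names.

```python
import json


def build_query(skills, minimum, field="skills"):
    """Build a search body like the one above (hypothetical helper).

    Multi-word skills are wrapped in double quotes so that each one is
    sent as a single phrase, as with "ruby on rails" in the example.
    """
    terms = ['"%s"' % s if " " in s else s for s in skills]
    return {
        "query": {
            "filtered": {
                "filter": {
                    "query": {
                        "query_string": {
                            "default_field": field,
                            "query": " ".join(terms),
                            "minimum_should_match": str(minimum),
                        }
                    }
                }
            }
        }
    }


if __name__ == "__main__":
    body = build_query(["html", "css", "sass", "ruby on rails"], 3)
    # The printed JSON can be pasted as the body of POST <index>/_search.
    print(json.dumps(body, indent=2))
```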