Django, Haystack, Elasticsearch and a one-to-many relation
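I have a problem with Haystack: I don't know how to search for a model A whose related (foreign-key) objects *all* satisfy a given condition.
My simplified models look like this: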
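```python
from django.db import models


class Group(models.Model):
    # `id` is the implicit auto-incrementing primary key
    pass


class Meeting(models.Model):
    group = models.ForeignKey(Group, on_delete=models.CASCADE)
    day_of_week = models.IntegerField()  # 0 = Monday
    hour = models.IntegerField()         # start hour
    length = models.IntegerField()       # duration in hours
```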
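Basically, a group can have many meetings, and users should be able to search for groups whose meetings *all* fall within the given time ranges. For example: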
```text
Group(1)
    Meeting(day_of_week=Monday, hour=9, length=2)
Group(2)
    Meeting(day_of_week=Monday, hour=10, length=1)
    Meeting(day_of_week=Tuesday, hour=8, length=2)
Group(3)
    Meeting(day_of_week=Monday, hour=10, length=1)
    Meeting(day_of_week=Wednesday, hour=12, length=1)
```
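A search for "Monday from 8 to 11", "Tuesday from 12 to 14 (2 p.m.)" and "Wednesday from 6 to 17 (5 p.m.)" should return groups 1 and 3, because all meetings of these groups fall within the user-specified ranges. Group 2 should not be returned, because its second meeting is not within the given range (although its first one is).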
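To make the intended semantics concrete, here is a plain-Python sketch of the predicate, independent of Elasticsearch. It assumes meetings are `(day_of_week, hour, length)` tuples with Monday = 0, that a meeting occupies `hour` to `hour + length`, and that a group matches when every one of its meetings fits inside the requested window for its day. `SEARCH_WINDOWS`, `meeting_fits` and `matching_groups` are illustrative names, not part of any library.

```python
from typing import Dict, List, Tuple

# (day_of_week, hour, length); Monday = 0, Tuesday = 1, Wednesday = 2
Meeting = Tuple[int, int, int]

# The example groups from the question.
GROUPS: Dict[int, List[Meeting]] = {
    1: [(0, 9, 2)],
    2: [(0, 10, 1), (1, 8, 2)],
    3: [(0, 10, 1), (2, 12, 1)],
}

# Requested windows per day: day -> (from_hour, to_hour), inclusive.
SEARCH_WINDOWS: Dict[int, Tuple[int, int]] = {0: (8, 11), 1: (12, 14), 2: (6, 17)}


def meeting_fits(meeting: Meeting, windows: Dict[int, Tuple[int, int]]) -> bool:
    """True if the meeting lies entirely inside the window for its day."""
    day, hour, length = meeting
    if day not in windows:
        return False  # no window was requested for that day
    lo, hi = windows[day]
    return lo <= hour and hour + length <= hi


def matching_groups(groups: Dict[int, List[Meeting]],
                    windows: Dict[int, Tuple[int, int]]) -> List[int]:
    """Ids of groups whose meetings *all* fit in the requested windows."""
    return sorted(gid for gid, meetings in groups.items()
                  if all(meeting_fits(m, windows) for m in meetings))


print(matching_groups(GROUPS, SEARCH_WINDOWS))  # [1, 3]
```

Note that `all()` over an empty list is `True`, so a group without meetings would match; add an explicit emptiness check if that is undesirable.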
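If I were writing SQL, I would probably go with something like "select the count of matching meetings and the count of all meetings; if the two numbers are equal, then all meetings match":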
```sql
SELECT g.id,
       count(m2.id)
FROM groups g
JOIN meetings m2 ON m2.group_id = g.id
 AND ((m2.day_of_week = 0  -- Monday, 8-11
       AND m2.hour >= 8
       AND m2.hour + m2.length <= 11)
   OR (m2.day_of_week = 1  -- Tuesday, 12-14
       AND m2.hour >= 12
       AND m2.hour + m2.length <= 14)
   OR (m2.day_of_week = 2  -- Wednesday, 6-17
       AND m2.hour >= 6
       AND m2.hour + m2.length <= 17))
GROUP BY g.id
HAVING count(m2.id) =
       (SELECT count(*)
        FROM meetings
        WHERE meetings.group_id = g.id);
```
But we are using Haystack + Elasticsearch for indexing, and I have no idea how to flatten the models for indexing or how to write the query. Can anyone help?
You probably need to flatten your documents so that every document is a meeting that carries its group information.
**Solution for ES 5**
The mapping for your documents would be:
```
PUT /meetings
{
  "mappings": {
    "meeting": {
      "properties": {
        "groupId": {"type": "integer"},
        "dayOfWeek": {"type": "integer"},
        "hourRange": {"type": "integer_range"}
      }
    }
  }
}
```
Your five documents would then look like this:
```
POST /meetings/meeting/_bulk
{"index": {}}
{"groupId": 1, "dayOfWeek": 0, "hourRange": {"gte": 9, "lte": 11}}
{"index": {}}
{"groupId": 2, "dayOfWeek": 0, "hourRange": {"gte": 10, "lte": 11}}
{"index": {}}
{"groupId": 2, "dayOfWeek": 1, "hourRange": {"gte": 8, "lte": 10}}
{"index": {}}
{"groupId": 3, "dayOfWeek": 0, "hourRange": {"gte": 10, "lte": 11}}
{"index": {}}
{"groupId": 3, "dayOfWeek": 2, "hourRange": {"gte": 12, "lte": 13}}
```
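The five documents above can be derived mechanically from the `Meeting` rows. This is a minimal sketch, assuming each meeting occupies `hour` to `hour + length` (so Monday 9 with length 2 becomes `{"gte": 9, "lte": 11}`). The tuple row format and the helpers `to_flat_docs` and `to_bulk_body` are hypothetical; in a real project this logic would live in the indexing code rather than run over literals.

```python
import json
from typing import List, Tuple

# Hypothetical row format: (group_id, day_of_week, hour, length)
Row = Tuple[int, int, int, int]

ROWS: List[Row] = [
    (1, 0, 9, 2),
    (2, 0, 10, 1),
    (2, 1, 8, 2),
    (3, 0, 10, 1),
    (3, 2, 12, 1),
]


def to_flat_docs(rows: List[Row]) -> List[dict]:
    """One document per meeting, carrying its group id and an hour range."""
    return [
        {"groupId": gid, "dayOfWeek": day,
         "hourRange": {"gte": hour, "lte": hour + length}}
        for gid, day, hour, length in rows
    ]


def to_bulk_body(docs: List[dict]) -> str:
    """Newline-delimited _bulk body: an action line before each document."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"  # the _bulk API requires a trailing newline


print(to_bulk_body(to_flat_docs(ROWS)), end="")
```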
Finally, the query would look like this:
```
POST /meetings/meeting/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "bool": {
            "must": [
              {"term": {"dayOfWeek": 0}},
              {"range": {"hourRange": {"gte": 8, "lte": 11, "relation": "within"}}}
            ]
          }
        },
        {
          "bool": {
            "must": [
              {"term": {"dayOfWeek": 1}},
              {"range": {"hourRange": {"gte": 12, "lte": 14, "relation": "within"}}}
            ]
          }
        },
        {
          "bool": {
            "must": [
              {"term": {"dayOfWeek": 2}},
              {"range": {"hourRange": {"gte": 6, "lte": 17, "relation": "within"}}}
            ]
          }
        }
      ]
    }
  }
}
```
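Since the `should` clauses differ only in the day and the hours, the request body can be generated from the user's search windows. A sketch, assuming the windows arrive as a `{day: (from_hour, to_hour)}` dict; `build_es5_query` is a hypothetical helper, and the resulting dict would be sent as the `_search` body by whichever client you use.

```python
import json
from typing import Dict, Tuple


def build_es5_query(windows: Dict[int, Tuple[int, int]]) -> dict:
    """One bool/must clause per requested day, OR-ed together via should."""
    should = []
    for day, (lo, hi) in sorted(windows.items()):
        should.append({
            "bool": {
                "must": [
                    {"term": {"dayOfWeek": day}},
                    # "within": the stored hourRange must lie inside [lo, hi]
                    {"range": {"hourRange": {"gte": lo, "lte": hi,
                                             "relation": "within"}}},
                ]
            }
        })
    return {"query": {"bool": {"should": should}}}


query = build_es5_query({0: (8, 11), 1: (12, 14), 2: (6, 17)})
print(json.dumps(query, indent=2))
```

With only `should` clauses and no `must` or `filter`, a document has to match at least one clause, which is what "any of these day windows" needs.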
**Solution for ES < 5**
```
PUT /meetings
{
  "mappings": {
    "meeting": {
      "properties": {
        "groupId": {"type": "integer"},
        "dayOfWeek": {"type": "integer"},
        "start": {"type": "integer"},
        "end": {"type": "integer"}
      }
    }
  }
}
```
Your five documents would then look like this:
```
POST /meetings/meeting/_bulk
{"index": {}}
{"groupId": 1, "dayOfWeek": 0, "start": 9, "end": 11}
{"index": {}}
{"groupId": 2, "dayOfWeek": 0, "start": 10, "end": 11}
{"index": {}}
{"groupId": 2, "dayOfWeek": 1, "start": 8, "end": 10}
{"index": {}}
{"groupId": 3, "dayOfWeek": 0, "start": 10, "end": 11}
{"index": {}}
{"groupId": 3, "dayOfWeek": 2, "start": 12, "end": 13}
```
Finally, the query would look like this:
```
POST /meetings/meeting/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "bool": {
            "must": [
              {"term": {"dayOfWeek": 0}},
              {"range": {"start": {"gte": 8}}},
              {"range": {"end": {"lte": 11}}}
            ]
          }
        },
        {
          "bool": {
            "must": [
              {"term": {"dayOfWeek": 1}},
              {"range": {"start": {"gte": 12}}},
              {"range": {"end": {"lte": 14}}}
            ]
          }
        },
        {
          "bool": {
            "must": [
              {"term": {"dayOfWeek": 2}},
              {"range": {"start": {"gte": 6}}},
              {"range": {"end": {"lte": 17}}}
            ]
          }
        }
      ]
    }
  }
}
```
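Both flattened queries return individual meetings, not groups, so for the example data group 2 still shows up through its Monday meeting. One way to enforce the "all meetings" condition, borrowing the count comparison from the SQL in the question, is to count hits per `groupId` and compare them with each group's total number of meetings (taken, for example, from the database or from a `terms` aggregation on `groupId`). A sketch over plain data, assuming every matching hit was retrieved (no pagination cut-off); `groups_with_all_meetings` is a hypothetical helper.

```python
from collections import Counter
from typing import Dict, Iterable, List


def groups_with_all_meetings(hit_group_ids: Iterable[int],
                             total_meetings: Dict[int, int]) -> List[int]:
    """Keep groups whose matching-meeting count equals their total count.

    hit_group_ids: the groupId of every hit returned by the search.
    total_meetings: groupId -> number of meetings the group has overall.
    """
    matched = Counter(hit_group_ids)
    return sorted(gid for gid, n in matched.items()
                  if n == total_meetings.get(gid))


# Hits for the example data: group 2 matches only its Monday meeting.
hits = [1, 2, 3, 3]
totals = {1: 1, 2: 2, 3: 2}
print(groups_with_all_meetings(hits, totals))  # [1, 3]
```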
ElasticSearch solution
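The key to the solution is an Elasticsearch feature called nested objects. Fortunately, it is available in all ES versions. Nested objects are essential here because the fields of a single meeting (day, start, end) are strictly correlated and must be matched together.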
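To see why the correlation matters: without `nested`, Elasticsearch indexes an array of objects as independent value lists per field, so group 2 would be stored roughly as `dayOfWeek: [0, 1]`, `start: [10, 8]`, `end: [11, 10]`. The sketch below only simulates that behaviour in plain Python (it does not call Elasticsearch) and shows a condition like "a Tuesday meeting starting at 10 or later" wrongly matching when fields are checked independently.

```python
from typing import Dict, List

# Group 2 from the question, as nested objects (each meeting kept intact).
meetings: List[Dict[str, int]] = [
    {"dayOfWeek": 0, "start": 10, "end": 11},
    {"dayOfWeek": 1, "start": 8, "end": 10},
]

# Roughly how a plain "object" mapping stores it: one value list per field,
# with no record of which values came from the same meeting.
flattened: Dict[str, List[int]] = {
    field: [m[field] for m in meetings] for field in ("dayOfWeek", "start", "end")
}


def flat_match(doc: Dict[str, List[int]]) -> bool:
    # Each condition may be satisfied by a *different* meeting.
    return 1 in doc["dayOfWeek"] and any(s >= 10 for s in doc["start"])


def nested_match(objs: List[Dict[str, int]]) -> bool:
    # Both conditions must hold for the *same* meeting.
    return any(m["dayOfWeek"] == 1 and m["start"] >= 10 for m in objs)


print(flat_match(flattened))   # True  (false positive)
print(nested_match(meetings))  # False
```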
```
PUT /myindex
{
  "mappings": {
    "groups": {
      "properties": {
        "meetings": {
          "type": "nested",
          "properties": {
            "dayOfWeek": {"type": "integer"},
            "start": {"type": "integer"},
            "end": {"type": "integer"}
          }
        },
        "groupId": {"type": "integer"}
      }
    }
  }
}
```
```
POST /myindex/groups/_bulk
{"index": {}}
{"groupId": 1, "meetings": [{"dayOfWeek": 0, "start": 9, "end": 11}]}
{"index": {}}
{"groupId": 2, "meetings": [{"dayOfWeek": 0, "start": 10, "end": 11}, {"dayOfWeek": 1, "start": 8, "end": 10}]}
{"index": {}}
{"groupId": 3, "meetings": [{"dayOfWeek": 0, "start": 10, "end": 11}, {"dayOfWeek": 2, "start": 12, "end": 13}]}
```
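The nested documents above place each `Meeting` under its `Group`. A sketch of that transformation, assuming `(group_id, day_of_week, hour, length)` rows and `end = hour + length`; `to_nested_docs` is a hypothetical helper, not part of Haystack or any Elasticsearch client.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical row format: (group_id, day_of_week, hour, length)
ROWS: List[Tuple[int, int, int, int]] = [
    (1, 0, 9, 2),
    (2, 0, 10, 1),
    (2, 1, 8, 2),
    (3, 0, 10, 1),
    (3, 2, 12, 1),
]


def to_nested_docs(rows: List[Tuple[int, int, int, int]]) -> List[dict]:
    """One document per group, with its meetings as nested objects."""
    by_group: Dict[int, List[dict]] = defaultdict(list)
    for gid, day, hour, length in rows:
        by_group[gid].append({"dayOfWeek": day, "start": hour, "end": hour + length})
    return [{"groupId": gid, "meetings": ms} for gid, ms in sorted(by_group.items())]


for doc in to_nested_docs(ROWS):
    print(doc)
```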
At this point each meeting is stored inside its group's document, so the search runs over groups directly.
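It is not possible to write a query that directly returns the groups in which *all* nested objects satisfy a condition, but the problem is easy to invert: return the groups in which *none* of the meetings is at a wrong time.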
```
GET /myindex/_search
{
  "query": {
    "bool": {
      "must_not": {
        "nested": {
          "path": "meetings",
          "query": {
            "bool": {
              "must_not": {
                "bool": {
                  "should": [
                    {
                      "bool": {
                        "must": [
                          {"term": {"meetings.dayOfWeek": 0}},
                          {"range": {"meetings.start": {"gte": 8, "lte": 11}}},
                          {"range": {"meetings.end": {"gte": 8, "lte": 11}}}
                        ]
                      }
                    },
                    {
                      "bool": {
                        "must": [
                          {"term": {"meetings.dayOfWeek": 1}},
                          {"range": {"meetings.start": {"gte": 12, "lte": 14}}},
                          {"range": {"meetings.end": {"gte": 12, "lte": 14}}}
                        ]
                      }
                    },
                    {
                      "bool": {
                        "must": [
                          {"term": {"meetings.dayOfWeek": 2}},
                          {"range": {"meetings.start": {"gte": 6, "lte": 17}}},
                          {"range": {"meetings.end": {"gte": 6, "lte": 17}}}
                        ]
                      }
                    }
                  ]
                }
              }
            }
          }
        }
      }
    }
  }
}
```
This returns groups 1 and 3. Group 2 is not returned because one of its meetings falls outside the requested times.
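The inverted query can also be generated from the requested windows, which makes the double negation easier to follow: the inner `should` describes an acceptable meeting, the inner `must_not` turns it into "a meeting outside every window", and the outer `must_not` drops any group that has such a meeting. A sketch assuming the same `{day: (from_hour, to_hour)}` input; `build_nested_query` is hypothetical, and field names carry the `meetings.` path prefix that queries inside a `nested` clause expect.

```python
import json
from typing import Dict, Tuple


def build_nested_query(windows: Dict[int, Tuple[int, int]]) -> dict:
    """Groups with no meeting outside the requested windows."""
    acceptable = []  # one clause per day: "this meeting fits that day's window"
    for day, (lo, hi) in sorted(windows.items()):
        acceptable.append({"bool": {"must": [
            {"term": {"meetings.dayOfWeek": day}},
            {"range": {"meetings.start": {"gte": lo, "lte": hi}}},
            {"range": {"meetings.end": {"gte": lo, "lte": hi}}},
        ]}})
    # A meeting that matches none of the acceptable clauses is a "bad" one.
    bad_meeting = {"bool": {"must_not": {"bool": {"should": acceptable}}}}
    # Exclude every group that contains at least one bad meeting.
    return {"query": {"bool": {"must_not": {
        "nested": {"path": "meetings", "query": bad_meeting}
    }}}}


print(json.dumps(build_nested_query({0: (8, 11), 1: (12, 14), 2: (6, 17)}), indent=2))
```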
Haystack integration
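The second problem is integrating this with Django Haystack, because by default it does not support engine-specific ES features such as nested fields. Fortunately, I'm not the only one who needed this in a Django application, and someone has already resolved it.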