Elasticsearch: Does a query with filtering get affected by relevancy of records not in the filter?
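Say I have three sets of data (SetA, SetB, SetC) and three customers. My first customer has access to SetA and SetB, my second customer has access to SetA and SetC, and my third customer uses SetB and SetC. I could create one Elasticsearch index per customer, so that each index would contain the following data sets...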
Index 1   Index 2   Index 3
-------   -------   -------
SetA      SetA      SetB
SetB      SetC      SetC
I then query the correct index based on the customer. This is simple, but it does involve duplicating data.
Alternatively, I could create a single index containing all three data sets.
Index
------
SetA
SetB
SetC
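I would then add filtering to the query so that the results only consider records from the correct sets. This would work, but I'm worried that this single-index solution won't give the same query results as the multi-index approach.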
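I believe (and I'm happy to be corrected if I'm wrong) that an index takes all of its records into account for internal scoring, such as relevance and term frequency. So a single index with filtering would not give the same results as the multi-index approach. Is this assumption correct?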
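If you filter on your customer ID in filter context and run the full-text search in query context, the filter itself has no effect on relevance scoring, so you can, and should, combine this data in a single Elasticsearch index rather than creating 3 different indices for it.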
You can read more about query and filter context and …
Let me show you with a small example:
Index definition:
{
"mappings": {
"properties": {
"setA": {
"type": "text"
},
"setB": {
"type": "text"
},
"setC": {
"type": "text"
},
"customer-id": {
"type": "long"
}
}
}
}
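If you prefer to create the index from a script rather than a console, here is a minimal Python sketch using only the standard library. It assumes a local, unsecured cluster at `http://localhost:9200` (an assumption, not part of the original setup) and reuses the index name `so_query_filter` that appears in the search results. It only builds the `PUT` request; nothing is sent until you pass it to `urllib.request.urlopen`.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local cluster without security enabled
INDEX = "so_query_filter"

# Same mapping as above: three text fields plus a numeric customer id.
MAPPING = {
    "mappings": {
        "properties": {
            "setA": {"type": "text"},
            "setB": {"type": "text"},
            "setC": {"type": "text"},
            "customer-id": {"type": "long"},
        }
    }
}


def create_index_request(base_url: str, index: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) the PUT /<index> request that creates the index."""
    return urllib.request.Request(
        f"{base_url}/{index}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
```

Calling `urllib.request.urlopen(create_index_request(ES_URL, INDEX, MAPPING))` would send it against a running cluster.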
Two sample documents indexed for each customer:
{
"setA" : "first customer",
"setB" : "first customer",
"setC" : "",
"customer-id" : 1
}
{
"setA" : "first customer set A",
"setB" : "first customer set B",
"setC" : "",
"customer-id" : 1
}
{
"setA" : "second customer",
"setC" : "second customer",
"customer-id" : 2
}
{
"setA" : "second customer set A",
"setC" : "second customer set C",
"customer-id" : 2
}
{
"setB" : "third customer",
"setC" : "third customer",
"customer-id" : 3
}
{
"setB" : "third customer set B",
"setC" : "third customer set C",
"customer-id" : 3
}
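The six documents can be loaded in one request through the `_bulk` API, whose body is newline-delimited JSON: an action line, then the document source, with a trailing newline at the end. The sketch below builds that body; the sequential `_id` values 1 to 6 are an assumption chosen to line up with the `_id`s shown in the search results.

```python
import json

INDEX = "so_query_filter"

# The sample documents from the example, two per customer.
DOCS = [
    {"setA": "first customer", "setB": "first customer", "setC": "", "customer-id": 1},
    {"setA": "first customer set A", "setB": "first customer set B", "setC": "", "customer-id": 1},
    {"setA": "second customer", "setC": "second customer", "customer-id": 2},
    {"setA": "second customer set A", "setC": "second customer set C", "customer-id": 2},
    {"setB": "third customer", "setC": "third customer", "customer-id": 3},
    {"setB": "third customer set B", "setC": "third customer set C", "customer-id": 3},
]


def bulk_body(index: str, docs: list) -> str:
    """Build an NDJSON body for the _bulk API: one action line per document
    followed by the document source."""
    lines = []
    for doc_id, doc in enumerate(docs, start=1):
        lines.append(json.dumps({"index": {"_index": index, "_id": str(doc_id)}}))
        lines.append(json.dumps(doc))
    # The _bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"
```

The resulting string can be sent as `POST /_bulk` with the header `Content-Type: application/x-ndjson`.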
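Search query that first restricts the results to the first customer and then ranks the matches by relevance score. The must clause matches documents and orders them by relevance score; the filter clause keeps only customer 1's documents and does not contribute to the score:
{
"query": {
"bool": {
"must": [
{
"match": {
"setA": "first"
}
}
],
"filter": [
{
"term": {
"customer-id": 1
}
}
]
}
}
}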
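Since only the customer ID changes between customers, the same body can be generated per request. A small sketch; `customer_search` is a hypothetical helper name, and the optional `explain` flag sets Elasticsearch's top-level `explain` search parameter, which adds a per-hit breakdown of how `_score` was computed.

```python
def customer_search(customer_id: int, field: str, text: str, explain: bool = False) -> dict:
    """Build a bool query: `must` scores the full-text match,
    `filter` restricts hits to one customer without affecting _score."""
    body = {
        "query": {
            "bool": {
                "must": [{"match": {field: text}}],
                "filter": [{"term": {"customer-id": customer_id}}],
            }
        }
    }
    if explain:
        # Ask Elasticsearch to return the score breakdown for each hit.
        body["explain"] = True
    return body
```

`customer_search(1, "setA", "first")` produces the same body as the query above.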
Search results (excerpt):
"hits": [
{
"_index": "so_query_filter",
"_type": "_doc",
"_id": "1",
"_score": 0.8025915,
"_source": {
"setA": "first customer",
"setB": "first customer",
"setC": "",
"customer-id": 1
}
},
{
"_index": "so_query_filter",
"_type": "_doc",
"_id": "2",
"_score": 0.60996956,
"_source": {
"setA": "first customer set A",
"setB": "first customer set B",
"setC": "",
"customer-id": 1
}
}
]
Only customer 1's documents are returned. Document 1 has the higher relevance score; document 2 scores lower because its setA value is longer (four terms instead of two), and BM25 normalizes for field length, so the same single match counts for less in a longer field.
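The two scores can be reproduced by hand, which also shows which documents feed the term statistics. The sketch below implements the per-term BM25 formula that Elasticsearch's `explain` output reports for the default similarity (k1 = 1.2, b = 0.75, with the k1 + 1 factor shown as `boost`), assuming the index has a single shard (the default since 7.0). Using statistics from all four documents that have a `setA` value (customers 1 and 2, total length 12, so an average of 3 terms, two of which contain "first"), it gives about 0.80259 and 0.60997, matching the hits above. The second pair is for a hypothetical index that holds only customer 1's two documents.

```python
import math

K1 = 1.2   # Elasticsearch's default BM25 k1
B = 0.75   # Elasticsearch's default BM25 b


def bm25(freq, dl, avgdl, doc_count, doc_freq, k1=K1, b=B):
    """Single-term BM25 score as boost * idf * tf, with boost = k1 + 1
    (no query-time boost applied)."""
    idf = math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))
    tf = freq / (freq + k1 * (1 - b + b * dl / avgdl))
    return (k1 + 1) * idf * tf


# Combined index: statistics for "first" in setA over every document
# that has a setA value, including customer 2's (which the filter removes).
combined = dict(avgdl=(2 + 4 + 2 + 4) / 4, doc_count=4, doc_freq=2)
doc1 = bm25(freq=1, dl=2, **combined)   # "first customer"
doc2 = bm25(freq=1, dl=4, **combined)   # "first customer set A"

# Hypothetical per-customer index holding only customer 1's documents.
separate = dict(avgdl=(2 + 4) / 2, doc_count=2, doc_freq=2)
doc1_sep = bm25(freq=1, dl=2, **separate)
doc2_sep = bm25(freq=1, dl=4, **separate)

print(f"combined: {doc1:.5f} {doc2:.5f}")
print(f"separate: {doc1_sep:.5f} {doc2_sep:.5f}")
```

In this example the absolute scores differ between the two layouts (about 0.211 and 0.160 for the separate index) while the order of the hits stays the same: the filter removes documents from the result set, but the index-wide statistics (document count, average field length) still include the filtered-out ones.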