Filter documents on items in an array in Elasticsearch

Question

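I'm using ElasticSearch to search through documents. However, I need to make sure the current user is allowed to see those documents. Each document is tied to one or more communities, and the user may or may not belong to them.

Here is the mapping for my document: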

export const mapping = {
  properties: {
    amazonId: { type: 'text' },
    title: { type: 'text' },
    subtitle: { type: 'text' },
    description: { type: 'text' },
    createdAt: { type: 'date' },
    updatedAt: { type: 'date' },
    published: { type: 'boolean' },
    communities: { type: 'nested' }
  }
}

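I'm currently saving the IDs of the communities a document belongs to in an array of strings. For example: ["edd05cd0-0a49-4676-86f4-2db913235371", "672916cf-ee32-4bed-a60f-9a7c08dba04b"]

Currently, when I filter the query with {term: { communities: community.id } }, it returns all documents, regardless of which communities they are tied to.

Here is the full query: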

{
  index: 'document',
  filter_path: { filter: {term: { communities: community.id } } },
  body: {
    sort: [{ createdAt: { order: 'asc' } }]
  }
}

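Here are the results for a community ID of "b7d28e7f-7534-406a-981e-ddf147b5015a". Note: this is the return from my GraphQL layer, so the communities in each document are the full objects resolved after the ES query hits are parsed.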

"hits": [
    {
      "title": "The One True Document",
      "communities": [
        {
          "id": "edd05cd0-0a49-4676-86f4-2db913235371"
        },
        {
          "id": "672916cf-ee32-4bed-a60f-9a7c08dba04b"
        }
      ]
    },
    {
      "title": "Boring Document 1",
      "communities": []
    },
    {
      "title": "Boring Document 2",
      "communities": []
    },
    {
      "title": "Unpublished",
      "communities": [
        {
          "id": "672916cf-ee32-4bed-a60f-9a7c08dba04b"
        }
       ]
    }
]

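When I try to map communities as {type: 'keyword', index: 'not_analyzed'}, I get an error stating [illegal_argument_exception] Could not convert [communities.index] to boolean.

So do I need to change my mapping, my filter, or both? Searching the docs for 6.6, I see that a term query needs a non-analyzed mapping.

Update-------------------------

I updated the communities mapping to keyword, as suggested below. However, I still received the same results.

I then updated my query to the following (using a community ID that has documents):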

query: { index: 'document',
  body: 
   { sort: [ { createdAt: { order: 'asc' } } ],
     from: 0,
     size: 5,
     query: 
      { bool: 
         { filter: 
            { term: { communities: '672916cf-ee32-4bed-a60f-9a7c08dba04b' } } } } } }

This gives me the following results:

{
  "data": {
    "communities": [
      {
        "id": "672916cf-ee32-4bed-a60f-9a7c08dba04b",
        "feed": {
          "documents": {
            "hits": []
          }
        }
      }
    ]
  }
}

It appears that my filter is now working too well?

Answer 1

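Since you are storing community IDs, you should make sure the IDs are not analyzed. For that, communities should be of type keyword. Secondly, you want to store an array of community IDs, since a document can belong to multiple communities. You don't need to make the field nested for this; the nested type serves a different use case altogether. To store the values as an array, make sure that at index time you always pass the value to the field as an array, even when there is only a single value.

You need to change both the mapping and the way you index values into the communities field.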
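As a minimal sketch of the indexing side, the helper below normalizes whatever the application holds for a document's communities into a flat array of ID strings before the document is sent to Elasticsearch. The function name `toCommunityIds` and the accepted input shapes (a single ID, an array of IDs, or `{ id }` objects like those in the GraphQL result from the question) are assumptions for illustration, not part of any Elasticsearch API.

```javascript
// Hypothetical helper: normalize community references into an array of ID strings.
function toCommunityIds(value) {
  if (value === undefined || value === null) return [];
  // Wrap a single value so that the field is always indexed as an array.
  const items = Array.isArray(value) ? value : [value];
  return items
    // Accept either a plain ID string or an object carrying an `id` property.
    .map((item) => (typeof item === 'string' ? item : item && item.id))
    // Drop anything that did not resolve to a non-empty string.
    .filter((id) => typeof id === 'string' && id.length > 0);
}

// Document body to index; a single community still becomes a one-element array.
const doc = {
  title: 'Unpublished',
  communities: toCommunityIds({ id: '672916cf-ee32-4bed-a60f-9a7c08dba04b' }),
};

console.log(JSON.stringify(doc.communities));
```

The resulting `doc` object is what would be passed as the body of the index request, so the `communities` field always arrives as an array of plain strings.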

1. Update the mapping as follows:

PUT my_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "amazonId": {
          "type": "text"
        },
        "title": {
          "type": "text"
        },
        "subtitle": {
          "type": "text"
        },
        "description": {
          "type": "text"
        },
        "createdAt": {
          "type": "date"
        },
        "updatedAt": {
          "type": "date"
        },
        "published": {
          "type": "boolean"
        },
        "communities": {
          "type": "keyword"
        }
      }
    }
  }
}
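In the JavaScript form used in the question, the same mapping might look like the sketch below. It also shows why the `index: 'not_analyzed'` attempt was rejected: since Elasticsearch 5.x the `index` option is a boolean, and the exact-value behaviour that `not_analyzed` used to request is what the `keyword` type provides. The `export` keyword is dropped here only so the snippet runs standalone.

```javascript
// Mapping with `communities` as a keyword field (exact, un-analyzed values).
const mapping = {
  properties: {
    amazonId: { type: 'text' },
    title: { type: 'text' },
    subtitle: { type: 'text' },
    description: { type: 'text' },
    createdAt: { type: 'date' },
    updatedAt: { type: 'date' },
    published: { type: 'boolean' },
    // A keyword field accepts one value or an array of values;
    // there is no separate array type to declare.
    communities: { type: 'keyword' },
  },
};

console.log(mapping.properties.communities.type);
```

Note that the type of an existing field cannot be changed in place, so an index created with the old nested mapping would need to be recreated and its documents reindexed before this mapping takes effect.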

2. Add a document to the index:

PUT my_index/_doc/1
{
  "title": "The One True Document",
  "communities": [
    "edd05cd0-0a49-4676-86f4-2db913235371",
    "672916cf-ee32-4bed-a60f-9a7c08dba04b"
  ]
}

3. Filter by community ID:

GET my_index/_doc/_search
{
  "query": {
    "bool": {
      "filter": [
        {
          "term": {
            "communities": "672916cf-ee32-4bed-a60f-9a7c08dba04b"
          }
        }
      ]
    }
  }
}
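If the current user belongs to several communities, the same filter could be expressed with a `terms` query, which matches documents whose field contains any of the listed values. The sketch below builds request parameters in the `{ index, body }` shape used in the question. The function name `buildVisibleDocumentsRequest`, its parameters, and its defaults are assumptions for illustration; the returned object would still need to be passed to whatever client call the application already uses.

```javascript
// Hypothetical builder: search parameters restricted to the user's communities.
function buildVisibleDocumentsRequest(index, communityIds, { from = 0, size = 5 } = {}) {
  return {
    index,
    body: {
      sort: [{ createdAt: { order: 'asc' } }],
      from,
      size,
      query: {
        bool: {
          // A filter clause does not affect scoring; `terms` matches any listed ID.
          filter: [{ terms: { communities: communityIds } }],
        },
      },
    },
  };
}

const request = buildVisibleDocumentsRequest('document', [
  'edd05cd0-0a49-4676-86f4-2db913235371',
  '672916cf-ee32-4bed-a60f-9a7c08dba04b',
]);

console.log(JSON.stringify(request.body.query, null, 2));
```

The important difference from the original query in the question is that the filter lives under `body.query`; `filter_path` only trims which parts of the response are returned and does not restrict which documents match.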

Nested field approach

1. Mapping:

PUT my_index_2
{
  "mappings": {
    "_doc": {
      "properties": {
        "amazonId": {
          "type": "text"
        },
        "title": {
          "type": "text"
        },
        "subtitle": {
          "type": "text"
        },
        "description": {
          "type": "text"
        },
        "createdAt": {
          "type": "date"
        },
        "updatedAt": {
          "type": "date"
        },
        "published": {
          "type": "boolean"
        },
        "communities": {
          "type": "nested"
        }
      }
    }
  }
}

2. Index a document:

PUT my_index_2/_doc/1
{
  "title": "The One True Document",
  "communities": [
    {
      "id": "edd05cd0-0a49-4676-86f4-2db913235371"
    },
    {
      "id": "672916cf-ee32-4bed-a60f-9a7c08dba04b"
    }
  ]
}

3. Query (using a nested query):

GET my_index_2/_doc/_search
{
  "query": {
    "bool": {
      "filter": [
        {
          "nested": {
            "path": "communities",
            "query": {
              "term": {
                "communities.id.keyword": "672916cf-ee32-4bed-a60f-9a7c08dba04b"
              }
            }
          }
        }
      ]
    }
  }
}

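You may have noticed that I used communities.id.keyword rather than communities.id. Because the mapping above defines no sub-fields for communities, dynamic mapping indexes id as an analyzed text field with an un-analyzed keyword sub-field, and a term query for the full ID matches only against that keyword sub-field.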
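As an alternative sketch, the `id` sub-field could be mapped explicitly as `keyword` inside the nested mapping, in which case the term query would target `communities.id` directly and no `.keyword` suffix would be needed. The helper name `buildNestedCommunityFilter` is hypothetical, and its `idField` parameter exists only to show both variants side by side.

```javascript
// Nested mapping with an explicit keyword sub-field for the community ID.
const nestedMapping = {
  properties: {
    communities: {
      type: 'nested',
      properties: {
        id: { type: 'keyword' },
      },
    },
  },
};

// Hypothetical builder for the nested filter clause.
// Pass 'communities.id.keyword' as idField when `id` was dynamically mapped as text.
function buildNestedCommunityFilter(communityId, idField = 'communities.id') {
  return {
    nested: {
      path: 'communities',
      query: {
        term: { [idField]: communityId },
      },
    },
  };
}

const explicitFilter = buildNestedCommunityFilter('672916cf-ee32-4bed-a60f-9a7c08dba04b');
const dynamicFilter = buildNestedCommunityFilter(
  '672916cf-ee32-4bed-a60f-9a7c08dba04b',
  'communities.id.keyword'
);

console.log(JSON.stringify({ bool: { filter: [explicitFilter] } }, null, 2));
```

Either clause goes inside `bool.filter` exactly like the nested clause in the query above; only the field name differs depending on how `id` is mapped.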
