Adding a new document to a separate index using Elasticsearch processors

Is there a way to populate a separate index whenever I index some document(s)?

Let's say I have something like this:

PUT person/_doc/1
{
  "name": "John Doe",
  "languages": ["english", "spanish"]
}

PUT person/_doc/2
{
  "name": "Jane Doe",
  "languages": ["english", "russian"]
}

What I want is that every time a person is added, each of that person's languages is added to a languages index.

Something like:

GET languages/_search

would give:

...
"hits" : [
  {
    "_index" : "languages",
    "_type" : "doc",
    "_id" : "russian",
    "_score" : 1.0,
    "_source" : {
      "value" : "russian"
    }
  },
  {
    "_index" : "languages",
    "_type" : "doc",
    "_id" : "english",
    "_score" : 1.0,
    "_source" : {
      "value" : "english"
    }
  },
  {
    "_index" : "languages",
    "_type" : "doc",
    "_id" : "spanish",
    "_score" : 1.0,
    "_source" : {
      "value" : "spanish"
    }
  }
...

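I thought about ingest pipelines, but I don't see any processor that allows such a thing.

Maybe the answer is to create a custom processor. I already have one, but I'm not sure how to insert a document into a separate index from it.


UPDATE: Using transforms as described in @Val's answer works, and it does seem to be the right answer...

However, I am using Open Distro for Elasticsearch, and transforms are not available there. An alternative solution that works there would be greatly appreciated :)


UPDATE 2: It looks like OpenSearch is replacing Open Distro for Elasticsearch, and it has a transform API \o/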

You just need to change the value of the _index field in an ingest pipeline:

{
  "description": "routes matching documents to the languages index",
  "processors": [
    {
      "set": {
        "if": "[*your condition here*]",
        "field": "_index",
        "value": "languages",
        "override": true
      }
    }
  ]
}

Unlike in Logstash, there is no way to clone or split a document that enters an ingest pipeline. So you cannot index two documents out of a single one.
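Since the pipeline itself cannot fan a document out, one option is to do the fan-out on the client before sending the request. The following is only a minimal sketch of that idea, not something the ingest API provides: the helper name `build_bulk_body` is hypothetical, and it assumes you control the indexing client and send the result to the `_bulk` endpoint. Because each language document uses the language as its `_id`, re-indexing the same language simply overwrites the existing document.

```python
import json


def build_bulk_body(person_id, person, person_index="person", languages_index="languages"):
    """Return an NDJSON _bulk body that indexes the person and one doc per language."""
    lines = [
        {"index": {"_index": person_index, "_id": str(person_id)}},
        person,
    ]
    # dict.fromkeys keeps the original order while dropping duplicate languages
    for language in dict.fromkeys(person.get("languages", [])):
        lines.append({"index": {"_index": languages_index, "_id": language}})
        lines.append({"value": language})
    # The _bulk API expects one JSON object per line and a trailing newline
    return "\n".join(json.dumps(line) for line in lines) + "\n"


if __name__ == "__main__":
    jane = {"name": "Jane Doe", "languages": ["english", "russian"]}
    print(build_bulk_body(2, jane), end="")
```

The printed body would be posted to `_bulk` with the `application/x-ndjson` content type. This keeps the languages index current at write time, at the cost of putting the logic in every client that indexes persons.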

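That said, once your person documents have been indexed, you can definitely hit the _transform API endpoint and create the languages index from the person index.

First, create the transform: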

PUT _transform/languages-transform
{
  "source": {
    "index": "person"
  },
  "pivot": {
    "group_by": {
      "language": {
        "terms": {
          "field": "languages.keyword"
        }
      }
    },
    "aggregations": {
      "count": {
        "value_count": {
          "field": "languages.keyword"
        }
      }
    }
  },
  "dest": {
    "index": "languages",
    "pipeline": "set-id"
  }
}

You also need to create the pipeline that sets the proper ID on your language documents:

PUT _ingest/pipeline/set-id
{
  "processors": [
    {
      "set": {
        "field": "_id",
        "value": "{{language}}"
      }
    }
  ]
}

Then you can start the transform:

POST _transform/languages-transform/_start

Once it has run, you will have a new index called languages whose content is:

GET languages/_search
=>
"hits" : [
  {
    "_index" : "languages",
    "_type" : "_doc",
    "_id" : "english",
    "_score" : 1.0,
    "_source" : {
      "count" : 4,
      "language" : "english"
    }
  },
  {
    "_index" : "languages",
    "_type" : "_doc",
    "_id" : "russian",
    "_score" : 1.0,
    "_source" : {
      "count" : 2,
      "language" : "russian"
    }
  },
  {
    "_index" : "languages",
    "_type" : "_doc",
    "_id" : "spanish",
    "_score" : 1.0,
    "_source" : {
      "count" : 2,
      "language" : "spanish"
    }
  }
]

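Note that you can also set that transform up on a schedule so that it runs regularly, or you can run it manually whenever it suits you in order to rebuild the languages index.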
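To make the scheduled variant concrete, here is a rough sketch that builds the same transform body with the `frequency` and `sync` settings that Elasticsearch uses for continuous transforms. It rests on an assumption: continuous mode needs a date field in the source index, and the person documents shown in the question do not have one, so `indexed_at` below is a hypothetical field you would have to add yourself. The function name is also just illustrative.

```python
import json


def continuous_transform_body(timestamp_field, frequency="1m", delay="60s"):
    """Build a PUT _transform body that keeps `languages` in sync with `person`."""
    return {
        "source": {"index": "person"},
        "pivot": {
            "group_by": {
                "language": {"terms": {"field": "languages.keyword"}}
            },
            "aggregations": {
                "count": {"value_count": {"field": "languages.keyword"}}
            },
        },
        "dest": {"index": "languages", "pipeline": "set-id"},
        # How often the transform checks the source index for changes
        "frequency": frequency,
        # Continuous mode uses a date field to detect new or changed documents
        "sync": {"time": {"field": timestamp_field, "delay": delay}},
    }


if __name__ == "__main__":
    # "indexed_at" is a hypothetical date field, not part of the original documents
    print(json.dumps(continuous_transform_body("indexed_at"), indent=2))
```

The printed JSON would be used as the request body when creating the transform, in place of the one-off definition above, and the transform is then started in the same way.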


OpenSearch has its own _transform API. It works slightly differently; the transform can be created like this:

PUT _plugins/_transform/languages-transform
{
  "transform": {
    "enabled": true,
    "description": "Insert languages",
    "schedule": {
      "interval": {
        "period": 1,
        "unit": "minutes"
      }
    },
    "source_index": "person",
    "target_index": "languages",
    "data_selection_query": {
      "match_all": {}
    },
    "page_size": 1,
    "groups": [{
      "terms": {
        "source_field": "languages.keyword",
        "target_field": "value"
      }
    }]
  }
}