Adding a new document to a separate index using Elasticsearch processors
Is there a way to populate a separate index when I index some documents?
Let's say I have something like:
PUT person/_doc/1
{
  "name": "John Doe",
  "languages": ["english", "spanish"]
}
PUT person/_doc/2
{
  "name": "Jane Doe",
  "languages": ["english", "russian"]
}
What I want is that every time a person is added, their languages are added to a languages index.
Something like:
GET languages/_search
would give:
...
"hits" : [
  {
    "_index" : "languages",
    "_type" : "doc",
    "_id" : "russian",
    "_score" : 1.0,
    "_source" : {
      "value" : "russian"
    }
  },
  {
    "_index" : "languages",
    "_type" : "doc",
    "_id" : "english",
    "_score" : 1.0,
    "_source" : {
      "value" : "english"
    }
  },
  {
    "_index" : "languages",
    "_type" : "doc",
    "_id" : "spanish",
    "_score" : 1.0,
    "_source" : {
      "value" : "spanish"
    }
  }
...
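I thought about pipelines, but I don't see any processor that allows such a thing.
Maybe the answer is to create a custom processor. I already have one, but I'm not sure how to insert a document into a separate index from it.
Update: Using transforms as described in @Val's answer works and does seem to be the right answer...
However, I am using Open Distro for Elasticsearch, and transforms are not available there. An alternative solution that works there would be greatly appreciated :)
Update 2: It looks like OpenSearch is replacing Open Distro for Elasticsearch, and it has a transform API \o/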
You simply need to change the _index field in your ingest pipeline:
{
  "description": "routes matching documents to the languages index",
  "processors": [
    {
      "set": {
        "if": "[*your condition here*]",
        "field": "_index",
        "value": "languages",
        "override": true
      }
    }
  ]
}
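To make the effect of that processor concrete, here is a small Python sketch that mimics it outside of Elasticsearch. The helper name and the condition are made up for illustration, and this is not how Elasticsearch implements the processor. What it shows is that setting _index reroutes the single in-flight document; it does not produce a second one.

```python
def apply_set_processor(doc, field, value, condition):
    """Mimic a conditional `set` processor on one in-flight document.

    Hypothetical helper for illustration only; not an Elasticsearch API.
    """
    if condition(doc):
        # Return a copy with the metadata field overwritten (override: true).
        return {**doc, field: value}
    return doc


incoming = {
    "_index": "person",
    "_id": "1",
    "_source": {"name": "John Doe", "languages": ["english", "spanish"]},
}

# Assumed condition: reroute any document that has a `languages` field.
routed = apply_set_processor(
    incoming, "_index", "languages", lambda d: "languages" in d["_source"]
)

# Still exactly one document, now headed for `languages` instead of `person`.
print(routed["_index"], routed["_id"])
```

So with this approach alone the person document would land in languages and nothing would be written to person.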
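It is not possible to clone or split each document entering an ingest pipeline the way you can in Logstash. So you cannot index two documents from a single one.
However, once your person documents have been indexed, it is definitely possible to hit the _transform API endpoint and create the languages index from the person one:
First create the transform: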
PUT _transform/languages-transform
{
  "source": {
    "index": "person"
  },
  "pivot": {
    "group_by": {
      "language": {
        "terms": {
          "field": "languages.keyword"
        }
      }
    },
    "aggregations": {
      "count": {
        "value_count": {
          "field": "languages.keyword"
        }
      }
    }
  },
  "dest": {
    "index": "languages",
    "pipeline": "set-id"
  }
}
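You also need to create the pipeline that sets the proper ID for your language documents (the transform's dest.pipeline refers to it by name, so it is safest to create it before the transform):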
PUT _ingest/pipeline/set-id
{
  "processors": [
    {
      "set": {
        "field": "_id",
        "value": "{{language}}"
      }
    }
  ]
}
Then you can start the transform:
POST _transform/languages-transform/_start
When it's done, you'll have a new index called languages, whose content is:
GET languages/_search
=>
"hits" : [
  {
    "_index" : "languages",
    "_type" : "_doc",
    "_id" : "english",
    "_score" : 1.0,
    "_source" : {
      "count" : 4,
      "language" : "english"
    }
  },
  {
    "_index" : "languages",
    "_type" : "_doc",
    "_id" : "russian",
    "_score" : 1.0,
    "_source" : {
      "count" : 2,
      "language" : "russian"
    }
  },
  {
    "_index" : "languages",
    "_type" : "_doc",
    "_id" : "spanish",
    "_score" : 1.0,
    "_source" : {
      "count" : 2,
      "language" : "spanish"
    }
  }
]
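The counts in that result may look surprising for only two persons. As far as I understand it, value_count counts field values across the documents in each bucket, not the documents themselves, so a person with two languages contributes 2 to every bucket they fall into. The Python sketch below imitates the pivot in memory to show where the numbers come from; pivot_languages is a made-up helper, not part of any Elasticsearch client.

```python
from collections import defaultdict

persons = [
    {"name": "John Doe", "languages": ["english", "spanish"]},
    {"name": "Jane Doe", "languages": ["english", "russian"]},
]


def pivot_languages(docs):
    """Imitate group_by terms + value_count on the same multi-valued field."""
    buckets = defaultdict(list)
    for doc in docs:
        # A terms grouping puts the document in one bucket per distinct value.
        for language in set(doc["languages"]):
            buckets[language].append(doc)
    return {
        language: {
            # value_count: total number of field values in the bucket's docs.
            "count": sum(len(d["languages"]) for d in members),
            # Number of persons in the bucket, for comparison.
            "persons": len(members),
        }
        for language, members in sorted(buckets.items())
    }


for language, stats in pivot_languages(persons).items():
    print(language, stats)
```

With the two sample persons this gives counts of 4, 2 and 2 for english, russian and spanish, while the number of persons per language is 2, 1 and 1. If the languages index is only meant to list distinct languages, the count field can simply be ignored.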
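Note that you can also put that transform on a schedule so that it runs regularly, or you can run it manually whenever it suits you, to rebuild the languages index.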
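For the scheduled variant, a transform can be made continuous by giving it a sync section and a frequency. The sketch below only builds the request body in Python; it assumes the person documents carry a date field, hypothetically named created_at here, which the sample documents in the question do not have. The function name and the default values are illustrative choices as well.

```python
import json


def continuous_transform_body(timestamp_field, frequency="5m", delay="60s"):
    """Build a request body for a continuous variant of languages-transform.

    `timestamp_field` must name a date field present in the source index;
    the transform uses it to detect which documents are new or changed.
    """
    return {
        "source": {"index": "person"},
        "pivot": {
            "group_by": {
                "language": {"terms": {"field": "languages.keyword"}}
            },
            "aggregations": {
                "count": {"value_count": {"field": "languages.keyword"}}
            },
        },
        "dest": {"index": "languages", "pipeline": "set-id"},
        # How often the transform checks the source index for changes.
        "frequency": frequency,
        # Time-based synchronization on the assumed date field.
        "sync": {"time": {"field": timestamp_field, "delay": delay}},
    }


# `created_at` is a hypothetical field name, not part of the original mapping.
body = continuous_transform_body("created_at")
print(json.dumps(body, indent=2))
```

The printed JSON is what would go into the PUT _transform/languages-transform request in place of the one-shot body.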
OpenSearch has its own _transform API. It works slightly differently; the transform can be created like this:
PUT _plugins/_transform/languages-transform
{
  "transform": {
    "enabled": true,
    "description": "Insert languages",
    "schedule": {
      "interval": {
        "period": 1,
        "unit": "minutes"
      }
    },
    "source_index": "person",
    "target_index": "languages",
    "data_selection_query": {
      "match_all": {}
    },
    "page_size": 1,
    "groups": [{
      "terms": {
        "source_field": "languages.keyword",
        "target_field": "value"
      }
    }]
  }
}
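For setups where no transform API is available (the Open Distro case mentioned in the question), one alternative that stays outside ingest pipelines is to fan out on the client: send the person document and one small document per language in the same _bulk request. This is not taken from the answers above, just a sketch under that assumption. build_bulk_payload is a hypothetical helper that only builds the NDJSON body, which would then be POSTed to the _bulk endpoint with Content-Type application/x-ndjson.

```python
import json


def build_bulk_payload(person_id, person):
    """Return an NDJSON _bulk body indexing one person plus its languages."""
    lines = [
        {"index": {"_index": "person", "_id": person_id}},
        person,
    ]
    for language in person["languages"]:
        # Using the language as _id makes repeated writes idempotent:
        # indexing "english" twice just overwrites the same document.
        lines.append({"index": {"_index": "languages", "_id": language}})
        lines.append({"value": language})
    # The bulk API expects one JSON object per line and a trailing newline.
    return "\n".join(json.dumps(line) for line in lines) + "\n"


payload = build_bulk_payload(
    "1", {"name": "John Doe", "languages": ["english", "spanish"]}
)
print(payload, end="")
```

The trade-off is that the languages index only ever grows: nothing here removes a language when the last person who speaks it is deleted, so an occasional rebuild would still be needed for that case.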