Does mongo-connector support adding fields before inserting into Elasticsearch?

Question

I have a lot of documents in MongoDB, and mongo-connector inserts this data into Elasticsearch. Is there a way to add extra fields to a document before it is inserted into ES? Does mongo-connector offer any way to do this?

Update

Following your UPDATE 3, I created a mapping like this. Is it correct?

PUT my_index2
{
  "mappings": {
    "my_type2": {
      "transform": {
        "script": {
          "inline": "if (ctx._source.geopoint.alt) ctx._source.geopoint.remove('alt')",
          "lang": "groovy"
        }
      },
      "properties": {
        "geopoint": {
          "type": "geo_point"
        }
      }
    }
  }
}

Error

This is the error I keep getting when I try to insert your mapping:

{
  "error": {
    "root_cause": [
      {
        "type": "script_parse_exception",
        "reason": "Value must be of type String: [script]"
      }
    ],
    "type": "mapper_parsing_exception",
    "reason": "Failed to parse mapping [my_type2]: Value must be of type String: [script]",
    "caused_by": {
      "type": "script_parse_exception",
      "reason": "Value must be of type String: [script]"
    }
  },
  "status": 400
}

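Update 2

The mapping is now accepted and the response is acknowledged: true. But when I try to insert JSON data like the following, it throws an error.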

PUT my_index2/my_type2/1
{
  "geopoint": {
    "lon": 48.845877,
    "lat": 8.821861,
    "alt": 0.0
  }
}

Error for Update 2

{
  "error": {
    "root_cause": [
      {
        "type": "mapper_parsing_exception",
        "reason": "failed to parse"
      }
    ],
    "type": "mapper_parsing_exception",
    "reason": "failed to parse",
    "caused_by": {
      "type": "illegal_argument_exception",
      "reason": "failed to execute script",
      "caused_by": {
        "type": "script_exception",
        "reason": "scripts of type [inline], operation [mapping] and lang [groovy] are disabled"
      }
    }
  },
  "status": 400
}

Error 1 for Update 2

After adding script.inline: true, I tried to insert the data again but got the following error.

{
  "error": {
    "root_cause": [
      {
        "type": "parse_exception",
        "reason": "field must be either [lat], [lon] or [geohash]"
      }
    ],
    "type": "mapper_parsing_exception",
    "reason": "failed to parse",
    "caused_by": {
      "type": "parse_exception",
      "reason": "field must be either [lat], [lon] or [geohash]"
    }
  },
  "status": 400
}

Answer 1

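mongo-connector is designed to synchronize a Mongo database with another target system, such as ES, Solr, or another Mongo database. Synchronizing means 1:1 replication, so as far as I know, mongo-connector cannot enrich documents during replication (nor is that its purpose).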

However, in ES 5 we'll soon be able to use ingest nodes, in which we'll be able to define processing pipelines whose goal is to enrich documents before they are indexed.

Update

There might be a way to do it by modifying the formatters.py file.

In transform_value I would add a case to handle Geopoint:

    if isinstance(value, dict):
        return self.format_document(value)
    elif isinstance(value, list):
        return [self.transform_value(v) for v in value]

    # handle Geopoint class: keep only the fields a geo_point accepts
    elif isinstance(value, Geopoint):
        return self.format_document({'lat': value['lat'], 'lon': value['lon']})

    ...
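
As a standalone illustration of the idea above (this is not mongo-connector's actual code), the following sketch reduces a geopoint-like value to the two keys a geo_point mapping accepts. The helper name to_es_geo_point is hypothetical, and the sketch assumes the value behaves like a dict with lat and lon keys.

```python
def to_es_geo_point(value):
    """Return a dict holding only the keys a geo_point mapping accepts."""
    # Hypothetical helper: keep 'lat' and 'lon', drop anything else (e.g. 'alt').
    return {"lat": value["lat"], "lon": value["lon"]}


# Sample document taken from the question.
doc = {"geopoint": {"lon": 48.845877, "lat": 8.821861, "alt": 0.0}}
doc["geopoint"] = to_es_geo_point(doc["geopoint"])
print(doc)
```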

Update 2

Let's try another approach and modify the transform_element function (line 104):

def transform_element(self, key, value):
    try:
        # add these next two lines
        if key == 'GeoPoint':
            value = {'lat': value['lat'], 'lon': value['lon']}
        # do not modify the initial code below
        new_value = self.transform_value(value)
        yield key, new_value
    except ValueError as e:
        LOG.warn("Invalid value for key: %s as %s"
                 % (key, str(e)))
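
To try the key-based idea outside mongo-connector, here is a minimal self-contained sketch. It is only an approximation of the patched method: transform_document and the geo_key parameter are hypothetical names, and it catches KeyError and TypeError (rather than ValueError) because those are what a missing lat/lon or a non-dict value would raise in this simplified version.

```python
import logging

LOG = logging.getLogger(__name__)


def transform_element(key, value, geo_key="GeoPoint"):
    """Yield (key, value), reducing the geo field to lat/lon only."""
    try:
        if key == geo_key:
            value = {"lat": value["lat"], "lon": value["lon"]}
        yield key, value
    except (KeyError, TypeError) as e:
        # The element is skipped when the geo value is malformed.
        LOG.warning("Invalid value for key: %s as %s", key, e)


def transform_document(doc):
    """Apply transform_element to every top-level field of a document."""
    out = {}
    for key, value in doc.items():
        out.update(transform_element(key, value))
    return out


sample = {"GeoPoint": {"lat": 8.821861, "lon": 48.845877, "alt": 0.0}, "name": "x"}
print(transform_document(sample))
```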

Update 3

Another thing you can try is adding a transform. The reason I've not mentioned it before is that it was deprecated in ES 2.0, but in ES 5.0 you'll have ingest nodes and you'll be able to take care of it at ingest time using a remove processor.

You can define your mapping like this:

PUT my_index2
{
  "mappings": {
    "my_type2": {
      "transform": {
        "script": "ctx._source.geopoint.remove('alt'); ctx._source.geopoint.remove('valid')"
      },
      "properties": {
        "geopoint": {
          "type": "geo_point"
        }
      }
    }
  }
}

Note: make sure dynamic scripting is enabled by adding script.inline: true to elasticsearch.yml and restarting your ES node.

What happens then is that the alt field is still visible in the stored _source, but it is not indexed, so no error occurs.
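
The difference between the stored _source and what gets indexed can be pictured with a plain-Python analogy. This is only a sketch of the semantics described above, not how Elasticsearch implements it; apply_transform is a hypothetical name.

```python
import copy


def apply_transform(source):
    """Mimic the mapping transform: return (stored_source, indexed_view)."""
    # The transform works on a copy; the original source is stored untouched.
    indexed = copy.deepcopy(source)
    geopoint = indexed.get("geopoint", {})
    # Same effect as the script: remove 'alt' and 'valid' if they are present.
    geopoint.pop("alt", None)
    geopoint.pop("valid", None)
    return source, indexed


stored, indexed = apply_transform(
    {"geopoint": {"lon": 48.845877, "lat": 8.821861, "alt": 0.0}}
)
print(stored)   # still contains 'alt'
print(indexed)  # only 'lon' and 'lat' remain
```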

With ES 5, you would simply create a pipeline with a remove processor, like this:

PUT _ingest/pipeline/geo-pipeline
{
  "description" : "remove unsupported altitude field",
  "processors" : [
    {
      "remove" : {
        "field": "geopoint.alt"
      }
    }
  ]
}
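
Once such a pipeline exists, a document is routed through it by passing the pipeline query parameter on the index request. The sketch below builds that request with the Python standard library; the host http://localhost:9200 and the helper name build_index_request are assumptions, and the request is only constructed here, not sent.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: a local ES 5 node


def build_index_request(index, doc_type, doc_id, doc, pipeline):
    """Build a PUT request that indexes doc through the given ingest pipeline."""
    url = "%s/%s/%s/%s?pipeline=%s" % (ES_URL, index, doc_type, doc_id, pipeline)
    return urllib.request.Request(
        url,
        data=json.dumps(doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


doc = {"geopoint": {"lon": 48.845877, "lat": 8.821861, "alt": 0.0}}
req = build_index_request("my_index2", "my_type2", 1, doc, "geo-pipeline")
print(req.get_method(), req.full_url)
# To actually send it against a running node: urllib.request.urlopen(req)
```

The document is sent with alt still present; removing it is left to the remove processor on the ES side.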

mongodb

elasticsearch

elasticsearch-2.0