geo_point mapping python and StreamSets fails with Elasticsearch

I have this mapping in Elasticsearch:

"mappings": {
    "properties": {
        "fromCoordinates": {"type": "geo_point"},
        "toCoordinates": {"type": "geo_point"},
        "seenCoordinates": {"type": "geo_point"}
    }
}

Using the Kibana console, every combination of geo_point formats supported by Elasticsearch works without problems, i.e.:

(lat, lon)

PUT /anindex/_doc/1
{
  "fromCoordinates": {
    "lat": 36.857200622558594,
    "lon": 117.21600341796875
  },
  "toCoordinates": {
    "lat": 22.639299392700195,
    "lon": 113.81099700927734
  },
  "seenCoordinates": {
    "lat": 36.91663,
    "lon": 117.216
  }
}

(lon, lat)

PUT /anindex/_doc/2
{
  "fromCoordinates": [117.21600341796875, 36.857200622558594],
  "toCoordinates": [113.81099700927734, 22.639299392700195],
  "seenCoordinates": [117.216, 36.91663]
}
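The object, array and string forms do not use the same coordinate order, which is easy to mix up when generating documents. A minimal sketch, assuming (lat, lon) floats as input; `geo_point_variants` is a hypothetical helper, not part of any client library. It returns one point in all three forms and rejects values outside the valid latitude/longitude ranges.

```python
def geo_point_variants(lat: float, lon: float) -> dict:
    """Return the same point in three geo_point input formats (hypothetical helper)."""
    if not -90.0 <= lat <= 90.0:
        raise ValueError(f"latitude out of range: {lat}")
    if not -180.0 <= lon <= 180.0:
        raise ValueError(f"longitude out of range: {lon}")
    return {
        # object form: explicit keys, so order does not matter
        "object": {"lat": lat, "lon": lon},
        # array form follows GeoJSON: longitude first
        "array": [lon, lat],
        # string form: latitude first
        "string": f"{lat},{lon}",
    }


v = geo_point_variants(36.857200622558594, 117.21600341796875)
print(v["array"])
```

Passing the pair the wrong way round (lat = 117.2) raises `ValueError` here instead of producing an array that Elasticsearch would reject.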

But when I try to insert the data into Elasticsearch from Python, I always get this error:

RequestError(400, 'illegal_argument_exception', 'mapper [fromCoordinates] of different type, current_type [geo_point], merged_type [ObjectMapper]')

In Python I build the JSON from a dictionary; this is what it looks like when printed:

import json

fromCoordinates = {}
fromCoordinates['lat'] = fromLat
fromCoordinates['lon'] = fromLon
# toCoordinates and seenCoordinates are built the same way

dataDictionary.update({'fromCoordinates': fromCoordinates, 'toCoordinates': toCoordinates, 'seenCoordinates': seenCoordinates})
print(json.dumps(dataDictionary))

{"fromCoordinates": {"lat": 43.9962005615, "lon": 125.684997559},
 "toCoordinates": {"lat": 40.080101013183594, "lon": 116.58499908447266},
 "seenCoordinates": {"lat": 33.62672, "lon": 109.37243}}
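The same document can be built in one step from (lat, lon) tuples. A minimal sketch; `build_document` is a hypothetical helper. The elasticsearch-py client serializes a plain dict body itself, so the `json.dumps(...).encode('utf-8')` step is not needed when the dict is passed directly as `body`.

```python
import json


def build_document(from_latlon, to_latlon, seen_latlon):
    """Build the document body from (lat, lon) tuples using the object form."""
    doc = {}
    for field, (lat, lon) in (
        ("fromCoordinates", from_latlon),
        ("toCoordinates", to_latlon),
        ("seenCoordinates", seen_latlon),
    ):
        # reject pairs that cannot be a valid (lat, lon), e.g. swapped values
        if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
            raise ValueError(f"{field}: ({lat}, {lon}) is not a valid (lat, lon) pair")
        doc[field] = {"lat": lat, "lon": lon}
    return doc


doc = build_document((43.9962005615, 125.684997559),
                     (40.080101013183594, 116.58499908447266),
                     (33.62672, 109.37243))
print(json.dumps(doc))
```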

and I load it like this:

data = json.dumps(dataDictionary).encode('utf-8')
es.create(index='anindex', doc_type='document', id=0, body=data)

The array version has the same problem:

fromCoordinates = [fromLon, fromLat]

This is the JSON created in Python and printed:

{"fromCoordinates": [113.81099700927734, 22.639299392700195],
 "toCoordinates": [106.8010025024414, 26.53849983215332],
 "seenCoordinates": [107.46743, 26.34169]}

In this case I get this response:

RequestError: RequestError(400, 'mapper_parsing_exception', 'geo_point expected')

If I try to use StreamSets to write to Elasticsearch, I get the same error with both types of JSON shown above:

mapper [fromCoordinates] of different type, current_type [geo_point], merged_type [ObjectMapper]

Any ideas?

Update:

GET /anindex/_mapping
{
  "anindex": {
    "mappings": {
      "properties": {
        "fromCoordinates": { "type": "geo_point" },
        "toCoordinates": { "type": "geo_point" },
        "seenCoordinates": { "type": "geo_point" }
      }
    }
  }
}
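Before indexing, it can help to confirm from Python that the index really has the expected geo_point mapping. A minimal sketch, assuming the typeless 7.x response shape shown above (as returned by `es.indices.get_mapping(index="anindex")`; 6.x responses nest properties under a type name instead). `missing_geo_points` is a hypothetical helper; a field that was dynamically mapped as an object has `properties` instead of `type` and is reported.

```python
def missing_geo_points(mapping_response, index, fields):
    """Return the fields that are not mapped as geo_point in `index`."""
    props = mapping_response[index]["mappings"].get("properties", {})
    return [f for f in fields if props.get(f, {}).get("type") != "geo_point"]


mapping = {"anindex": {"mappings": {"properties": {
    "fromCoordinates": {"type": "geo_point"},
    "toCoordinates": {"type": "geo_point"},
    "seenCoordinates": {"type": "geo_point"},
}}}}
print(missing_geo_points(mapping, "anindex",
                         ["fromCoordinates", "toCoordinates", "seenCoordinates"]))  # []
```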

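Solution:

Following the example given by @jzzfs, I realized that the doc_type parameter in es.create(index='anindex', doc_type='document', id=0, body=data) was causing the error. I removed it and it worked. I still wonder why StreamSets gives the same error, but I'll stay with Python.

I suspect you first had an object mapping on fromCoordinates and then tried to update the mapping. Try deleting and recreating the index; after that, all of these variants work fine: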


Python

from elasticsearch import Elasticsearch
import time

es_instance = Elasticsearch(['http://localhost:9200'])

es_instance.indices.create(
    'anindex',
    body={"mappings": {
        "properties": {
            "fromCoordinates": {"type": "geo_point"},
            "toCoordinates": {"type": "geo_point"},
            "seenCoordinates": {"type": "geo_point"}
        }
    }})

es_instance.create(
    index="anindex",
    id=0,
    body={
        "fromCoordinates": {"lat": 43.9962005615, "lon": 125.684997559},
        "toCoordinates": {"lat": 40.080101013183594, "lon": 116.58499908447266},
        "seenCoordinates": {"lat": 33.62672, "lon": 109.37243}})

es_instance.create(
    index="anindex",
    id=1,
    body={
        "fromCoordinates": [
            117.21600341796875,
            36.857200622558594
        ],
        "toCoordinates": [
            113.81099700927734,
            22.639299392700195
        ],
        "seenCoordinates": [
            117.216,
            36.91663
        ]
    })

# new documents become visible to search/count only after a refresh (every 1 s by default), so wait
time.sleep(1)

print(es_instance.count(index="anindex"))
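Instead of sleeping, the index can be refreshed explicitly before counting. A minimal sketch written against the elasticsearch-py 7.x calls already used above (`create`, `count`) plus `indices.refresh`; `index_and_count` is a hypothetical helper that takes the client as a parameter.

```python
def index_and_count(es, index, docs):
    """Create each doc with ids 0, 1, ..., refresh the index, return the count."""
    for doc_id, body in enumerate(docs):
        es.create(index=index, id=doc_id, body=body)
    # make the new documents searchable now instead of waiting for the periodic refresh
    es.indices.refresh(index=index)
    return es.count(index=index)["count"]
```

With the client from the answer this would be called as `index_and_count(es_instance, "anindex", [doc0, doc1])`, where `doc0` and `doc1` are the two bodies shown above. Refreshing on every write is costly, so this suits tests and scripts rather than bulk ingestion.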


Kibana:

DELETE anindex

PUT anindex
{
  "mappings": {
    "properties": {
      "fromCoordinates": {
        "type": "geo_point"
      },
      "toCoordinates": {
        "type": "geo_point"
      },
      "seenCoordinates": {
        "type": "geo_point"
      }
    }
  }
}

PUT /anindex/_doc/1
{
  "fromCoordinates": {
    "lat": 36.857200622558594,
    "lon": 117.21600341796875
  },
  "toCoordinates": {
    "lat": 22.639299392700195,
    "lon": 113.81099700927734
  },
  "seenCoordinates": {
    "lat": 36.91663,
    "lon": 117.216
  }
}

PUT /anindex/_doc/2
{
  "fromCoordinates": [
    117.21600341796875,
    36.857200622558594
  ],
  "toCoordinates": [
    113.81099700927734,
    22.639299392700195
  ],
  "seenCoordinates": [
    117.216,
    36.91663
  ]
}

PUT anindex/_doc/3
{
  "fromCoordinates": "22.639299392700195,113.81099700927734",
  "toCoordinates": "26.53849983215332,106.8010025024414",
  "seenCoordinates": "26.34169,107.46743"
}
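Documents 1, 2 and 3 store the same kind of point in three different forms. A minimal sketch that normalizes each form back to a (lat, lon) tuple, which makes the ordering rules explicit; `to_lat_lon` is a hypothetical helper, and it only handles the "lat,lon" string form (not geohash or WKT strings, which geo_point also accepts).

```python
def to_lat_lon(value):
    """Normalize an object, [lon, lat] array or "lat,lon" string to (lat, lon)."""
    if isinstance(value, dict):
        return float(value["lat"]), float(value["lon"])
    if isinstance(value, (list, tuple)):
        lon, lat = value[0], value[1]  # array form is [lon, lat]
        return float(lat), float(lon)
    if isinstance(value, str):
        lat, lon = (part.strip() for part in value.split(","))  # string form is "lat,lon"
        return float(lat), float(lon)
    raise TypeError(f"unsupported geo_point value: {value!r}")
```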

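If you were using an older version of Elasticsearch (e.g. 6.1) and upgraded to a newer one (e.g. 7.x), you need to remove doc_type from your indexing calls, because mapping types are deprecated in 7.x and removed in 8.x.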

Old indexing call:

res = es_local.index(index='local-index', doc_type='resource', body=open_doc, id=_id, request_timeout=60)

New indexing call:

res = es_local.index(index='local-index', body=open_doc, id=_id, request_timeout=60)

Note: the new call has no doc_type (this assumes you index from Python).
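If the same script has to run against both old and new clusters, the keyword arguments can be built depending on the server's major version. A minimal sketch; `index_kwargs` is a hypothetical helper, and the major version is assumed to come from something like `int(es_local.info()["version"]["number"].split(".")[0])`.

```python
def index_kwargs(index, body, doc_id, server_major, doc_type=None):
    """Build keyword arguments for es.index(), dropping doc_type on 7.x and later."""
    kwargs = {"index": index, "body": body, "id": doc_id}
    # mapping types are deprecated in 7.x and removed in 8.x, so only pass them to 6.x
    if doc_type is not None and server_major < 7:
        kwargs["doc_type"] = doc_type
    return kwargs
```

Usage would then look like `res = es_local.index(**index_kwargs('local-index', open_doc, _id, 7, doc_type='resource'), request_timeout=60)`.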