geo_point mapping with Python and StreamSets fails in Elasticsearch
I have this mapping in Elasticsearch:
"mappings": {
"properties": {
"fromCoordinates": {"type": "geo_point"},
"toCoordinates": {"type": "geo_point"},
"seenCoordinates": {"type": "geo_point"}
}
}
Using the Kibana console, every geo_point format supported by Elasticsearch works without problems, i.e.:
(lat, lon)
PUT /anindex/_doc/1
{
"fromCoordinates": {
"lat": 36.857200622558594,
"lon": 117.21600341796875
},
"toCoordinates": {
"lat": 22.639299392700195,
"lon": 113.81099700927734
},
"seenCoordinates": {
"lat": 36.91663,
"lon": 117.216
}
}
(lon, lat)
PUT /anindex/_doc/2
{
"fromCoordinates": [36.857200622558594, 117.21600341796875],
"toCoordinates": [22.639299392700195, 113.81099700927734],
"seenCoordinates": [36.91663, 117.216]
}
But when I try to insert the data into Elasticsearch from Python, I always get this error:
RequestError(400, 'illegal_argument_exception', 'mapper [fromCoordinates] of different type, current_type [geo_point], merged_type [ObjectMapper]')
In Python I build the JSON from a dictionary, and this is the result when I print it:
fromCoordinates = {}
fromCoordinates['lat'] = fromLat
fromCoordinates['lon'] = fromLon
dataDictionary.update({'fromCoordinates': fromCoordinates , 'toCoordinates': toCoordinates, 'seenCoordinates': seenCoordinates})
print(json.dumps(dataDictionary).encode('utf-8'))
{"fromCoordinates": {"lat": 43.9962005615, "lon": 125.684997559},
"toCoordinates": {"lat": 40.080101013183594, "lon": 116.58499908447266},
"seenCoordinates": {"lat": 33.62672, "lon": 109.37243}}
And I load it like this:
data = json.dumps(dataDictionary).encode('utf-8')
es.create(index='anindex', doc_type='document', id=0, body=data)
The array version has the same problem:
fromCoordinates = [fromLon, fromLat]
This is the JSON created and printed in Python:
{"fromCoordinates": [113.81099700927734, 22.639299392700195],
"toCoordinates": [106.8010025024414, 26.53849983215332],
"seenCoordinates": [107.46743, 26.34169]}
In this case I get this response:
RequestError: RequestError(400, 'mapper_parsing_exception', 'geo_point expected')
If I try to use StreamSets to load into Elasticsearch, I get the same error with both of the JSON types shown earlier:
mapper [fromCoordinates] of different type, current_type [geo_point], merged_type [ObjectMapper]
Any ideas?
Update:
GET /anindex/_mapping
{ "anindex" :
{ "mappings" :
{ "properties" :
{ "fromCoordinates" :
{ "type" : "geo_point" },
"toCoordinates" :
{ "type" : "geo_point" },
"seenCoordinates" : { "type" : "geo_point" }
}
}
}
}
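A response like the one above can also be checked from Python rather than by eye. The following is only a minimal sketch: `non_geo_point_fields` is a hypothetical helper (not part of the Elasticsearch client), and it assumes the typeless 7.x response shape shown in the update.

```python
def non_geo_point_fields(mapping_response, index, fields):
    """Return the names in `fields` that are not mapped as geo_point.

    `mapping_response` is assumed to have the typeless (7.x) shape
    {index: {"mappings": {"properties": {...}}}}.
    """
    properties = (
        mapping_response.get(index, {})
        .get("mappings", {})
        .get("properties", {})
    )
    return [
        name for name in fields
        if properties.get(name, {}).get("type") != "geo_point"
    ]


# The mapping response from the update, written as a Python dict.
response = {
    "anindex": {
        "mappings": {
            "properties": {
                "fromCoordinates": {"type": "geo_point"},
                "toCoordinates": {"type": "geo_point"},
                "seenCoordinates": {"type": "geo_point"},
            }
        }
    }
}

print(non_geo_point_fields(
    response, "anindex",
    ["fromCoordinates", "toCoordinates", "seenCoordinates"],
))  # prints [] because every field is mapped as geo_point
```

With a live cluster, the dictionary would typically come from `es.indices.get_mapping(index="anindex")`.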
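Solution:
After following the example given by @jzzfs, I realized that the doc_type parameter in es.create(index='anindex', doc_type='document', id=0, body=data) was causing the error. I removed it and it worked. I am still wondering why StreamSets gives the same error, but I will continue with Python.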
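I suspect that you first had an object mapping on fromCoordinates and then tried to update the mapping. Try dropping and recreating the index, and then all of these variants should work: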
Python
from elasticsearch import Elasticsearch
import time
es_instance = Elasticsearch(['http://localhost:9200'])
es_instance.indices.create(
    'anindex',
    body={"mappings": {
        "properties": {
            "fromCoordinates": {"type": "geo_point"},
            "toCoordinates": {"type": "geo_point"},
            "seenCoordinates": {"type": "geo_point"}
        }
    }})
es_instance.create(
    index="anindex",
    id=0,
    body={
        "fromCoordinates": {"lat": 43.9962005615, "lon": 125.684997559},
        "toCoordinates": {"lat": 40.080101013183594, "lon": 116.58499908447266},
        "seenCoordinates": {"lat": 33.62672, "lon": 109.37243}})
es_instance.create(
    index="anindex",
    id=1,
    body={
        "fromCoordinates": [
            117.21600341796875,
            36.857200622558594
        ],
        "toCoordinates": [
            113.81099700927734,
            22.639299392700195
        ],
        "seenCoordinates": [
            117.216,
            36.91663
        ]
    })
# syncing is not instant so wait
time.sleep(1)
print(es_instance.count(index="anindex"))
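The fixed sleep relies on the index having been refreshed by the time the count runs. An explicit refresh is less timing-dependent. The sketch below assumes the same Python client as above; `count_after_refresh` is a hypothetical helper, and the stub classes exist only so the snippet runs without a cluster.

```python
def count_after_refresh(es, index):
    """Refresh `index` so recent writes are searchable, then count its documents."""
    es.indices.refresh(index=index)
    return es.count(index=index)["count"]


class _StubIndices:
    """Stand-in for es.indices; it only records refresh calls."""
    def __init__(self):
        self.refreshed = []

    def refresh(self, index):
        self.refreshed.append(index)


class StubClient:
    """Stand-in for the Elasticsearch client, for illustration only."""
    def __init__(self, n_docs):
        self.indices = _StubIndices()
        self._n_docs = n_docs

    def count(self, index):
        return {"count": self._n_docs}


stub = StubClient(n_docs=2)
print(count_after_refresh(stub, "anindex"))  # prints 2 with this stub
```

Against a real cluster you would pass `es_instance` instead of the stub.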
Kibana:
DELETE anindex
PUT anindex
{
"mappings": {
"properties": {
"fromCoordinates": {
"type": "geo_point"
},
"toCoordinates": {
"type": "geo_point"
},
"seenCoordinates": {
"type": "geo_point"
}
}
}
}
PUT /anindex/_doc/1
{
"fromCoordinates": {
"lat": 36.857200622558594,
"lon": 117.21600341796875
},
"toCoordinates": {
"lat": 22.639299392700195,
"lon": 113.81099700927734
},
"seenCoordinates": {
"lat": 36.91663,
"lon": 117.216
}
}
PUT /anindex/_doc/2
{
"fromCoordinates": [
117.21600341796875,
36.857200622558594
],
"toCoordinates": [
113.81099700927734,
22.639299392700195
],
"seenCoordinates": [
117.216,
36.91663
]
}
PUT anindex/_doc/3
{
"fromCoordinates": "22.639299392700195,113.81099700927734",
"toCoordinates": "26.53849983215332,106.8010025024414",
"seenCoordinates": "26.34169,107.46743"
}
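The object, array, and string documents above differ only in how each point is written, and the coordinate order is easy to mix up: the array form is [lon, lat], while the string form is "lat,lon". As a rough sketch, a small hypothetical helper (`geo_point_variants` is not a library function) can produce all three from one pair and reject out-of-range values before they reach Elasticsearch:

```python
def geo_point_variants(lat, lon):
    """Build the object, array, and string forms of one geo_point.

    The array form is [lon, lat]; the string form is "lat,lon".
    """
    if not -90 <= lat <= 90:
        raise ValueError(f"latitude out of range: {lat}")
    if not -180 <= lon <= 180:
        raise ValueError(f"longitude out of range: {lon}")
    return {
        "object": {"lat": lat, "lon": lon},
        "array": [lon, lat],
        "string": f"{lat},{lon}",
    }


variants = geo_point_variants(36.91663, 117.216)
print(variants["object"])  # {'lat': 36.91663, 'lon': 117.216}
print(variants["array"])   # [117.216, 36.91663]
print(variants["string"])  # 36.91663,117.216
```

Passing the arguments in the wrong order, for example `geo_point_variants(117.216, 36.91663)`, raises a `ValueError` here because 117.216 is not a valid latitude.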
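If you are upgrading from an older version of Elasticsearch (e.g. 6.1) to a newer one (e.g. 7.x), you need to remove doc_type from your indexing calls, because newer versions no longer accept it.
Old indexing pattern: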
res=es_local.index(index='local-index',doc_type='resource', body=open_doc,id=_id,request_timeout=60)
New indexing pattern:
res=es_local.index(index='local-index', body=open_doc,id=_id,request_timeout=60)
Note: there is no doc_type in the new indexing pattern (assuming you are indexing with Python).
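If one script has to work against both old and new clusters, one option is to build the keyword arguments conditionally. This is only a sketch: `index_kwargs` and its `es_major_version` parameter are hypothetical names introduced for illustration, not part of the client.

```python
def index_kwargs(index, body, doc_id, es_major_version, doc_type=None):
    """Build keyword arguments for es.index(), adding doc_type only before 7.x."""
    kwargs = {"index": index, "body": body, "id": doc_id}
    if es_major_version < 7 and doc_type is not None:
        kwargs["doc_type"] = doc_type
    return kwargs


doc = {"seenCoordinates": {"lat": 33.62672, "lon": 109.37243}}
old_style = index_kwargs("local-index", doc, 1, es_major_version=6, doc_type="resource")
new_style = index_kwargs("local-index", doc, 1, es_major_version=7, doc_type="resource")
print("doc_type" in old_style)  # True
print("doc_type" in new_style)  # False
```

The result would then be passed along as `es_local.index(**new_style, request_timeout=60)`. The major version could be taken from `es_local.info()["version"]["number"]`.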