Indexing avro file to elasticsearch in bulk
I wrote this short script:
from elasticsearch import Elasticsearch
from fastavro import reader
es = Elasticsearch(['someIP:somePort'])
with open('data.avro', 'rb') as fo:
    avro_reader = reader(fo)
    for record in avro_reader:
        es.index(index="my_index", body=record)
It works fine. Each record is JSON, and Elasticsearch can index JSON documents. But instead of indexing them one at a time in a for loop, is there a way to do this in bulk? It is slow this way.
There are two ways to do this:
- use the Elasticsearch bulk API directly with python requests (a sketch follows the helpers.bulk example below)
- use the Elasticsearch python library helpers, which internally call the same bulk API
from elasticsearch import Elasticsearch
from elasticsearch import helpers
from fastavro import reader
es = Elasticsearch(['someIP:somePort'])
with open('data.avro', 'rb') as fo:
    avro_reader = reader(fo)
    # Build one bulk action per Avro record.
    records = [
        {
            "_index": "my_index",
            "_type": "record",
            "_id": j,
            "_source": record
        }
        for j, record in enumerate(avro_reader)
    ]

helpers.bulk(es, records)
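For the first option, here is a minimal sketch that posts to the _bulk endpoint with requests. It assumes the cluster is reachable over plain HTTP at someIP:somePort without authentication, and that every Avro record is JSON-serialisable; the URL and index name are just placeholders matching the example above.

import json

import requests
from fastavro import reader

# Placeholder host/port and index; adjust to your cluster.
bulk_url = "http://someIP:somePort/my_index/_bulk"

lines = []
with open('data.avro', 'rb') as fo:
    for record in reader(fo):
        # The bulk body is NDJSON: an action line followed by the document itself.
        lines.append(json.dumps({"index": {}}))
        lines.append(json.dumps(record))

# The body must end with a trailing newline, or Elasticsearch rejects it.
payload = "\n".join(lines) + "\n"

response = requests.post(
    bulk_url,
    data=payload.encode("utf-8"),
    headers={"Content-Type": "application/x-ndjson"},
)
response.raise_for_status()

In both snippets the whole payload is built in memory first. Since helpers.bulk accepts any iterable of actions, passing a generator expression instead of a list keeps memory bounded for large Avro files.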