Index JSON files in Elasticsearch using Python?
I have a bunch of JSON files (100), named merged_file 1.json, merged_file 2.json, and so on.
How do I index all of these files into Elasticsearch using Python (elasticsearch_dsl)?
I am using this code, but it doesn't seem to work:
from elasticsearch_dsl import Elasticsearch
import json
import os
import sys

es = Elasticsearch()
json_docs = []
directory = sys.argv[1]
for filename in os.listdir(directory):
    if filename.endswith('.json'):
        with open(filename, 'r') as open_file:
            json_docs.append(json.load(open_file))
es.bulk("index_name", "type_name", json_docs)
The JSON looks like this:
{"one":["some data"],"two":["some other data"],"three":["other data"]}
What do I have to do to make this work correctly?
For this task you should use elasticsearch-py (pip install elasticsearch):
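from elasticsearch import Elasticsearch, helpers
import json
import os
import sys

es = Elasticsearch()

def load_json(directory):
    """Use a generator; no need to load every file into memory."""
    for filename in os.listdir(directory):
        if filename.endswith('.json'):
            # os.listdir returns bare file names, so join them with the directory
            with open(os.path.join(directory, filename), 'r') as open_file:
                yield json.load(open_file)

# doc_type applies to older Elasticsearch versions only; mapping types were removed in 8.x
helpers.bulk(es, load_json(sys.argv[1]), index='my-index', doc_type='my-type')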
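If you want control over the document ids, the generator can yield full bulk actions instead of bare documents. Below is a minimal sketch of that idea, not a definitive implementation. The names iter_actions and demo are hypothetical, and using the file name (without extension) as the `_id` is an assumption made for illustration. The demo writes two sample files to a temporary directory, so it runs without an Elasticsearch server.

```python
import json
import os
import tempfile


def iter_actions(directory, index_name):
    """Yield one bulk action per .json file found in `directory`."""
    for filename in sorted(os.listdir(directory)):
        if not filename.endswith(".json"):
            continue
        # os.listdir returns bare names, so build the full path explicitly
        path = os.path.join(directory, filename)
        with open(path, "r", encoding="utf-8") as handle:
            document = json.load(handle)
        yield {
            "_index": index_name,
            # Assumption: the file name without ".json" serves as the document id
            "_id": os.path.splitext(filename)[0],
            "_source": document,
        }


def demo():
    """Write two sample files to a temp directory and return their actions."""
    with tempfile.TemporaryDirectory() as directory:
        for number in (1, 2):
            path = os.path.join(directory, "merged_file %d.json" % number)
            with open(path, "w", encoding="utf-8") as handle:
                json.dump({"one": ["some data"], "two": ["some other data"]}, handle)
        # Consume the generator before the temporary directory is deleted
        return list(iter_actions(directory, "my-index"))


if __name__ == "__main__":
    for action in demo():
        print(action["_id"], action["_source"])
```

With a running cluster, the same generator should be usable as helpers.bulk(es, iter_actions(sys.argv[1], 'my-index')), since helpers.bulk accepts any iterable of action dictionaries.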