Python Elasticsearch indexing error

Question

Elasticsearch was working fine until today.

Problem:

Some documents fail to index with this error:

u'Limit of total fields [1000] in index [mintegral_incent] has been exceeded'

Error:

"BulkIndexError: (u'14 document(s) failed to index.', [{u'index': {u'status': 400, u'_type': u'mintegral_incent', u'_id': u'168108082', u'error': {u'reason': u'Limit of total fields [1000] in index [mintegral_incent] has been exceeded', u'type': u'illegal_argument_exception'}

Using Amazon Elasticsearch Service

Elasticsearch version 5.1

ES setup:

from elasticsearch import Elasticsearch
from elasticsearch import helpers
es_repo = Elasticsearch(hosts=[settings.ES_INDEX_URL],
                        verify_certs=True)

Code causing the problem:

def bulk_index_offers(index_name, id_field, docs):
    actions = []
    for doc in docs:
        action = {
            "_index": index_name,
            "_type": index_name,
            "_id": doc.get(id_field),
            "_source": doc
        }
        actions.append(action)
    # The error is raised on the following line.
    resp = helpers.bulk(es_repo, actions)
    return resp

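What I have tried:

I tried making the chunks smaller and increasing read_timeout from its default of 10 to 30, like this: resp = helpers.bulk(es_repo, actions, chunks=500, read_timeout=30)

But I still face the same problem.

Please help.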

Answer 1

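By default, the mapping of an index is only allowed to contain up to 1000 fields, and you seem to have exceeded that limit. To raise the threshold, you can run this command: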

PUT mintegral_incent/_settings
{ 
  "index": {
    "mapping": {
      "total_fields": {
        "limit": "2000"
      }
    }
  }
}

With curl, it looks like this:

curl -XPUT http://<your.amazon.host>/mintegral_incent/_settings -d '{ 
  "index": {
    "mapping": {
      "total_fields": {
        "limit": "2000"
      }
    }
  }
}'

Then you can run your bulk script again and it should work.
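
Before or after raising the limit, it may help to see how many distinct fields the documents actually produce, since a fast-growing field count often points to dynamic keys in the data. The following is only a rough sketch: field_paths and count_distinct_fields are hypothetical helper names, the sample docs are invented, and the real mapping may hold more fields than this count (for example, sub-fields created by dynamic mapping).

```python
def field_paths(doc, prefix=""):
    """Yield the dotted path of every field in a (possibly nested) dict."""
    for key, value in doc.items():
        path = f"{prefix}.{key}" if prefix else key
        # Object fields are counted as well as their children.
        yield path
        if isinstance(value, dict):
            yield from field_paths(value, path)
        elif isinstance(value, list):
            # Objects inside an array share the array's field path.
            for item in value:
                if isinstance(item, dict):
                    yield from field_paths(item, path)


def count_distinct_fields(docs):
    """Return the number of distinct field paths across all documents."""
    seen = set()
    for doc in docs:
        seen.update(field_paths(doc))
    return len(seen)


# Invented sample documents, for illustration only.
docs = [
    {"id": 1, "payout": {"usd": 0.5}},
    {"id": 2, "payout": {"eur": 0.4}, "tags": [{"name": "x"}]},
]
print(count_distinct_fields(docs))
```

If the count keeps growing with every batch, restructuring the documents is probably a better long-term fix than repeatedly raising the limit.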

Answer 2

If you want to do this from Python, try:

import requests

headers = {
    'Content-Type': 'application/json',
}

resp = requests.put('http://localhost:9200/your_index/_settings',
                    headers=headers,
                    data='{"index": {"mapping": {"total_fields": {"limit": "2000"}}}}')

print(f'\nHTTP code: {resp.status_code} -- response: {resp}\n')
print(f'Response text\n{resp.text}')

You can also use the terminal as described above, although I had to add the header -H'Content-Type: application/json':

curl -XPUT http://localhost:9200/your_index/_settings -d '{"index": {"mapping": {"total_fields": {"limit": "2000"}}}}' -H'Content-Type: application/json'

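If you need to make curl-style requests (get, put, post) from Python, this guide is very helpful (it is the source of my answer) and even provides code showing a good way to handle this.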
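
Since the question already creates an elasticsearch-py client, the same settings change can probably be made through that client instead of requests. This is a minimal sketch, not a tested recipe for this cluster: raise_field_limit is a hypothetical helper name, and the body keyword is assumed to match client versions contemporary with Elasticsearch 5.x (newer clients may expect a different keyword).

```python
def raise_field_limit(client, index_name, limit=2000):
    """Raise index.mapping.total_fields.limit on one index via the client."""
    body = {"index": {"mapping": {"total_fields": {"limit": limit}}}}
    # indices.put_settings is the client wrapper for PUT <index>/_settings.
    return client.indices.put_settings(index=index_name, body=body)
```

With the client from the question, the call would look like raise_field_limit(es_repo, "mintegral_incent").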
