Import Google API JSON file to Elasticsearch

I'm completely new to the ELK stack, and to ES in particular. I have a JSON file that I obtained with the Google Admin SDK API, and I'd like to import it into Elasticsearch.

So far, this is the JSON structure of my data:

{
  "kind": "reports#activities",
  "nextPageToken": string,
  "items": [
    {
      "kind": "audit#activity",
      "id": {
        "time": datetime,
        "uniqueQualifier": long,
        "applicationName": string,
        "customerId": string
      },
      "actor": {
        "callerType": string,
        "email": string,
        "profileId": long,
        "key": string
      },
      "ownerDomain": string,
      "ipAddress": string,
      "events": [
        {
          "type": string,
          "name": string,
          "parameters": [
            {
              "name": string,
              "value": string,
              "intValue": long,
              "boolValue": boolean
            }
          ]
        }
      ]
    }
  ]
}

So I decided to start by uploading the JSON file to ES with this command:

curl -s -XPOST 'localhost:9200/_bulk' --data-binary @documents.json

But I get this error:

{"error":{"root_cause":[{"type":"illegal_argument_exception","reason":"Malformed action/metadata line [1], expected START_OBJECT or END_OBJECT but found [START_ARRAY]"}],"type":"illegal_argument_exception","reason":"Malformed action/metadata line [1], expected START_OBJECT or END_OBJECT but found [START_ARRAY]"},"status":400}

What should I do?

Thanks for your help!

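That JSON seems to define the structure of your documents, so you first need to create an index with a mapping that matches that structure. In your case, you could do it like this: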

curl -XPUT localhost:9200/reports -d '{
  "mappings": {
    "activity": {
      "properties": {
        "kind": {
          "type": "string"
        },
        "nextPageToken": {
          "type": "string"
        },
        "items": {
          "properties": {
            "kind": {
              "type": "string"
            },
            "id": {
              "properties": {
                "time": {
                  "type": "date",
                  "format": "date_time"
                },
                "uniqueQualifier": {
                  "type": "long"
                },
                "applicationName": {
                  "type": "string"
                },
                "customerId": {
                  "type": "string"
                }
              }
            },
            "actor": {
              "properties": {
                "callerType": {
                  "type": "string"
                },
                "email": {
                  "type": "string"
                },
                "profileId": {
                  "type": "long"
                },
                "key": {
                  "type": "string"
                }
              }
            },
            "ownerDomain": {
              "type": "string"
            },
            "ipAddress": {
              "type": "string"
            },
            "events": {
              "properties": {
                "type": {
                  "type": "string"
                },
                "name": {
                  "type": "string"
                },
                "parameters": {
                  "properties": {
                    "name": {
                      "type": "string"
                    },
                    "value": {
                      "type": "string"
                    },
                    "intValue": {
                      "type": "long"
                    },
                    "boolValue": {
                      "type": "boolean"
                    }
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}'

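Once that's done, you can use a bulk call to index reports#activities documents that follow the structure above. The syntax of a bulk call is precisely defined in the Elasticsearch bulk API documentation: you need an action line (what to do), followed on the next line by the document source (what to index), which must not contain any newlines!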
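
To make those rules concrete, here is a minimal Python sketch that checks a bulk body before it is sent. The function name check_bulk_body is made up for illustration, and the sketch assumes the body only contains index or create actions, so every action line is followed by exactly one source line.

```python
import json


def check_bulk_body(body):
    """Return a list of problems found in a bulk request body.

    Assumption: only "index"/"create" actions are used, so lines come in
    (action, source) pairs.
    """
    problems = []
    if not body.endswith("\n"):
        problems.append("body must end with a newline")

    lines = body.split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # drop the empty piece after the final newline

    if len(lines) % 2 != 0:
        problems.append("expected action/source pairs, got %d lines" % len(lines))

    for number, line in enumerate(lines, start=1):
        try:
            value = json.loads(line)
        except ValueError:
            problems.append("line %d is not a complete JSON value" % number)
            continue
        if not isinstance(value, dict):
            problems.append("line %d must be a JSON object" % number)
    return problems
```

A pretty-printed file, or one whose first line opens an array, fails these checks, which is consistent with the "Malformed action/metadata line [1]" error in the question.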

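You therefore need to reformat your documents.json file like this (make sure to add a newline after the second line). Also note that I've added some dummy data to illustrate the process: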

{"index": {"_index": "reports", "_type": "activity"}}
{"kind":"reports#activities","nextPageToken":"string","items":[{"kind":"audit#activity","id":{"time":"2016-05-31T00:00:00.000Z","uniqueQualifier":1,"applicationName":"string","customerId":"string"},"actor":{"callerType":"string","email":"string","profileId":1,"key":"string"},"ownerDomain":"string","ipAddress":"string","events":[{"type":"string","name":"string","parameters":[{"name":"string","value":"string","intValue":1,"boolValue":true}]}]}]}
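
If you'd rather not reformat the file by hand, the conversion can be scripted. The following is only a sketch under a few assumptions: the helper names to_bulk and convert_file are hypothetical, the index and type names are the ones used above, and if the file holds a JSON array (which the START_ARRAY error hints at), each element is treated as a separate document.

```python
import json


def to_bulk(document, index="reports", doc_type="activity"):
    """Return the two-line bulk entry (action line + source line) for one document."""
    action = {"index": {"_index": index, "_type": doc_type}}
    # json.dumps without an indent argument never emits raw newlines,
    # so each object stays on a single line, as the bulk format requires.
    return json.dumps(action) + "\n" + json.dumps(document) + "\n"


def convert_file(src_path, dst_path):
    """Rewrite the JSON in src_path as a bulk file; return the document count."""
    with open(src_path, encoding="utf-8") as src:
        loaded = json.load(src)
    # Assumption: a top-level array means "one document per element".
    documents = loaded if isinstance(loaded, list) else [loaded]
    with open(dst_path, "w", encoding="utf-8") as dst:
        for document in documents:
            dst.write(to_bulk(document))
    return len(documents)
```

For example, convert_file("documents.json", "bulk.json") writes a file (bulk.json is just a placeholder name) that you can then send with the same curl command as in the question, pointing --data-binary at @bulk.json.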