Upload newline-delimited JSON to the Elasticsearch bulk API

I am trying to upload newline-delimited JSON to Elasticsearch using the bulk API. The bulk JSON I am uploading looks like this, with each JSON document on its own line:

{"ip": "x.x.x.x", "seen": true, "classification": "malicious", "spoofable": false, "first_seen": "2020-03-31", "last_seen": "2020-04-15", "actor": "unknown", "tags": ["ADB Worm", "HTTP Alt Scanner", "Mirai", "Web Scanner"], "cve": [], "metadata": {"country": "United Kingdom", "country_code": "GB", "city": "redacted", "organization": "redacted", "rdns": "", "asn": "ASxxx", "tor": false, "os": "Linux 2.2-3.x", "category": "isp"}, "raw_data": {"scan": [{"port": 80, "protocol": "TCP"}, {"port": 81, "protocol": "TCP"}, {"port": 88, "protocol": "TCP"}, {"port": 5555, "protocol": "TCP"}, {"port": 8080, "protocol": "TCP"}], "web": {}, "ja3": []}}
{"ip": "x.x.x.x", "seen": true, "classification": "malicious", "spoofable": true, "first_seen": "2020-04-09", "last_seen": "2020-04-11", "actor": "unknown", "tags": ["Eternalblue", "SMB Scanner"], "cve": ["CVE-2017-0144"], "metadata": {"country": "United Kingdom", "country_code": "GB", "city": "redacted", "organization": "redacted", "rdns": "host.somehost.com", "asn": "ASxxx", "tor": false, "os": "Windows 7/8", "category": "isp"}, "raw_data": {"scan": [{"port": 445, "protocol": "TCP"}], "web": {}, "ja3": []}}
{"ip": "x.x.x.x", "seen": true, "classification": "malicious", "spoofable": true, "first_seen": "2019-09-05", "last_seen": "2020-04-06", "actor": "unknown", "tags": ["Mirai"], "cve": [], "metadata": {"country": "United Kingdom", "country_code": "GB", "city": "redacted", "organization": "redacted", "rdns": "redacted", "asn": "ASxxx", "tor": false, "os": "Linux 2.2.x-3.x (Embedded)", "category": "isp"}, "raw_data": {"scan": [{"port": 23, "protocol": "TCP"}, {"port": 2323, "protocol": "TCP"}], "web": {}, "ja3": []}}

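There is no index or key in a header line of the JSON. So, of course, it fails when I try to upload it with this command (my_index is an empty index with no mapping):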

curl -s -H 'Content-Type: application/x-ndjson' -X POST http://localhost:9200/my_index/_bulk --data-binary @my_newline_json.json

I get this error message:

{"error":{"root_cause":[{"type":"illegal_argument_exception","reason":"The bulk request must be terminated by a newline [\n]"}],"type":"illegal_argument_exception","reason":"The bulk request must be terminated by a newline [\n]"},"status":400}

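So, if I understand the docs correctly, the error occurs because no index or type is specified at the start of the JSON. My problem is that I don't understand how to add the necessary index and type so that the JSON can be read.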

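I am using curl to create my index and add data to it, so what is the best way to format the curl command so that it correctly creates the index and allows my JSON to be uploaded?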

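(I have previously used MosheZada's excellent Elasticsearch_loader tool, which lets you specify the index and type in the command. That worked well, but I am trying to understand what that command does and how I could do the same with curl if needed.)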

curl -H 'Content-Type: application/x-ndjson' -XPOST 'localhost:9200/index-name/doc-type/_bulk?pretty' --data-binary @my_newline_json.json

Change your bulk JSON to the following format. Your my_newline_json.json should look like this:

{"index":{}}
{"ip": "x.x.x.x", "seen": true, "classification": "malicious", "spoofable": false, "first_seen": "2020-03-31", "last_seen": "2020-04-15", "actor": "unknown", "tags": ["ADB Worm", "HTTP Alt Scanner", "Mirai", "Web Scanner"], "cve": [], "metadata": {"country": "United Kingdom", "country_code": "GB", "city": "redacted", "organization": "redacted", "rdns": "", "asn": "ASxxx", "tor": false, "os": "Linux 2.2-3.x", "category": "isp"}, "raw_data": {"scan": [{"port": 80, "protocol": "TCP"}, {"port": 81, "protocol": "TCP"}, {"port": 88, "protocol": "TCP"}, {"port": 5555, "protocol": "TCP"}, {"port": 8080, "protocol": "TCP"}], "web": {}, "ja3": []}}
{"index":{}}
{"ip": "x.x.x.x", "seen": true, "classification": "malicious", "spoofable": true, "first_seen": "2020-04-09", "last_seen": "2020-04-11", "actor": "unknown", "tags": ["Eternalblue", "SMB Scanner"], "cve": ["CVE-2017-0144"], "metadata": {"country": "United Kingdom", "country_code": "GB", "city": "redacted", "organization": "redacted", "rdns": "host.somehost.com", "asn": "ASxxx", "tor": false, "os": "Windows 7/8", "category": "isp"}, "raw_data": {"scan": [{"port": 445, "protocol": "TCP"}], "web": {}, "ja3": []}}
{"index":{}}
{"ip": "x.x.x.x", "seen": true, "classification": "malicious", "spoofable": true, "first_seen": "2019-09-05", "last_seen": "2020-04-06", "actor": "unknown", "tags": ["Mirai"], "cve": [], "metadata": {"country": "United Kingdom", "country_code": "GB", "city": "redacted", "organization": "redacted", "rdns": "redacted", "asn": "ASxxx", "tor": false, "os": "Linux 2.2.x-3.x (Embedded)", "category": "isp"}, "raw_data": {"scan": [{"port": 23, "protocol": "TCP"}, {"port": 2323, "protocol": "TCP"}], "web": {}, "ja3": []}}

Don't forget to add a newline at the end of the content.
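If the file already exists as plain newline-delimited JSON, the action lines do not have to be added by hand. The following is only a sketch, assuming a POSIX shell with awk available; the to_bulk helper and the sample file names are made up for illustration and are not part of Elasticsearch or curl. It puts an {"index":{}} line before every document and, because awk's print always ends its output with a newline, the result is newline-terminated even when the input is not.

```shell
#!/bin/sh
# to_bulk SRC DST (hypothetical helper): write a bulk-ready copy of SRC to DST.
to_bulk() {
    # NF skips blank lines. Each "print" emits its text followed by "\n",
    # so the last line of DST always ends with a newline.
    awk 'NF { print "{\"index\":{}}"; print }' "$1" > "$2"
}

# Demo on a tiny two-document sample whose last line has NO trailing newline.
tmp=$(mktemp -d)
printf '%s\n%s' \
    '{"ip": "x.x.x.x", "seen": true}' \
    '{"ip": "x.x.x.x", "seen": false}' > "$tmp/plain.ndjson"

to_bulk "$tmp/plain.ndjson" "$tmp/bulk.ndjson"
cat "$tmp/bulk.ndjson"
```

The converted file can then be passed to --data-binary in the same curl command as before, in place of the original file.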
