Can't create Druid ingestion task through API
When I send a JSON ingestion spec to the Druid Overlord API, I get this response:
HTTP/1.1 400 Bad Request
Content-Type: application/json
Date: Wed, 25 Sep 2019 11:44:18 GMT
Server: Jetty(9.4.10.v20180503)
Transfer-Encoding: chunked
{
"error": "Instantiation of [simple type, class org.apache.druid.indexing.common.task.IndexTask] value failed: null"
}
If I change the task type from index to index_parallel, I get:
{
"error": "Instantiation of [simple type, class org.apache.druid.indexing.common.task.batch.parallel.ParallelIndexSupervisorTask] value failed: null"
}
Submitting the same ingestion spec through Druid's web UI works fine.
Here is the ingestion spec I use (slightly modified to hide sensitive data):
{
  "type": "index_parallel",
  "dataSchema": {
    "dataSource": "daily_xport_test",
    "granularitySpec": {
      "type": "uniform",
      "segmentGranularity": "MONTH",
      "queryGranularity": "NONE",
      "rollup": false
    },
    "parser": {
      "type": "string",
      "parseSpec": {
        "format": "json",
        "timestampSpec": {
          "column": "dateday",
          "format": "auto"
        },
        "dimensionsSpec": {
          "dimensions": [
            {
              "type": "string",
              "name": "id",
              "createBitmapIndex": true
            },
            {
              "type": "long",
              "name": "clicks_count_total"
            },
            {
              "type": "long",
              "name": "ctr"
            },
            "deleted",
            "device_type",
            "target_url"
          ]
        }
      }
    }
  },
  "ioConfig": {
    "type": "index_parallel",
    "firehose": {
      "type": "static-google-blobstore",
      "blobs": [
        {
          "bucket": "data-test",
          "path": "/sample_data/daily_export_18092019/000000000000.json.gz"
        }
      ],
      "filter": "*.json.gz$"
    },
    "appendToExisting": false
  },
  "tuningConfig": {
    "type": "index_parallel",
    "maxNumSubTasks": 1,
    "maxRowsInMemory": 1000000,
    "pushTimeout": 0,
    "maxRetry": 3,
    "taskStatusCheckPeriodMs": 1000,
    "chatHandlerTimeout": "PT10S",
    "chatHandlerNumRetries": 5
  }
}
The Overlord API URI looks like this:
http://host:8081/druid/indexer/v1/task
The HTTPie command that sends the API request:
http --print=Hhb POST http://host:8081/druid/indexer/v1/task < test_spec.json
Additionally, I hit the same problem when I try to submit the ingestion task with the DruidHook class in Airflow.
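I found the solution. Apparently, specs generated by the Druid UI have a slightly different JSON format from the one the API consumes. The top-level objects in the spec ("ioConfig", "dataSchema" and "tuningConfig") should be wrapped in a spec object, like this: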
{
  "type": "index_parallel",
  "spec": {
    "dataSchema": {
      "dataSource": "daily_xport_test",
      "granularitySpec": {
        "type": "uniform",
        "segmentGranularity": "MONTH",
        "queryGranularity": "NONE",
        "rollup": false
      },
      "parser": {
        "type": "string",
        "parseSpec": {
          "format": "json",
          "timestampSpec": {
            "column": "dateday",
            "format": "auto"
          },
          "dimensionsSpec": {
            "dimensions": [
              {
                "type": "string",
                "name": "id",
                "createBitmapIndex": true
              },
              {
                "type": "long",
                "name": "clicks_count_total"
              },
              {
                "type": "long",
                "name": "ctr"
              },
              "deleted",
              "device_type",
              "target_url"
            ]
          }
        }
      }
    },
    "ioConfig": {
      "type": "index_parallel",
      "firehose": {
        "type": "static-google-blobstore",
        "blobs": [
          {
            "bucket": "data-test",
            "path": "/sample_data/daily_export_18092019/000000000000.json.gz"
          }
        ],
        "filter": "*.json.gz$"
      },
      "appendToExisting": false
    },
    "tuningConfig": {
      "type": "index_parallel",
      "maxNumSubTasks": 1,
      "maxRowsInMemory": 1000000,
      "pushTimeout": 0,
      "maxRetry": 3,
      "taskStatusCheckPeriodMs": 1000,
      "chatHandlerTimeout": "PT10S",
      "chatHandlerNumRetries": 5
    }
  }
}
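If you keep specs in the flat form and only want to wrap them right before submitting, the wrapping can be scripted. The following is a minimal Python sketch, not part of Druid or Airflow: the names SPEC_KEYS, wrap_spec and build_request are hypothetical helpers made up for illustration, and the URL is the placeholder from the question. It assumes the only difference between the two formats is the missing "spec" wrapper around "dataSchema", "ioConfig" and "tuningConfig".

```python
import json
import urllib.request

# Top-level objects that should sit inside "spec" (assumption: these three only).
SPEC_KEYS = ("dataSchema", "ioConfig", "tuningConfig")


def wrap_spec(flat_spec):
    """Return a copy of a flat spec with its top-level objects nested under "spec"."""
    if "spec" in flat_spec:
        # Already wrapped; return an unchanged shallow copy.
        return dict(flat_spec)
    # Keep everything else (for example "type") at the top level.
    wrapped = {k: v for k, v in flat_spec.items() if k not in SPEC_KEYS}
    wrapped["spec"] = {k: flat_spec[k] for k in SPEC_KEYS if k in flat_spec}
    return wrapped


def build_request(spec, url="http://host:8081/druid/indexer/v1/task"):
    """Build (but do not send) a POST request carrying the spec as JSON."""
    body = json.dumps(spec).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    flat = {
        "type": "index_parallel",
        "dataSchema": {"dataSource": "daily_xport_test"},
        "ioConfig": {"type": "index_parallel"},
        "tuningConfig": {"type": "index_parallel"},
    }
    # Print the wrapped form so it can be compared with the spec above.
    print(json.dumps(wrap_spec(flat), indent=2))
```

To actually submit the task you would pass the result of build_request to urllib.request.urlopen, or write the wrapped JSON to a file and send it with the HTTPie command from the question.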
The UI tries to normalize the spec format between task (batch) and supervisor (streaming) specs. I filed a Druid issue to address this: https://github.com/apache/incubator-druid/issues/8662