Tranquility not sending data to Druid
I am evaluating Druid for my use case, which ingests CSV data in real time through Tranquility. Here is the server configuration:
{
  "dataSources" : {
    "audience" : {
      "spec" : {
        "dataSchema" : {
          "dataSource" : "audience",
          "parser" : {
            "type" : "string",
            "parseSpec" : {
              "format" : "csv",
              "timestampSpec" : {
                "column" : "timestamp"
              },
              "columns" : ["timestamp","partner_id","event_id","product_id","device_id","count"],
              "dimensionsSpec" : {
                "dimensions" : ["partner_id","event_id","product_id","device_id"]
              }
            }
          },
          "metricsSpec" : [{ "type" : "longSum", "name" : "total", "fieldName" : "count" }],
          "granularitySpec" : {
            "segmentGranularity" : "HOUR",
            "queryGranularity" : "HOUR",
            "intervals" : [ "2013-08-31/2013-09-01" ]
          }
        },
        "ioConfig" : {
          "type" : "realtime"
        },
        "tuningConfig" : {
          "type" : "realtime",
          "maxRowsInMemory" : "100000",
          "intermediatePersistPeriod" : "PT10M",
          "windowPeriod" : "PT10M"
        }
      },
      "properties" : {
        "task.partitions" : "1",
        "task.replicants" : "1"
      }
    }
  },
  "properties" : {
    "zookeeper.connect" : "localhost",
    "druid.discovery.curator.path" : "/druid/discovery",
    "druid.selectors.indexing.serviceName" : "druid/overlord",
    "http.port" : "8200",
    "http.threads" : "8"
  }
}
The data is generated randomly by a Python script and looks like this:
1471336991,1,960,136,3ZLA7,1
1471336991,1,369,367,8MP2B,1
1471336991,2,544,550,C9ZG8,1
1471336991,1,135,394,XFX31,1
1471336991,2,590,552,VXMTL,1
1471336991,1,493,615,0C2HR,1
1471336991,2,435,710,HKYP0,1
1471336991,1,394,483,V2HP9,1
1471336991,2,441,376,J1LYO,1
The following command submits the data and returns {"result":{"received":1000,"sent":0}}:
python createData.py | curl -XPOST -H'Content-Type: text/plain' --data-binary @- http://localhost:8200/v1/post/audience
I was finally able to solve the problem. I was sending the time to Druid as an epoch value, but it expected ISO-8601. In Python, an ISO-8601 timestamp can be obtained easily with:
datetime.datetime.utcnow().isoformat()
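The original createData.py is not shown, so the following is only a minimal sketch of what a generator emitting ISO-8601 timestamps might look like. The function name make_row and the value ranges for the ID columns are assumptions made for illustration; only the column order comes from the "columns" list in the configuration above.

```python
import datetime
import random
import string


def make_row(now=None):
    """Build one CSV row: timestamp,partner_id,event_id,product_id,device_id,count."""
    if now is None:
        now = datetime.datetime.now(datetime.timezone.utc)
    # ISO-8601 string with an explicit UTC offset; microseconds are dropped
    timestamp = now.replace(microsecond=0).isoformat()
    partner_id = random.randint(1, 2)        # assumed range, based on the sample rows
    event_id = random.randint(100, 999)      # assumed range
    product_id = random.randint(100, 999)    # assumed range
    device_id = "".join(random.choices(string.ascii_uppercase + string.digits, k=5))
    return f"{timestamp},{partner_id},{event_id},{product_id},{device_id},1"


if __name__ == "__main__":
    # Same volume as the 1000 received events in the question
    for _ in range(1000):
        print(make_row())
```

The output can be piped into the same curl command as before. Using the current UTC time for every row also keeps the events close to "now", which matters for the window period discussed further down.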
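Druid supports several timestamp formats, which can be specified in the "timestampSpec" property.
The Druid documentation lists the following timestamp formats: "iso, millis, posix, auto or any Joda time format."
For example, to send the time in milliseconds: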
"timestampSpec" : {
  "column" : "timestamp",
  "format" : "millis"
}
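To make the difference between the format names concrete, here is a small sketch that renders one instant the way each of three formats expects it: "posix" as seconds since the epoch, "millis" as milliseconds since the epoch, and "iso" as an ISO-8601 string. The helper name timestamp_variants is hypothetical; the input is the epoch value from the sample rows in the question.

```python
import datetime


def timestamp_variants(epoch_seconds):
    """Return the same instant written for the posix, millis and iso formats."""
    moment = datetime.datetime.fromtimestamp(epoch_seconds, tz=datetime.timezone.utc)
    return {
        "posix": str(epoch_seconds),          # seconds since the epoch
        "millis": str(epoch_seconds * 1000),  # milliseconds since the epoch
        "iso": moment.isoformat(),            # ISO-8601 with UTC offset
    }


if __name__ == "__main__":
    for name, value in timestamp_variants(1471336991).items():
        print(name, value)
```

The sample rows carry second-resolution epoch values, so the data either has to be multiplied by 1000 to match "format" : "millis", or the spec has to name a format that matches what the script already emits.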
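A few things:
- Use the ISO 8601 datetime format
- Make sure the timestamps being written are within +/- 10 minutes of the current time (the windowPeriod of PT10M in the configuration above)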
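The second point can be checked on the client before posting. The sketch below is a simplified sanity check, not Tranquility's actual logic: it assumes the window is measured against the current UTC time and mirrors the PT10M windowPeriod from the configuration. The names WINDOW and within_window are made up for this example.

```python
import datetime

# Mirrors "windowPeriod" : "PT10M" from the configuration
WINDOW = datetime.timedelta(minutes=10)


def within_window(event_time, now=None, window=WINDOW):
    """Return True if event_time is no more than `window` away from `now`."""
    if now is None:
        now = datetime.datetime.now(datetime.timezone.utc)
    return abs(now - event_time) <= window


if __name__ == "__main__":
    # The timestamp used in the sample rows of the question
    sample = datetime.datetime.fromtimestamp(1471336991, tz=datetime.timezone.utc)
    print(within_window(sample))
```

A row that fails this check is a likely candidate for being counted under "received" but not under "sent", even when its timestamp format is parsed correctly.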