Possibilities for structuring ingested JSON data using NiFi
Is it possible to use NiFi to load a JSON file into a structured table?
I am calling an API that returns the following weather forecast data (for 6,000 weather stations), which I am currently loading into HDFS. It all arrives on a single line:
{"SiteRep":{"Wx":{"Param":[{"name":"F","units":"C","$":"Feels Like Temperature"},{"name":"G","units":"mph","$":"Wind Gust"},{"name":"H","units":"%","$":"Screen Relative Humidity"},{"name":"T","units":"C","$":"Temperature"},{"name":"V","units":"","$":"Visibility"},{"name":"D","units":"compass","$":"Wind Direction"},{"name":"S","units":"mph","$":"Wind Speed"},{"name":"U","units":"","$":"Max UV Index"},{"name":"W","units":"","$":"Weather Type"},{"name":"Pp","units":"%","$":"Precipitation Probability"}]},"DV":{"dataDate":"2017-01-12T22:00:00Z","type":"Forecast","Location":[{"i":"14","lat":"54.9375","lon":"-2.8092","name":"CARLISLE AIRPORT","country":"ENGLAND","continent":"EUROPE","elevation":"50.0","Period":{"type":"Day","value":"2017-01-13Z","Rep":{"D":"WNW","F":"-3","G":"25","H":"67","Pp":"0","S":"13","T":"2","V":"EX","W":"1","U":"1","$":"720"}}},{"i":"22","lat":"53.5797","lon":"-0.3472","name":"HUMBERSIDE AIRPORT","country":"ENGLAND","continent":"EUROPE","elevation":"24.0","Period":{"type":"Day","value":"2017-01-13Z","Rep":{"D":"NW","F":"-2","G":"43","H":"63","Pp":"3","S":"25","T":"4","V":"EX","W":"3","U":"1","$":"720"}}}, .....
Ideally, I would like to structure this into a table with 6,000 rows, one per station.
I tried writing a schema to pass the above to Pig, but without success, probably because I am not familiar enough with JSON to translate it correctly.
Looking for a simple way to add some structure to the data, I found the PutHBaseJson processor in NiFi.
Can anyone advise whether the PutHBaseJson processor can work with the data structure above? If so, can anyone point me to a decent tutorial to give me a starting point for the configuration?
Any guidance much appreciated.
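For reference, the flattening being asked for can be sketched in a few lines of plain Python (the `flatten` helper is illustrative, not part of NiFi; field names are taken from the sample payload above, trimmed here for brevity):

```python
import json

# Trimmed sample of the forecast payload shown in the question
raw = '''{"SiteRep":{"DV":{"dataDate":"2017-01-12T22:00:00Z","type":"Forecast",
"Location":[{"i":"14","lat":"54.9375","lon":"-2.8092","name":"CARLISLE AIRPORT",
"Period":{"type":"Day","value":"2017-01-13Z",
"Rep":{"D":"WNW","F":"-3","G":"25","T":"2"}}}]}}}'''

def flatten(payload):
    """Turn SiteRep.DV.Location[*] into one flat dict (table row) per station."""
    rows = []
    for loc in payload["SiteRep"]["DV"]["Location"]:
        period = loc.get("Period", {})
        row = {k: v for k, v in loc.items() if k != "Period"}  # station metadata
        row["date"] = period.get("value")
        row.update(period.get("Rep", {}))  # merge the forecast readings
        rows.append(row)
    return rows

rows = flatten(json.loads(raw))
print(rows[0]["name"])  # CARLISLE AIRPORT
print(rows[0]["T"])     # 2
```

With the full payload this yields 6,000 flat records, which is the same shape the NiFi flow described in the answer below produces one flowfile at a time.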
You probably want to use a SplitJson processor to split the 6,000-record JSON structure into 6,000 individual flowfiles. If you need to "inject" the parameter definitions from the top-level response into each record, you can use a ReplaceText or JoltTransformJSON processor to manipulate the individual JSON records. Yolanda Davis wrote a good article describing how to perform Jolt transforms (JSON -> JSON) in NiFi.
Once you have individual flowfiles each containing a single JSON record, putting them into HBase is very easy. Bryan Bende wrote an article describing the necessary configuration for the PutHBaseJson processor.
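For orientation on PutHBaseJson itself, the key properties look roughly like the following (a sketch only: `weather` and `f` are made-up table and column-family names, and using `name` as the row identifier is just one sensible choice for this data):

```
Table Name:                weather
Column Family:             f
Row Identifier Field Name: name
HBase Client Service:      (an HBase_1_1_2_ClientService controller service)
```

Each incoming flat JSON record then becomes one HBase row keyed by the `name` field, with the remaining fields stored as columns in the family.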