Load JSON data containing arrays in PIG
I have a file in JSON format:
{"id": "59b6808364fdb09cde10ad3b","balance": ",972.02","age": 35,"eyeColor": "green","tags": ["aute","nostrud","pariatur","adipisicing","irure"]}
{"id": "59b6808334cd60be95e5c166","balance": ",697.85","age": 32,"eyeColor": "blue","tags": ["tempor","non","ad","adipisicing","ut"]}
{"id": "59b680834544a828191abc88","balance": ",102.43","age": 38,"eyeColor": "brown","tags": ["quis","non","ut","veniam","ipsum"]}
I need to load this data into Pig. I am using:
raw_data = LOAD '/path/to/file' USING JsonLoader('id:chararray, balance:chararray, age:int, eyeColor:chararray, tags:chararray');
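When I run dump raw_data; I do not get the correct result.
What is the correct data type for loading an array in Apache Pig?
There is another question that explains how to expand an array, but in my case the tags element can contain a variable number of elements.
It would also be fine if I could convert the array to a string and then load it.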
Wrap the field inside tags in {} so that tags is declared as a bag:
raw_data = LOAD '/path/to/file' USING JsonLoader('id:chararray, balance:chararray, age:int, eyeColor:chararray, tags:{items:chararray}');
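The schema tags:{items:chararray} describes a bag of tuples, each holding a single chararray field named items, so a variable number of tags is not a problem. The Python sketch below is only a model of that row shape, run on the sample records from the question; it does not call Pig. parse_record builds the bag as a list of one-field tuples, flatten_tags imitates what FOREACH raw_data GENERATE id, FLATTEN(tags) would emit, and tags_to_string imitates Pig's built-in BagToString(tags, ',') for the case where a single string is preferred. The function names and the ',' delimiter are hypothetical choices for illustration.

```python
import json

# The three sample records from the question, one JSON object per line.
SAMPLE = """\
{"id": "59b6808364fdb09cde10ad3b","balance": ",972.02","age": 35,"eyeColor": "green","tags": ["aute","nostrud","pariatur","adipisicing","irure"]}
{"id": "59b6808334cd60be95e5c166","balance": ",697.85","age": 32,"eyeColor": "blue","tags": ["tempor","non","ad","adipisicing","ut"]}
{"id": "59b680834544a828191abc88","balance": ",102.43","age": 38,"eyeColor": "brown","tags": ["quis","non","ut","veniam","ipsum"]}
"""


def parse_record(line):
    """Hypothetical model of one row under tags:{items:chararray}:
    the JSON array becomes a bag (a list here) of single-field tuples."""
    obj = json.loads(line)
    tags_bag = [(tag,) for tag in obj["tags"]]
    return (obj["id"], obj["balance"], obj["age"], obj["eyeColor"], tags_bag)


def flatten_tags(rows):
    """Model of GENERATE id, FLATTEN(tags): one (id, tag) row per bag element."""
    return [(row[0], tup[0]) for row in rows for tup in row[4]]


def tags_to_string(row, delimiter=","):
    """Model of BagToString(tags, ','): join the bag's fields into one string."""
    return delimiter.join(tup[0] for tup in row[4])


rows = [parse_record(line) for line in SAMPLE.splitlines() if line.strip()]
print(tags_to_string(rows[0]))  # aute,nostrud,pariatur,adipisicing,irure
print(len(flatten_tags(rows)))  # 15
```

Because each record may carry any number of tags, the bag (or the joined string) keeps every element without needing a fixed number of columns.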