Apache PIG、ELEPHANTBIRDJSON 加载器
Apache PIG, ELEPHANTBIRDJSON Loader
我正在尝试使用 Elephantbird json 加载程序
解析以下输入(此输入中有 2 条记录)
[{"node_disk_lnum_1":36,"node_disk_xfers_in_rate_sum":136.40000000000001,"node_disk_bytes_in_rate_22":
187392.0, "node_disk_lnum_7": 13}]
[{"node_disk_lnum_1": 36, "node_disk_xfers_in_rate_sum":
105.2,"node_disk_bytes_in_rate_22": 123084.8, "node_disk_lnum_7":13}]
这是我的语法:
register '/home/data/Desktop/elephant-bird-pig-4.1.jar';
a = LOAD '/pig/tc1.log' USING
com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad') as (json:map[]);
b = FOREACH a GENERATE flatten(json#'node_disk_lnum_1') AS
node_disk_lnum_1,flatten(json#'node_disk_xfers_in_rate_sum') AS
node_disk_xfers_in_rate_sum,flatten(json#'node_disk_bytes_in_rate_22') AS
node_disk_bytes_in_rate_22, flatten(json#'node_disk_lnum_7') AS
node_disk_lnum_7;
DESCRIBE b;
b 描述结果:
b: {node_disk_lnum_1: bytearray,node_disk_xfers_in_rate_sum:
bytearray,node_disk_bytes_in_rate_22: bytearray,node_disk_lnum_7:
bytearray}
c = FOREACH b GENERATE node_disk_lnum_1;
DESCRIBE c;
c: {node_disk_lnum_1: bytearray}
DUMP c;
预期结果:
36, 136.40000000000001, 187392.0, 13
36, 105.2, 123084.8, 13
抛出以下错误
2017-02-06 01:05:49,337 [main] INFO
org.apache.pig.tools.pigstats.ScriptState - Pig features used in the
script: UNKNOWN 2017-02-06 01:05:49,386 [main] INFO
org.apache.pig.data.SchemaTupleBackend - Key [pig.schematuple] was not
set... will not generate code. 2017-02-06 01:05:49,387 [main] INFO
org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer -
{RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, ConstantCalculator,
GroupByConstParallelSetter, LimitOptimizer, LoadTypeCastInserter,
MergeFilter, MergeForEach, PartitionFilterOptimizer,
PredicatePushdownOptimizer, PushDownForEachFlatten, PushUpFilter,
SplitFilter, StreamTypeCastInserter]} 2017-02-06 01:05:49,390 [main]
INFO org.apache.pig.newplan.logical.rules.ColumnPruneVisitor - Map
key required for a: [=17=]->[node_disk_lnum_1,
node_disk_xfers_in_rate_sum, node_disk_bytes_in_rate_22,
node_disk_lnum_7]
2017-02-06 01:05:49,395 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler
- File concatenation threshold: 100 optimistic? false 2017-02-06 01:05:49,398 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
- MR plan size before optimization: 1 2017-02-06 01:05:49,398 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
- MR plan size after optimization: 1 2017-02-06 01:05:49,425 [main] INFO org.apache.pig.tools.pigstats.mapreduce.MRScriptState - Pig
script settings are added to the job 2017-02-06 01:05:49,426 [main]
INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
- mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3 2017-02-06 01:05:49,428 [main] ERROR
org.apache.pig.tools.grunt.Grunt - ERROR 2998: Unhandled internal
error. com/twitter/elephantbird/util/HadoopCompat
请帮忙看看我遗漏了什么?
您的 json 中没有任何嵌套数据,因此删除 -nestedload
a = LOAD '/pig/tc1.log' USING com.twitter.elephantbird.pig.load.JsonLoader() as (json:map[]);
我正在尝试使用 Elephantbird json 加载程序
解析以下输入(此输入中有 2 条记录)[{"node_disk_lnum_1":36,"node_disk_xfers_in_rate_sum":136.40000000000001,"node_disk_bytes_in_rate_22": 187392.0, "node_disk_lnum_7": 13}]
[{"node_disk_lnum_1": 36, "node_disk_xfers_in_rate_sum": 105.2,"node_disk_bytes_in_rate_22": 123084.8, "node_disk_lnum_7":13}]
这是我的语法:
register '/home/data/Desktop/elephant-bird-pig-4.1.jar';
a = LOAD '/pig/tc1.log' USING
com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad') as (json:map[]);
b = FOREACH a GENERATE flatten(json#'node_disk_lnum_1') AS
node_disk_lnum_1,flatten(json#'node_disk_xfers_in_rate_sum') AS
node_disk_xfers_in_rate_sum,flatten(json#'node_disk_bytes_in_rate_22') AS
node_disk_bytes_in_rate_22, flatten(json#'node_disk_lnum_7') AS
node_disk_lnum_7;
DESCRIBE b;
b 描述结果:
b: {node_disk_lnum_1: bytearray,node_disk_xfers_in_rate_sum: bytearray,node_disk_bytes_in_rate_22: bytearray,node_disk_lnum_7: bytearray}
c = FOREACH b GENERATE node_disk_lnum_1;
DESCRIBE c;
c: {node_disk_lnum_1: bytearray}
DUMP c;
预期结果:
36, 136.40000000000001, 187392.0, 13
36, 105.2, 123084.8, 13
抛出以下错误
2017-02-06 01:05:49,337 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNKNOWN 2017-02-06 01:05:49,386 [main] INFO org.apache.pig.data.SchemaTupleBackend - Key [pig.schematuple] was not set... will not generate code. 2017-02-06 01:05:49,387 [main] INFO org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, ConstantCalculator, GroupByConstParallelSetter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, PartitionFilterOptimizer, PredicatePushdownOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter]} 2017-02-06 01:05:49,390 [main] INFO org.apache.pig.newplan.logical.rules.ColumnPruneVisitor - Map key required for a: [=17=]->[node_disk_lnum_1, node_disk_xfers_in_rate_sum, node_disk_bytes_in_rate_22, node_disk_lnum_7]
2017-02-06 01:05:49,395 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false 2017-02-06 01:05:49,398 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1 2017-02-06 01:05:49,398 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1 2017-02-06 01:05:49,425 [main] INFO org.apache.pig.tools.pigstats.mapreduce.MRScriptState - Pig script settings are added to the job 2017-02-06 01:05:49,426 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3 2017-02-06 01:05:49,428 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2998: Unhandled internal error. com/twitter/elephantbird/util/HadoopCompat
请帮忙看看我遗漏了什么?
您的 json 中没有任何嵌套数据,因此删除 -nestedload
a = LOAD '/pig/tc1.log' USING com.twitter.elephantbird.pig.load.JsonLoader() as (json:map[]);