OrcNewInputFormat as an InputFormat for Hadoop streaming
I am using Hadoop streaming and want to use OrcNewInputFormat as the input format.
I am running this command:
hadoop jar hadoop-streaming.jar -libjars /usr/hdp/2.2.4.2-2/hive/lib/hive-exec.jar -input /user/orcfiles -output /streamf -mapper 'cat' -inputformat org.apache.hadoop.hive.ql.io.orc.OrcNewInputFormat -outputformat org.apache.hadoop.hive.ql.io.orc.OrcNewOutputFormat
But I get the following exception:
Exception in thread "main" java.lang.RuntimeException: class org.apache.hadoop.hive.ql.io.orc.OrcNewInputFormat not org.apache.hadoop.mapred.InputFormat
at org.apache.hadoop.conf.Configuration.setClass(Configuration.java:2150)
at org.apache.hadoop.mapred.JobConf.setInputFormat(JobConf.java:702)
at org.apache.hadoop.streaming.StreamJob.setJobConf(StreamJob.java:796)
at org.apache.hadoop.streaming.StreamJob.run(StreamJob.java:128)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at org.apache.hadoop.streaming.HadoopStreaming.main(HadoopStreaming.java:50)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
From this link:
http://hive.apache.org/javadocs/r1.2.0/api/
I can see that the class OrcNewInputFormat extends org.apache.hadoop.mapreduce.InputFormat, but the exception says that class org.apache.hadoop.hive.ql.io.orc.OrcNewInputFormat is not an org.apache.hadoop.mapred.InputFormat.
What am I missing here?
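It works now; I had entered the wrong class name.
Judging by the view count, this is quite a popular question, but it still lacks an "answer" that states the correct class name. So, to complete it: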
The correct arguments are -inputformat org.apache.hadoop.hive.ql.io.orc.OrcInputFormat -outputformat org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
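For context, the stack trace shows StreamJob calling JobConf.setInputFormat, which only accepts classes implementing the old org.apache.hadoop.mapred.InputFormat interface, and OrcInputFormat is the ORC class written against that old API. As a rough sketch, this is the question's command with the class names swapped. The jar path and HDFS directories are copied from the question, the variable names are only for illustration, and this has not been verified on a cluster:

```shell
# Paths copied from the question; adjust them for your own cluster.
HIVE_EXEC_JAR=/usr/hdp/2.2.4.2-2/hive/lib/hive-exec.jar
INPUT_DIR=/user/orcfiles
OUTPUT_DIR=/streamf

# Old-API (org.apache.hadoop.mapred) ORC classes named in the answer.
ORC_IN=org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
ORC_OUT=org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat

# Assemble the streaming command as a single string.
CMD="hadoop jar hadoop-streaming.jar -libjars $HIVE_EXEC_JAR -input $INPUT_DIR -output $OUTPUT_DIR -mapper cat -inputformat $ORC_IN -outputformat $ORC_OUT"

# Print the command for review instead of submitting the job.
echo "$CMD"
```

Once the printed command looks right, paste it into the shell to submit the job.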
In my case, I had to remove the environment variable below (or set it to false) before running the pig command.
export HADOOP_USE_CLIENT_CLASSLOADER='true'
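One way to do the "remove" part in the current shell session is sketched below. It assumes the variable was exported in that same shell. If a profile or wrapper script sets it, that file would need changing instead:

```shell
# Drop the variable from the current shell environment.
unset HADOOP_USE_CLIENT_CLASSLOADER

# Confirm that it is gone before launching pig.
if [ -z "${HADOOP_USE_CLIENT_CLASSLOADER+x}" ]; then
  echo "HADOOP_USE_CLIENT_CLASSLOADER is not set"
fi
```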