OrcNewInputFormat as an input format for Hadoop streaming

I am using Hadoop streaming and want to use OrcNewInputFormat as the input format. I am running this command:

hadoop jar hadoop-streaming.jar -libjars /usr/hdp/2.2.4.2-2/hive/lib/hive-exec.jar -input /user/orcfiles -output /streamf -mapper 'cat' -inputformat org.apache.hadoop.hive.ql.io.orc.OrcNewInputFormat -outputformat org.apache.hadoop.hive.ql.io.orc.OrcNewOutputFormat

But I get the following exception:

    Exception in thread "main" java.lang.RuntimeException: class org.apache.hadoop.hive.ql.io.orc.OrcNewInputFormat not org.apache.hadoop.mapred.InputFormat
        at org.apache.hadoop.conf.Configuration.setClass(Configuration.java:2150)
        at org.apache.hadoop.mapred.JobConf.setInputFormat(JobConf.java:702)
        at org.apache.hadoop.streaming.StreamJob.setJobConf(StreamJob.java:796)
        at org.apache.hadoop.streaming.StreamJob.run(StreamJob.java:128)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
        at org.apache.hadoop.streaming.HadoopStreaming.main(HadoopStreaming.java:50)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:136)

From this link:

http://hive.apache.org/javadocs/r1.2.0/api/

I can see that the class OrcNewInputFormat extends org.apache.hadoop.mapreduce.InputFormat, but the exception says that class org.apache.hadoop.hive.ql.io.orc.OrcNewInputFormat is not an org.apache.hadoop.mapred.InputFormat.

What am I missing here?

It is working fine now; I had entered the wrong class name.

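Judging by the view count, this is a very popular question, but it still lacks an "answer" as far as the correct class names are concerned. So, to complete it: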

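The correct argument part is -inputformat org.apache.hadoop.hive.ql.io.orc.OrcInputFormat -outputformat org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat

Hadoop streaming is built on the old org.apache.hadoop.mapred API (note JobConf.setInputFormat in the stack trace), so it needs the old-API ORC classes rather than the OrcNew* ones, which target org.apache.hadoop.mapreduce.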
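
For reference, here is a rough sketch of how the whole command from the question might look with those class names swapped in. The jar path and HDFS paths are simply copied from the question, streaming_cmd is a hypothetical helper name, and I have not run this against a cluster, so treat it as a starting point rather than a verified command. It only prints the command so you can review it first.

```shell
# Assumption: the jar path and HDFS paths are the ones from the question.
HIVE_EXEC_JAR=/usr/hdp/2.2.4.2-2/hive/lib/hive-exec.jar

# Old-API (org.apache.hadoop.mapred) ORC classes named in the answer.
INPUT_FORMAT=org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
OUTPUT_FORMAT=org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat

# Hypothetical helper: prints the full command instead of running it.
streaming_cmd() {
  echo hadoop jar hadoop-streaming.jar \
    -libjars "$HIVE_EXEC_JAR" \
    -input /user/orcfiles \
    -output /streamf \
    -mapper cat \
    -inputformat "$INPUT_FORMAT" \
    -outputformat "$OUTPUT_FORMAT"
}

streaming_cmd
```

Once the printed line looks right, copy it into your terminal, or remove the echo inside the function so that calling it submits the job directly.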

In my case, before running the pig command, I had to remove the environment variable below (or set it to false).

export HADOOP_USE_CLIENT_CLASSLOADER='true'
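
The two options mentioned above could look roughly like this in the shell you launch the job from. I am assuming that unsetting is the safer of the two, since I have not checked how each Hadoop version interprets a value of 'false'.

```shell
# Option 1: remove the variable from the current shell environment.
unset HADOOP_USE_CLIENT_CLASSLOADER

# Option 2 (alternative, left commented out): keep the variable but set it to false.
# Assumption: your Hadoop launcher script treats 'false' as disabled.
# export HADOOP_USE_CLIENT_CLASSLOADER='false'
```

If the export lives in a profile or env script, it also has to be removed or changed there, otherwise new shells will set it again.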