MapReduce oozie workflow using Hue

I am working on AWS and trying to create an Oozie workflow for a map-only job using Hue. I chose the mapreduce action. After trying many things I still cannot get it to work, although the same job runs fine when I submit it from the CLI.

I created a directory named mapreduce in HDFS and placed my driver.java and mapper.java in it. Under the mapreduce directory I created a lib directory and put my runnable jar in it. I have attached a screenshot of the Hue interface.

I am missing something, or I can't seem to get the runnable jar picked up from the proper location.
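For reference, this is roughly the workflow.xml I would expect Hue to generate for a new-API, map-only mapreduce action (a sketch only; the workflow name, action name, and `${...}` parameters are placeholders, and Oozie is supposed to put every jar in the workflow's lib/ directory on the job classpath automatically):

```xml
<workflow-app name="decompress-wf" xmlns="uri:oozie:workflow:0.5">
    <start to="decompress"/>
    <action name="decompress">
        <map-reduce>
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <!-- use the new (org.apache.hadoop.mapreduce) API -->
                <property>
                    <name>mapred.mapper.new-api</name>
                    <value>true</value>
                </property>
                <!-- a class name, not an HDFS path to a .class file -->
                <property>
                    <name>mapreduce.job.map.class</name>
                    <value>DecompressMapper</value>
                </property>
                <!-- map-only job -->
                <property>
                    <name>mapreduce.job.reduces</name>
                    <value>0</value>
                </property>
                <property>
                    <name>mapreduce.input.fileinputformat.inputdir</name>
                    <value>${inputDir}</value>
                </property>
                <property>
                    <name>mapreduce.output.fileoutputformat.outputdir</name>
                    <value>${outputDir}</value>
                </property>
            </configuration>
        </map-reduce>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Map failed: [${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
```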

I also want to pass an extra argument in Hue, in addition to the input and output directories. How can I do that?

The part of the logs that puzzles me is:

2015-11-06 14:56:57,679 WARN [main] org.apache.hadoop.mapreduce.JobSubmitter: No job jar file set. User classes may not be found. See Job or Job#setJar(String).

When I try to view the Oozie action logs, I get the following message:

No tasks found for job job_1446129655727_0306.

Update 1

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.io.*;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

/*
 * Driver class to decompress the zip files.
 */
public class DecompressJob extends Configured implements Tool {

    public static void main(String[] args) throws Exception {
        int res = ToolRunner.run(new Configuration(), new DecompressJob(), args);
        System.exit(res);
    }

    public int run(String[] args) throws Exception {

        // Reuse the Configuration injected by ToolRunner so that -D options
        // (and Oozie-supplied properties) are not lost.
        Configuration conf = getConf();
        // args[2] is the extra argument beyond the input and output dirs
        conf.set("unzip_files", args[2]);

        Job job = Job.getInstance(conf);
        job.setJobName("decompress job");

        Path input = new Path(args[0]);
        Path output = new Path(args[1]);

        // Remove a pre-existing output path, otherwise the job fails
        // with FileAlreadyExistsException.
        FileSystem fs = FileSystem.get(conf);
        if (fs.exists(output)) {
            fs.delete(output, true);
        }

        job.setJarByClass(DecompressJob.class);
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(Text.class);

        job.setMapperClass(DecompressMapper.class);
        job.setNumReduceTasks(0); // map-only job

        FileInputFormat.addInputPath(job, input);
        FileOutputFormat.setOutputPath(job, output);

        return job.waitForCompletion(true) ? 0 : 1;
    }
}

I have also updated the screenshot with more properties added, and I am posting the error log as well:

2015-11-07 02:43:31,074 INFO [main] org.apache.hadoop.mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
2015-11-07 02:43:31,110 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.lang.RuntimeException: java.lang.ClassNotFoundException: Class /user/Ajay/rad_unzip/DecompressMapper.class not found
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2074)
    at org.apache.hadoop.mapreduce.task.JobContextImpl.getMapperClass(JobContextImpl.java:186)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:751)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
    at org.apache.hadoop.mapred.YarnChild.run(YarnChild.java:171)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:166)
Caused by: java.lang.ClassNotFoundException: Class /user/uname/out/DecompressMapper.class not found
    at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1980)
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2072)
    ... 8 more

2015-11-07 02:43:31,114 INFO [main] org.apache.hadoop.mapred.Task: Runnning cleanup for the task
2015-11-07 02:43:31,125 WARN [main] org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter: Could not delete hdfs://uname/out/output/_temporary/1/_temporary/attempt_1446129655727_0336_m_000001_1

You should bundle the driver and the mapper in the same jar. To pass a new argument, you can simply click "add property" and give it any property name and value; in your MR program you can then read the value with getConf().get("propertyName").
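On the mapper side the same value is available from the task's Configuration. A minimal sketch of the idea, assuming the new-API DecompressMapper from the question (only setup() is shown, the map() body is unchanged):

```java
// Sketch only: reads the property added in Hue (or set in the driver
// via conf.set("unzip_files", ...)) from the job Configuration.
public class DecompressMapper extends Mapper<LongWritable, Text, LongWritable, Text> {

    private String unzipFiles;

    @Override
    protected void setup(Context context) {
        // Properties added through Hue's "add property" end up in the
        // job Configuration under the same key.
        unzipFiles = context.getConfiguration().get("unzip_files");
    }

    // map(...) unchanged
}
```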