Running Hadoop Application Remotely
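I want to run a Hadoop application from Java.
If I run my application on the cluster with the hadoop jar command,
everything works fine. But I need to run the job remotely.
I have set the configuration for the resource manager and the other properties like this: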
jobConf.set("yarn.resourcemanager.address", "192.168.111.9:8032");
jobConf.set("mapreduce.framework.name", "yarn");
jobConf.set("fs.default.name", "hdfs://192.168.111.9:8020");
// If not set, an error is thrown about being unable to write to /tmp/hadoop-yarn
jobConf.set("yarn.app.mapreduce.am.staging-dir", "/user");
jobConf.set("mapreduce.app-submission.cross-platform", "true");
jobConf.set("mapreduce.application.classpath", "$HADOOP_MAPRED_HOME/*:$HADOOP_MAPRED_HOME/lib/*:$MR2_CLASSPATH:$HADOOP_CLIENT_CONF_DIR:$HADOOP_CONF_DIR:$HADOOP_COMMON_HOME/*:$HADOOP_COMMON_HOME/lib/*:$HADOOP_HDFS_HOME/*:$HADOOP_HDFS_HOME/lib/*:$HADOOP_YARN_HOME/*:$HADOOP_YARN_HOME/lib/*");
String target = "variables-hadoop-0.0.1-SNAPSHOT.jar";
jobConf.set("mapreduce.job.jar", target);
But every time I run the application it cannot reach the resource manager, and the log shows:
2017-01-25 19:36:09,998 INFO [main] org.apache.hadoop.yarn.client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8030
2017-01-25 19:36:11,032 INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8030. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
and it keeps retrying for a long time.
Then I tried setting the property
jobConf.set("yarn.resourcemanager.scheduler.address", "192.168.111.9:8030 ");
But that throws another error:
java.lang.IllegalArgumentException: Does not contain a valid host:port authority: 192.168.111.9:8030 (configuration property 'yarn.resourcemanager.scheduler.address')
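Is there any "easy way" to do this? It is hard to find every property that needs to be set.
I am running this against a cluster using Cloudera - Hadoop 2.7
You added a space at the end of the scheduler address, which is why you get the IllegalArgumentException.
Change: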
jobConf.set("yarn.resourcemanager.scheduler.address", "192.168.111.9:8030 ");
// note the trailing space after 8030, just before the closing quote
to
jobConf.set("yarn.resourcemanager.scheduler.address", "192.168.111.9:8030");
// no trailing space
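To catch this kind of stray whitespace before the value ever reaches the configuration, one option is to normalize each address string first. The following is only a minimal sketch in plain Java: the class name HostPortCheck and the method normalizeAddress are hypothetical helpers made up for illustration, not part of Hadoop, and the check assumes a simple host:port form (no IPv6 literals).

```java
public final class HostPortCheck {
    private HostPortCheck() {}

    /**
     * Hypothetical helper: trims surrounding whitespace and checks that the
     * value looks like host:port with a port between 1 and 65535.
     * Assumes a plain hostname or IPv4 address; IPv6 literals are not handled.
     */
    public static String normalizeAddress(String raw) {
        if (raw == null) {
            throw new IllegalArgumentException("address is null");
        }
        String value = raw.trim();
        int colon = value.lastIndexOf(':');
        if (colon <= 0 || colon == value.length() - 1) {
            throw new IllegalArgumentException("expected host:port but got '" + raw + "'");
        }
        String host = value.substring(0, colon);
        String portText = value.substring(colon + 1);
        // Whitespace inside the host part cannot be repaired by trimming.
        for (int i = 0; i < host.length(); i++) {
            if (Character.isWhitespace(host.charAt(i))) {
                throw new IllegalArgumentException("whitespace in host: '" + raw + "'");
            }
        }
        // Only ASCII digits are accepted, so parseInt below cannot fail.
        if (!portText.matches("\\d{1,5}")) {
            throw new IllegalArgumentException("port is not numeric: '" + raw + "'");
        }
        int port = Integer.parseInt(portText);
        if (port < 1 || port > 65535) {
            throw new IllegalArgumentException("port out of range: '" + raw + "'");
        }
        return value;
    }

    public static void main(String[] args) {
        // The value from the question, including the trailing space.
        System.out.println("[" + normalizeAddress("192.168.111.9:8030 ") + "]");
        // prints [192.168.111.9:8030]
    }
}
```

With a helper like this, the call from the question could be written as jobConf.set("yarn.resourcemanager.scheduler.address", HostPortCheck.normalizeAddress("192.168.111.9:8030 ")), so a trailing space is removed and a malformed value fails early with a message that quotes the offending string.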