Hadoop3: worker node error connecting to ResourceManager
I have a 3-node Hadoop cluster (DigitalOcean droplets):
- hadoop-master is configured as both a namenode and a datanode
- hadoop-worker1 and hadoop-worker2 are configured as datanodes
Whenever I run a MapReduce streaming job and a worker node is chosen to run the ApplicationMaster, the job hangs while trying to connect to the ResourceManager. The datanode logs show it trying to connect to 0.0.0.0:
INFO [main] org.apache.hadoop.yarn.client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8030
INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8030. Already tried 0 time(s);
INFO [main] org.apache.hadoop.ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8030. Already tried 1 time(s);
That is the default value of the yarn.resourcemanager.hostname property. (Port 8030 is the scheduler address the ApplicationMaster talks to: yarn.resourcemanager.scheduler.address defaults to ${yarn.resourcemanager.hostname}:8030 in yarn-default.xml, which is why the retries go to 0.0.0.0:8030.)
However, I have specified this property in yarn-site.xml on both of my worker nodes:
<property>
    <name>yarn.resourcemanager.hostname</name>
    <value>hadoop-master</value>
</property>
On all of my nodes, my /etc/hosts file looks like this, so hadoop-master should resolve to the correct IP address:
#127.0.1.1 hadoop-worker1 hadoop-worker1
127.0.0.1 localhost
# The following lines are desirable for IPv6 capable hosts
#::1 ip6-localhost ip6-loopback
#fe00::0 ip6-localnet
#ff00::0 ip6-mcastprefix
#ff02::1 ip6-allnodes
#ff02::2 ip6-allrouters
#ff02::3 ip6-allhosts
165.22.19.161 hadoop-master
165.22.19.154 hadoop-worker1
165.22.19.158 hadoop-worker2
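As a quick sanity check (just an illustration, assuming a standard glibc resolver), each node can be asked what it resolves hadoop-master to:
getent hosts hadoop-master
165.22.19.161   hadoop-master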
I also checked the configuration by going to hadoop-worker1:9864, the worker node's web interface, to see what was actually loaded:
<property>
    <name>yarn.resourcemanager.hostname</name>
    <value>hadoop-master</value>
    <final>false</final>
    <source>yarn-site.xml</source>
</property>
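The same dump is also available from the command line through the daemon's /conf endpoint, which makes it easy to check every node the same way (a sketch assuming the default Hadoop 3 datanode HTTP port; search the response for yarn.resourcemanager.hostname):
curl -s http://hadoop-worker1:9864/conf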
I even tried running a YARN command from one of the worker nodes, and it actually contacts the ResourceManager correctly:
hadoop@hadoop-worker1:/opt/hadoop$ yarn node --list
2019-06-15 18:47:42,119 INFO client.RMProxy: Connecting to ResourceManager at hadoop-master/165.22.19.161:8032
Total Nodes:3
Node-Id Node-State Node-Http-Address Number-of-Running-Containers
hadoop-worker2:40673 RUNNING hadoop-worker2:8042 0
hadoop-worker1:41875 RUNNING hadoop-worker1:8042 1
hadoop-master:40075 RUNNING hadoop-master:8042 0
hadoop@hadoop-worker1:/opt/hadoop$
I'm not sure what else to try. I believe it may have something to do with the streaming job not loading the settings correctly. Any help would be greatly appreciated, as I've been stuck on this for 2 days.
Update: I have added the -D yarn.resourcemanager.hostname=hadoop-master flag to the mapred streaming command, and it now seems to work.
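For reference, the invocation now looks roughly like this (a minimal sketch; the input/output paths and the mapper/reducer commands are placeholders, not the ones from my actual job). Note that generic -D options must come before the streaming-specific options, otherwise the command will fail:
mapred streaming \
    -D yarn.resourcemanager.hostname=hadoop-master \
    -input /user/hadoop/input \
    -output /user/hadoop/output \
    -mapper /bin/cat \
    -reducer '/usr/bin/wc -l'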