Retrying connect to server: Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
I have three physical nodes. On each node, I enter Docker with this command:
docker run -v /home/user/.ssh:/root/.ssh --privileged
-p 5050:5050 -p 5051:5051 -p 5052:5052 -p 2181:2181 -p 8089:8081
-p 6123:6123 -p 8084:8080 -p 50090:50090 -p 50070:50070
-p 9000:9000 -p 2888:2888 -p 3888:3888 -p 4041:4040 -p 8020:8020
-p 8485:8485 -p 7078:7077 -p 52222:22 -e WEAVE_CIDR=10.32.0.3/12
-e MESOS_EXECUTOR_REGISTRATION_TIMEOUT=5mins
-e LIBPROCESS_IP=10.32.0.3
-e MESOS_RESOURCES=ports*:[11000-11999]
-ti hadoop_marathon_mesos_flink_2 /bin/bash
I configured Hadoop as follows:
core-site.xml:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://mycluster</value>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://mycluster</value>
</property>
</configuration>
hdfs-site.xml:
<configuration>
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://10.32.0.1:8485;10.32.0.2:8485;10.32.0.3:8485/mycluster</value>
</property>
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/tmp/hadoop/dfs/jn</value>
</property>
<property>
<name>dfs.nameservices</name>
<value>mycluster</value>
<description>Logical name for this new
nameservice</description>
</property>
<property>
<name>dfs.ha.namenodes.mycluster</name>
<value>nn1,nn2</value>
<description>Unique identifiers for each NameNode in the
nameservice</description>
</property>
<property>
<name>dfs.namenode.rpc-address.mycluster.nn1</name>
<value>10.32.0.1:8020</value>
</property>
<property>
<name>dfs.namenode.rpc-address.mycluster.nn2</name>
<value>10.32.0.2:8020</value>
</property>
<property>
<name>dfs.namenode.http-address.mycluster.nn1</name>
<value>10.32.0.1:50070</value>
</property>
<property>
<name>dfs.namenode.http-address.mycluster.nn2</name>
<value>10.32.0.2:50070</value>
</property>
<property>
<name>dfs.client.failover.proxy.provider.mycluster</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
<name>dfs.ha.fencing.methods</name>
<value>shell(/bin/true)</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///usr/local/hadoop_store/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///usr/local/hadoop_store/hdfs/datanode</value>
</property>
<property>
<name>dfs.namenode.datanode.registration.ip-hostname-check</name>
<value>false</value>
</property>
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
<property>
<name>ha.zookeeper.quorum</name>
<value>10.32.0.1:2181,10.32.0.2:2181,10.32.0.3:2181</value>
</property>
</configuration>
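As a quick sanity check (a minimal sketch; it assumes the two config files are in the current directory and uses only Python's standard library), you can verify that the edited files are well-formed XML before restarting anything:

```shell
# Validate each Hadoop config file as well-formed XML.
# Assumes core-site.xml and hdfs-site.xml are in the current directory.
for f in core-site.xml hdfs-site.xml; do
  python3 -c 'import sys, xml.etree.ElementTree as ET; ET.parse(sys.argv[1]); print(sys.argv[1], "OK")' "$f"
done
```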
The problem arises when I format the NameNode:
hadoop namenode -format
The format fails with this error:
2019-05-06 06:35:09,969 INFO ipc.Client: Retrying connect to server: 10.32.0.2/10.32.0.2:8485. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2019-05-06 06:35:09,969 INFO ipc.Client: Retrying connect to server: 10.32.0.3/10.32.0.3:8485. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2019-05-06 06:35:09,987 ERROR namenode.NameNode: Failed to start namenode.
org.apache.hadoop.hdfs.qjournal.client.QuorumException: Unable to check if JNs are ready for formatting. 1 exceptions thrown:
10.32.0.1:8485: Call From 50c5244de4cd/10.32.0.1 to 50c5244de4cd:8485 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
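Before re-running the format, it can help to confirm that each JournalNode port is actually reachable from inside the container. A minimal sketch using bash's /dev/tcp (hosts and port taken from the log above):

```shell
# Probe each JournalNode RPC port; prints "open" or "closed" per host.
# A refused or filtered connection here reproduces the ConnectException above.
for host in 10.32.0.1 10.32.0.2 10.32.0.3; do
  if timeout 2 bash -c "echo > /dev/tcp/$host/8485" 2>/dev/null; then
    echo "$host:8485 open"
  else
    echo "$host:8485 closed"
  fi
done
```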
I have published the ports Hadoop needs, but I still get Connection refused.
Could anyone tell me what is wrong with the configuration?
Thanks in advance.
The problem was solved by fixing the ZooKeeper configuration in core-site.xml. I explain the details of the highly available Hadoop configuration below:
hdfs-site.xml:
<property>
<name>dfs.nameservices</name>
<value>mycluster</value>
<description>Logical name for this new nameservice</description>
</property>
<property>
<name>dfs.ha.namenodes.mycluster</name>
<value>nn1,nn2</value>
<description>Unique identifiers for each NameNode in
the nameservice</description>
</property>
<property>
<name>dfs.namenode.rpc-address.mycluster.nn1</name>
<value>10.32.0.1:8020</value>
</property>
<property>
<name>dfs.namenode.rpc-address.mycluster.nn2</name>
<value>10.32.0.2:8020</value>
</property>
<property>
<name>dfs.namenode.http-address.mycluster.nn1</name>
<value>10.32.0.1:50070</value>
</property>
<property>
<name>dfs.namenode.http-address.mycluster.nn2</name>
<value>10.32.0.2:50070</value>
</property>
<property>
<name>dfs.client.failover.proxy.provider.mycluster</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://10.32.0.1:8485;10.32.0.2:8485;10.32.0.3:8485/mycluster</value>
</property>
<property>
<name>dfs.permissions.enabled</name>
<value>false</value>
</property>
<property>
<name>dfs.ha.fencing.methods</name>
<value>sshfence</value>
</property>
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/home/hdfs/.ssh/id_rsa</value>
</property>
<property>
<name>dfs.ha.fencing.ssh.connect-timeout</name>
<value>30000</value>
</property>
<property>
<name>dfs.permissions.superusergroup</name>
<value>hdfs</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///usr/local/hadoop_store/hdfs/namenode</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///usr/local/hadoop_store/hdfs/datanode</value>
</property>
<property>
<name>dfs.namenode.datanode.registration.ip-hostname-check</name>
<value>false</value>
</property>
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
core-site.xml (for example, on node 10.32.0.1):
<property>
<name>fs.defaultFS</name>
<value>hdfs://mycluster</value>
</property>
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/tmp/hadoop/dfs/journalnode</value>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://mycluster</value>
</property>
<property>
<name>ha.zookeeper.quorum</name>
<value>0.0.0.0:2181,10.32.0.2:2181,10.32.0.3:2181</value>
</property>
For example, the ZooKeeper configuration on 10.32.0.1 is:
server.1=0.0.0.0:2888:3888
server.2=10.32.0.2:2888:3888
server.3=10.32.0.3:2888:3888
Also, I created the myid file in /var/lib/zookeeper/data containing that node's ID.
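The per-node wiring can be scripted. A hedged sketch (paths come from this post; MY_ID and the host list are per-node assumptions you would set on each machine) that writes the server lines with the node's own entry bound to 0.0.0.0, plus the myid file:

```shell
# Generate ZooKeeper server entries and the myid file for one node.
# MY_ID, HOSTS, and the paths are assumptions; adjust to your cluster.
MY_ID=1
HOSTS="10.32.0.1 10.32.0.2 10.32.0.3"
DATA_DIR=/var/lib/zookeeper/data
ZOO_CFG=/home/zookeeper-3.4.14/conf/zoo.cfg

mkdir -p "$DATA_DIR"
i=1
for h in $HOSTS; do
  # The node's own address is replaced with 0.0.0.0 so it binds locally.
  [ "$i" -eq "$MY_ID" ] && h=0.0.0.0
  echo "server.$i=$h:2888:3888"
  i=$((i + 1))
done >> "$ZOO_CFG"

echo "$MY_ID" > "$DATA_DIR/myid"
```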
First, delete all the folders below:
rm -rf /tmp/hadoop/dfs/journalnode
rm -rf /usr/local/hadoop_store/hdfs/namenode
rm -rf /usr/local/hadoop_store/hdfs/datanode
rm -rf /opt/hadoop/logs/*
Then, recreate these folders:
mkdir /usr/local/hadoop_store/hdfs/namenode
mkdir /usr/local/hadoop_store/hdfs/datanode
After that, give these folders the correct permissions:
chmod 777 /usr/local/hadoop_store/hdfs/namenode
chmod 777 /usr/local/hadoop_store/hdfs/datanode
chown -R root /usr/local/hadoop_store/hdfs/namenode
chown -R root /usr/local/hadoop_store/hdfs/datanode
chmod 777 /tmp/hadoop/dfs/journalnode
chown -R root /tmp/hadoop/dfs/journalnode
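The delete/create/permission steps can be collapsed into one idempotent sketch (same paths as above; chmod 777 is kept from this post, though tighter modes are preferable outside a test cluster):

```shell
# Recreate the HDFS storage directories with a clean slate.
# Paths are the ones used in this post; adjust to your layout.
DIRS="/usr/local/hadoop_store/hdfs/namenode \
      /usr/local/hadoop_store/hdfs/datanode \
      /tmp/hadoop/dfs/journalnode"
for d in $DIRS; do
  rm -rf "$d"
  mkdir -p "$d"
  chmod 777 "$d"
  chown -R root "$d"
done
rm -rf /opt/hadoop/logs/*
```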
Now you can format these folders following the stages below.
The most important part is how to format the three nodes. You must follow these stages:
1. Stop the HDFS services.
2. Start only the JournalNodes (since they need to be aware of the formatting):
/opt/hadoop/bin/hdfs --daemon start journalnode
On the first NameNode (as user hdfs or root):
hadoop namenode -format
On the JournalNodes:
hdfs namenode -initializeSharedEdits -force
Restart ZooKeeper:
/home/zookeeper-3.4.14/bin/zkServer.sh restart
Format ZooKeeper:
hdfs zkfc -formatZK -force (to force zookeeper to reinitialise)
Restart the first NameNode:
/opt/hadoop/bin/hdfs --daemon start namenode
On the second NameNode:
hdfs namenode -bootstrapStandby -force (force synch with first namenode)
On each DataNode, clear the data directory:
hadoop datanode -format
Restart the HDFS services:
/opt/hadoop/sbin/start-dfs.sh
By the way, I have three nodes: two NameNodes and one DataNode. You can check the Hadoop logs under /opt/hadoop/logs/.