Pyhdfs copy_from_local causing nodename nor servname provided, or not known error
I am using the following Python code with pyhdfs to upload a file from my local system to a remote HDFS:
from pyhdfs import HdfsClient
client = HdfsClient(hosts='1.1.1.1',user_name='root')
client.mkdirs('/jarvis')
client.copy_from_local('/my/local/file', '/hdfs/path')
I am using Python 3.5. Hadoop is running on the default port 50070, and 1.1.1.1 is my remote Hadoop URL.
创建目录 "jarvis" 工作正常,但复制文件不工作。我收到以下错误
Traceback (most recent call last):
File "test_hdfs_upload.py", line 14, in
client.copy_from_local('/tmp/data.json','/test.json')
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pyhdfs.py", line 753, in copy_from_local
self.create(dest, f, **kwargs)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/pyhdfs.py", line 426, in create
metadata_response.headers['location'], data=data, **self._requests_kwargs)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/requests/api.py", line 99, in put
return request('put', url, data=data, **kwargs)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/requests/api.py", line 44, in request
return session.request(method=method, url=url, **kwargs)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/requests/sessions.py", line 383, in request
resp = self.send(prep, **send_kwargs)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/requests/sessions.py", line 486, in send
r = adapter.send(request, **kwargs)
File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/requests/adapters.py", line 378, in send
raise ConnectionError(e)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='ip-1-1-1-1', port=50075): Max retries exceeded with url: /webhdfs/v1/test.json?op=CREATE&user.name=root&namenoderpcaddress=ip-1-1-1-1:9000&overwrite=false (Caused by : [Errno 8] nodename nor servname provided, or not known)
First, check whether webhdfs is enabled on your HDFS cluster. The PyHDFS library uses webhdfs, so webhdfs needs to be enabled in the HDFS configuration. To enable webhdfs, modify hdfs-site.xml as follows:
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/path/to/namenode/dir/</value>
</property>
<property>
<name>dfs.checkpoint.dir</name>
<value>file:/path/to/checkpoint/dir/</value>
</property>
<property>
<name>dfs.checkpoints.edits.dir</name>
<value>file:/path/to/checkpoints-ed/dir/</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/path/to/datanode/dir/</value>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
</configuration>
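After editing hdfs-site.xml you need to restart the HDFS daemons for the change to take effect. As a quick sanity check you can query the WebHDFS REST endpoint on the NameNode directly; the sketch below assumes the default NameNode HTTP port 50070 and the user_name=root from the question:

# Verify that WebHDFS is serving by listing the HDFS root directory.
import requests

namenode = "1.1.1.1"  # your remote Hadoop host
url = "http://{}:50070/webhdfs/v1/?op=LISTSTATUS&user.name=root".format(namenode)

resp = requests.get(url, timeout=10)
resp.raise_for_status()  # a non-200 response means WebHDFS is not reachable
print(resp.json()["FileStatuses"]["FileStatus"])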
Also, when you make a copy_from_local() API call through the PyHDFS library, the HDFS NameNode picks a DataNode from the cluster to receive the data and redirects the request to it, and the redirect URL may contain only the hostname associated with that node (ip-1-1-1-1 in your traceback). The client then tries to open an HTTP connection to that hostname to perform the operation. This is why it fails: your machine cannot resolve that hostname.
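You can reproduce the failure outside of PyHDFS by trying to resolve that DataNode hostname yourself (hostname taken from the traceback):

# Resolving the unreachable hostname raises socket.gaierror
# ([Errno 8] nodename nor servname provided, or not known) on macOS.
import socket

try:
    print(socket.gethostbyname("ip-1-1-1-1"))
except socket.gaierror as err:
    print("cannot resolve DataNode hostname:", err)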
To resolve the hostname, you need to add the appropriate mapping to your /etc/hosts file.
For example, if you have an HDFS cluster with one NameNode and two DataNodes, with the following IP addresses and hostnames:
- 192.168.0.1 (NameNode1)
- 192.168.0.2 (DataNode1)
- 192.168.0.3 (DataNode2)
you would update your /etc/hosts file as follows:
127.0.0.1 localhost
::1 localhost
192.168.0.1 NameNode1
192.168.0.2 DataNode1
192.168.0.3 DataNode2
This enables hostname resolution from your machine to the HDFS cluster, and the webhdfs API calls made by PyHDFS will then succeed.
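Once the hostname from the redirect resolves, the upload from the question should go through. A minimal end-to-end check, re-using the host, user, and paths from the question and traceback:

# Re-run the original upload and confirm the file landed in HDFS.
from pyhdfs import HdfsClient

client = HdfsClient(hosts='1.1.1.1', user_name='root')
client.mkdirs('/jarvis')
client.copy_from_local('/tmp/data.json', '/test.json')
print(client.listdir('/'))  # should now include 'test.json'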