socket.gaierror when trying to run EMR using Python mrjob
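I'm currently learning mrjob and how to run it on AWS EMR. Please forgive me if this has already been asked (I searched in many places and found no answer) or if it's a silly question.
Here is my Python script: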
from mrjob.job import MRJob


class MRWordFrequencyCount(MRJob):

    def mapper(self, _, line):
        yield "chars", len(line)
        yield "words", len(line.split())
        yield "lines", 1

    def reducer(self, key, values):
        yield key, sum(values)


if __name__ == '__main__':
    MRWordFrequencyCount.run()
When I run it in local mode, I get results.
Command:
python sample.py input.txt
So I tried to run it on EMR by creating an mrjob.conf file that looks like this:
runners:
  emr:
    aws_access_key_id:
    aws_secret_access_key:
    aws_region: us-west-2a
    ec2_key_pair: emr
    ec2_key_pair_file: ~/Desktop/emr.pem
    ec2_instance_type: m1.small
    num_ec2_instances: 5
  local:
    base_tmp_dir: /tmp
First attempt
I tried it locally on my Windows machine:
python check.py -r emr --conf-path ./mrjob.conf word.txt
Note: I got the same error when I stored the input in an S3 location and passed that as the argument.
I got this traceback:
Traceback (most recent call last):
File "check.py", line 16, in <module>
MRWordFrequencyCount.run()
File "C:\Users\MOB140003207\AppData\Local\Enthought\Canopy32\User\lib\site-packages\mrjob\job.py", line 461, in run
mr_job.execute()
File "C:\Users\MOB140003207\AppData\Local\Enthought\Canopy32\User\lib\site-packages\mrjob\job.py", line 479, in execute
super(MRJob, self).execute()
File "C:\Users\MOB140003207\AppData\Local\Enthought\Canopy32\User\lib\site-packages\mrjob\launch.py", line 153, in execute
self.run_job()
File "C:\Users\MOB140003207\AppData\Local\Enthought\Canopy32\User\lib\site-packages\mrjob\launch.py", line 215, in run_job
with self.make_runner() as runner:
File "C:\Users\MOB140003207\AppData\Local\Enthought\Canopy32\User\lib\site-packages\mrjob\job.py", line 502, in make_runner
return super(MRJob, self).make_runner()
File "C:\Users\MOB140003207\AppData\Local\Enthought\Canopy32\User\lib\site-packages\mrjob\launch.py", line 168, in make_runner
return EMRJobRunner(**self.emr_job_runner_kwargs())
File "C:\Users\MOB140003207\AppData\Local\Enthought\Canopy32\User\lib\site-packages\mrjob\emr.py", line 643, in __init__
self._fix_s3_scratch_and_log_uri_opts()
File "C:\Users\MOB140003207\AppData\Local\Enthought\Canopy32\User\lib\site-packages\mrjob\emr.py", line 760, in _fix_s3_scratch_and_log_uri_opts
self._set_s3_scratch_uri(s3_conn)
File "C:\Users\MOB140003207\AppData\Local\Enthought\Canopy32\User\lib\site-packages\mrjob\emr.py", line 776, in _set_s3_scratch_uri
buckets = s3_conn.get_all_buckets()
File "C:\Users\MOB140003207\AppData\Local\Enthought\Canopy32\User\lib\site-packages\mrjob\retry.py", line 149, in call_and_maybe_retry
return f(*args, **kwargs)
File "C:\Users\MOB140003207\AppData\Local\Enthought\Canopy32\User\lib\site-packages\boto\s3\connection.py", line 436, in get_all_buckets
response = self.make_request('GET', headers=headers)
File "C:\Users\MOB140003207\AppData\Local\Enthought\Canopy32\User\lib\site-packages\boto\s3\connection.py", line 664, in make_request
retry_handler=retry_handler
File "C:\Users\MOB140003207\AppData\Local\Enthought\Canopy32\User\lib\site-packages\boto\connection.py", line 1070, in make_request
retry_handler=retry_handler)
File "C:\Users\MOB140003207\AppData\Local\Enthought\Canopy32\User\lib\site-packages\boto\connection.py", line 1029, in _mexe
raise ex
socket.gaierror: [Errno 11004] getaddrinfo failed
When I tried running it on an AWS EC2 instance, I got this error:
Traceback (most recent call last):
File "check.py", line 16, in <module>
MRWordFrequencyCount.run()
File "/usr/local/lib/python2.7/dist-packages/mrjob/job.py", line 461, in run
mr_job.execute()
File "/usr/local/lib/python2.7/dist-packages/mrjob/job.py", line 479, in execute
super(MRJob, self).execute()
File "/usr/local/lib/python2.7/dist-packages/mrjob/launch.py", line 153, in execute
self.run_job()
File "/usr/local/lib/python2.7/dist-packages/mrjob/launch.py", line 215, in run_job
with self.make_runner() as runner:
File "/usr/local/lib/python2.7/dist-packages/mrjob/job.py", line 502, in make_runner
return super(MRJob, self).make_runner()
File "/usr/local/lib/python2.7/dist-packages/mrjob/launch.py", line 168, in make_runner
return EMRJobRunner(**self.emr_job_runner_kwargs())
File "/usr/local/lib/python2.7/dist-packages/mrjob/emr.py", line 643, in __init__
self._fix_s3_scratch_and_log_uri_opts()
File "/usr/local/lib/python2.7/dist-packages/mrjob/emr.py", line 760, in _fix_s3_scratch_and_log_uri_opts
self._set_s3_scratch_uri(s3_conn)
File "/usr/local/lib/python2.7/dist-packages/mrjob/emr.py", line 776, in _set_s3_scratch_uri
buckets = s3_conn.get_all_buckets()
File "/usr/local/lib/python2.7/dist-packages/mrjob/retry.py", line 149, in call_and_maybe_retry
return f(*args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/boto/s3/connection.py", line 436, in get_all_buckets
response = self.make_request('GET', headers=headers)
File "/usr/local/lib/python2.7/dist-packages/boto/s3/connection.py", line 664, in make_request
retry_handler=retry_handler
File "/usr/local/lib/python2.7/dist-packages/boto/connection.py", line 1071, in make_request
retry_handler=retry_handler)
File "/usr/local/lib/python2.7/dist-packages/boto/connection.py", line 1030, in _mexe
raise ex
socket.gaierror: [Errno -2] Name or service not known
I don't know what I'm doing wrong.
Python version 2.7, mrjob version '0.4.5'.
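After a few hours of searching and investigating, I found the problem. It was this line:
aws_region: us-west-2a
which should have been:
aws_region: us-west-2
us-west-2a is an availability zone, not a region, so the S3 connection was presumably being pointed at a hostname that does not exist, which would explain why the name lookup failed on both machines.
I'm leaving this question up because it may save someone else some time.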
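To catch this kind of mistake before launching a job, a small check along the following lines may help. It is only a sketch: the helper names split_region_and_zone and require_region are made up for illustration and are not part of mrjob or boto, and the regular expression assumes the usual naming pattern in which an availability zone is a region name followed by a single letter (us-west-2 versus us-west-2a).

```python
import re

# Assumed naming pattern: a region looks like "us-west-2";
# an availability zone appends one lowercase letter, e.g. "us-west-2a".
_REGION_OR_ZONE = re.compile(r'^([a-z]{2}(?:-[a-z]+)+-\d+)([a-z]?)$')


def split_region_and_zone(name):
    """Split 'us-west-2a' into ('us-west-2', 'a'); a bare region gives ('us-west-2', '')."""
    match = _REGION_OR_ZONE.match(name.strip())
    if match is None:
        raise ValueError("%r does not look like a region or availability zone" % name)
    return match.group(1), match.group(2)


def require_region(name):
    """Return the region name, or raise if an availability zone was given instead."""
    region, zone_letter = split_region_and_zone(name)
    if zone_letter:
        raise ValueError(
            "%r is an availability zone; use the region %r instead" % (name, region))
    return region


if __name__ == '__main__':
    print(split_region_and_zone('us-west-2a'))  # ('us-west-2', 'a')
    print(require_region('us-west-2'))          # us-west-2
```

Calling require_region('us-west-2a') raises a ValueError that names us-west-2 as the value to use, which is the same correction as the one applied to mrjob.conf.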