Hadoop MapReduce Wordcount python execution error
I am trying to run the Python MapReduce word-count program I took from "Writing a Hadoop MapReduce Program in Python", just to understand how it works, but the job never succeeds!
In the Cloudera VM, I run mapper.py and reducer.py with this streaming library:
/usr/lib/hadoop-0.20-mapreduce/contrib/streaming/hadoop-streaming-2.6.0-mr1-cdh5.12.0.jar
Command I ran:
hadoop jar /usr/lib/hadoop-0.20-mapreduce/contrib/streaming/hadoop-streaming-2.6.0-mr1-cdh5.12.0.jar \
  -D mapred.reduce.tasks=1 \
  -file wordcount/mapper.py \
  -mapper mapper.py \
  -file wordcount/reducer.py \
  -reducer reducer.py \
  -input myinput/test.txt \
  -output output
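For reference, here is a minimal sketch of the kind of streaming mapper the tutorial describes. It is an illustration, not the tutorial's exact file. It avoids Python 3-only syntax, so it should also run under the Python 2 interpreter that older Cloudera VMs may ship with. Note that the command above passes `-mapper mapper.py` rather than `-mapper "python mapper.py"`. In that form Hadoop executes the script file directly, which likely only works if the file starts with a shebang line and has execute permission (`chmod +x mapper.py`).

```python
#!/usr/bin/env python
"""Hypothetical mapper.py: a minimal Hadoop Streaming word-count mapper.

Hadoop Streaming writes input lines to the mapper's stdin and reads
tab-separated key/value records back from its stdout.
"""
import sys


def map_lines(lines):
    """Yield one 'word<TAB>1' record per whitespace-separated word."""
    for line in lines:
        for word in line.split():
            yield "%s\t1" % word


if __name__ == "__main__":
    # Stream records to stdout, one per line, as Hadoop Streaming expects.
    for record in map_lines(sys.stdin):
        print(record)
```

Locally, `echo "the cat the" | python mapper.py` should print three `word<TAB>1` lines.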
The problem is the file paths: mapper.py and reducer.py must be given as paths on the local filesystem, but the input file must be given as an HDFS path.
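First, test the Python code locally with the command below. The sort step stands in for Hadoop's shuffle, which hands the mapper output to the reducer sorted by key:
cat <input file> | python <path from local>/mapper.py | sort -k1,1 | python <path from local>/reducer.py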
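A word-count reducer written like the tutorial's assumes its input is already sorted, so that all counts for one word arrive on consecutive lines. The sketch below is a hypothetical reducer.py in that style, not the tutorial's exact code. The helper `simulate_local_run` is an illustrative name. It mimics a `cat | mapper | sort | reducer` pipeline in memory, so you can see why the local test needs a sort step between mapper and reducer.

```python
#!/usr/bin/env python
"""Hypothetical reducer.py: sums counts for runs of identical keys.

Like the tutorial's reducer, it assumes input sorted by key.
"""
import sys


def reduce_lines(lines):
    """Sum consecutive 'word<TAB>count' records; yield 'word<TAB>total'."""
    current_word = None
    current_count = 0
    for line in lines:
        word, _, count = line.rstrip("\n").partition("\t")
        try:
            count = int(count)
        except ValueError:
            continue  # skip malformed records instead of crashing the task
        if word == current_word:
            current_count += count
        else:
            if current_word is not None:
                yield "%s\t%d" % (current_word, current_count)
            current_word = word
            current_count = count
    if current_word is not None:
        yield "%s\t%d" % (current_word, current_count)


def simulate_local_run(text_lines):
    """Mimic `cat file | mapper.py | sort | reducer.py` in memory."""
    # Same records the mapper emits: one 'word<TAB>1' per word.
    mapped = ["%s\t1" % w for line in text_lines for w in line.split()]
    # sorted() plays the role of `sort` / Hadoop's shuffle.
    return list(reduce_lines(sorted(mapped)))


if __name__ == "__main__":
    for record in reduce_lines(sys.stdin):
        print(record)
```

Without the sort, the reducer sees `a`, `b`, `a` as three separate runs and emits `a` twice. With the sort, each word is counted once.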
Then run the job on Hadoop, reading the input from HDFS:
hadoop jar /usr/lib/hadoop-0.20-mapreduce/contrib/streaming/hadoop-streaming-2.6.0-mr1-cdh5.12.0.jar \
  -D mapred.reduce.tasks=1 \
  -file <path from local>/mapper.py \
  -mapper "python <path from local>/mapper.py" \
  -file <path from local>/reducer.py \
  -reducer "python <path from local>/reducer.py" \
  -input <path from hdfs>/myinput/test.txt \
  -output output
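To keep the local-versus-HDFS distinction straight, the command can be assembled by a small script. This is a hypothetical helper, not part of Hadoop or the tutorial. It checks that the two scripts exist on the local filesystem, passes the input and output paths through untouched because Hadoop resolves them on HDFS, and uses only the options shown above. Passing the local path to `-mapper` and `-reducer` relies on that path also existing on the node that runs the task, which should hold on a single-node VM.

```python
#!/usr/bin/env python
"""Hypothetical helper that assembles the Hadoop Streaming command.

Scripts are checked on the LOCAL filesystem; input/output paths are
left as-is because Hadoop resolves them on HDFS.
"""
import os
import subprocess

STREAMING_JAR = ("/usr/lib/hadoop-0.20-mapreduce/contrib/streaming/"
                 "hadoop-streaming-2.6.0-mr1-cdh5.12.0.jar")


def build_streaming_command(mapper, reducer, hdfs_input, hdfs_output,
                            reduce_tasks=1):
    """Return the argument list for `hadoop jar ...`; nothing is executed."""
    for script in (mapper, reducer):
        if not os.path.isfile(script):
            raise IOError("local script not found: %s" % script)
    return [
        "hadoop", "jar", STREAMING_JAR,
        # Generic -D options must precede the streaming-specific options.
        "-D", "mapred.reduce.tasks=%d" % reduce_tasks,
        "-file", mapper, "-mapper", "python %s" % mapper,
        "-file", reducer, "-reducer", "python %s" % reducer,
        "-input", hdfs_input,
        "-output", hdfs_output,
    ]


def hdfs_path_exists(path):
    """Return True if `hdfs dfs -test -e PATH` exits with status 0."""
    return subprocess.call(["hdfs", "dfs", "-test", "-e", path]) == 0
```

You could then check `hdfs_path_exists("myinput/test.txt")` and run `subprocess.check_call(build_streaming_command(...))` with your own local script paths. Hadoop also refuses to start a job whose `-output` directory already exists, so remove it (for example with `hdfs dfs -rm -r output`) before re-running.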