Streaming Command Failed! when execute MapReduce python code in single node hadoop cluster setup on CentOS7

I have already run MapReduce Java code successfully on this machine. Now I am trying to run MapReduce code written in Python on the same machine. For this I am using Hadoop 3.2.1 and hadoop-streaming-3.2.1.jar.

I have tested the code locally with the command

[dsawale@localhost ~]$ cat Desktop/sample.txt | python PycharmProjects/MapReduceCode/com/code/wordcount/WordCountMapper.py | sort | python PycharmProjects/MapReduceCode/com/code/wordcount/WordCountReducer.py

and I can see that it produces the correct output.
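
The mapper and reducer scripts themselves are not shown here. For context, a minimal Hadoop Streaming word-count pair usually looks like the sketch below; the file names are taken from the commands in this question, but the bodies are an assumption rather than the asker's actual code.

WordCountMapper.py (hypothetical sketch):

#!/usr/bin/env python
# Read lines from stdin and emit "word<TAB>1" for every word,
# which is the key/value format Hadoop Streaming expects by default.
import sys

for line in sys.stdin:
    for word in line.strip().split():
        print("%s\t%d" % (word, 1))

WordCountReducer.py (hypothetical sketch):

#!/usr/bin/env python
# Input arrives sorted by key, so counts for the same word are adjacent
# and can be summed with a simple running total.
import sys

current_word = None
current_count = 0

for line in sys.stdin:
    line = line.strip()
    if not line:
        continue
    word, count = line.split("\t", 1)
    count = int(count)
    if word == current_word:
        current_count += count
    else:
        if current_word is not None:
            print("%s\t%d" % (current_word, current_count))
        current_word = word
        current_count = count

if current_word is not None:
    print("%s\t%d" % (current_word, current_count))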

But when I try to run it on the Hadoop cluster with the command

[dsawale@localhost ~]$ hadoop jar Desktop/JAR/hadoop-streaming-3.2.1.jar -mapper mapper.py -reducer reducer.py -file PycharmProjects/MapReduceCode/com/code/wordcount/WordCountMapper.py -file PycharmProjects/MapReduceCode/com/code/wordcount/WordCountMapper.py -input /sample.txt -output pysamp

I get the following output:

packageJobJar: [PycharmProjects/MapReduceCode/com/code/wordcount/WordCountMapper.py, PycharmProjects/MapReduceCode/com/code/wordcount/WordCountMapper.py, /tmp/hadoop-unjar6715579504628929924/] [] /tmp/streamjob3211585412475799030.jar tmpDir=null
Streaming Command Failed!

This is my first Python MapReduce program. Could you help me get rid of this error? Thanks!

My configuration files are as follows.

mapred-site.xml:

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>yarn.app.mapreduce.am.env</name>
        <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
    </property>
    <property>
        <name>mapreduce.map.env</name>
        <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
    </property>
    <property>
        <name>mapreduce.reduce.env</name>
        <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
    </property>
</configuration>

core-site.xml:

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>

hdfs-site.xml:

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.permission</name>
        <value>false</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/home/dsawale/hadoop-3.2.1/hadoop2_data/hdfs/namenode</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/home/dsawale/hadoop-3.2.1/hadoop2_data/hdfs/datanode</value>
    </property>
</configuration>

yarn-site.xml:

<configuration>
<!-- Site specific YARN configuration properties -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.auxservices.mapreduce.shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
</configuration>

The file paths you are passing to the -mapper and -reducer arguments are incorrect: the command refers to mapper.py and reducer.py, which were never shipped with the job, and the two -file options both point at WordCountMapper.py, so the reducer script is not included at all.

Try:

hadoop jar Desktop/JAR/hadoop-streaming-3.2.1.jar \
-mapper PycharmProjects/MapReduceCode/com/code/wordcount/WordCountMapper.py \
-reducer PycharmProjects/MapReduceCode/com/code/wordcount/WordCountReducer.py  \
-file PycharmProjects/MapReduceCode/com/code/wordcount/WordCountMapper.py \
-file PycharmProjects/MapReduceCode/com/code/wordcount/WordCountReducer.py \
-input /sample.txt \
-output pysamp
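
Once the job completes successfully, the word counts end up in the pysamp output directory on HDFS (relative to your HDFS home directory, since the path is not absolute) and can be inspected with, for example:

hdfs dfs -cat pysamp/part-*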