Hadoop: No such file or directory

Question

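I'm brand new to Hadoop and the command line, though I have done some programming before (as a student). I'm trying to run a few simple programs (part of a tutorial) on a school machine, using PuTTY.

I've gotten Hadoop commands to work before, and a different simple program ran fine, but I'm stuck on this one. No, this isn't homework; it's just a tutorial for learning Hadoop commands.

The instructions are as follows: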

/*

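Testing the code

We perform local testing in the style of a typical UNIX pipeline. Our testing will take the form:

cat | map | sort | reduce

This emulates the same pipeline that Hadoop will perform when streaming, albeit in a non-distributed manner. You have to make sure that the files mapper.py and reducer.py have execute permission: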

chmod u+x mapper.py
chmod u+x reducer.py

Try the following command and explain the results (hint: type man sort in your terminal window to find out more about the sort command):

echo "this is a test and this should count the number of words" | ./mapper.py | sort -k1,1 | ./reducer.py

*/

Running "hdfs dfs -ls /user/$USER" gives the following result:

Found 6 items
drwxr-xr-x   - s1353460 s1353460          0 2015-10-20 10:51 /user/s1353460/QuasiMonteCarlo_1445334654365_163883167
drwxr-xr-x   - s1353460 s1353460          0 2015-10-20 10:51 /user/s1353460/data
-rw-r--r--   3 s1353460 s1353460        360 2015-10-20 12:13 /user/s1353460/mapper.py
-rw-r--r--   3 s1353460 s1353460      15346 2015-10-20 11:11 /user/s1353460/part-r-00000
-rw-r--r--   2 s1353460 s1353460        728 2015-10-21 10:21 /user/s1353460/reducer.py
drwxr-xr-x   - s1353460 s1353460          0 2015-10-16 14:38 /user/s1353460/source

But running "echo "this is a test and this should count the number of words" | /user/$USER/mapper.py | sort -k1,1 | /user/$USER/reducer.py" returns errors:

-bash: /user/s1353460/reducer.py: No such file or directory
-bash: /user/s1353460/mapper.py: No such file or directory

This seems strange, because both files are listed at exactly that location above. Any idea what might be going on here?

Answer 1

But running "echo "this is a test and this should count the number of words" | /user/$USER/mapper.py | sort -k1,1 | /user/$USER/reducer.py" returns errors:

-bash: /user/s1353460/reducer.py: No such file or directory
-bash: /user/s1353460/mapper.py: No such file or directory

You have created mapper.py and reducer.py on HDFS. When you run this command, bash looks for mapper.py and reducer.py on your local file system, not on HDFS.

To solve this issue:

Make sure /user/s1353460/ exists on your local file system. If it does not, create it, then copy or create mapper.py and reducer.py in /user/s1353460/.

Make sure mapper.py has execute permission: chmod +x /user/s1353460/mapper.py

Make sure reducer.py has execute permission: chmod +x /user/s1353460/reducer.py

Run echo "this is a test and this should count the number of words" | /user/s1353460/mapper.py | sort -k1,1 | /user/s1353460/reducer.py. This time it should run without errors.
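The local checks in the steps above can be sketched in Python. This is only an illustration: the helper names check_local_script and make_executable are hypothetical, and the script inspects the local file system only, which is exactly where bash looks when it runs a script by path.

```python
import os
import stat


def check_local_script(path):
    """Return a short status for a script path on the LOCAL file system.

    Hypothetical helper: it never consults HDFS, just like bash.
    """
    if not os.path.isfile(path):
        return "missing"          # bash would report "No such file or directory"
    if not os.access(path, os.X_OK):
        return "not executable"   # bash would report "Permission denied"
    return "ok"


def make_executable(path):
    """Rough equivalent of `chmod u+x path`: add the owner-execute bit."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR)


if __name__ == "__main__":
    # Check the two scripts relative to the current local directory.
    for script in ("mapper.py", "reducer.py"):
        print(script, check_local_script(script))
```

If a script shows up as missing here even though hdfs dfs -ls lists it, the file exists only in HDFS and still has to be copied or created locally.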

To run the Python MapReduce job on the Hadoop cluster:

hduser@ubuntu:/usr/local/hadoop$ bin/hadoop jar contrib/streaming/hadoop-*streaming*.jar \
    -file /user/s1353460/mapper.py -mapper /user/s1353460/mapper.py \
    -file /user/s1353460/reducer.py -reducer /user/s1353460/reducer.py \
    -input <hdfs-input-path> -output <hdfs-output-path>

Assumption: Hadoop is installed in /usr/local/hadoop. Change the paths as appropriate.
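As a rough sketch of how the pieces of that command fit together, the snippet below assembles the same argument list in Python without running anything. The function name build_streaming_command and all example paths are hypothetical. The flags are the ones used in the command above: the -file arguments name scripts on the local file system, while -input and -output name HDFS paths.

```python
import shlex


def build_streaming_command(streaming_jar, mapper, reducer,
                            hdfs_input, hdfs_output, hadoop_bin="bin/hadoop"):
    """Build (but do not run) a Hadoop streaming command line.

    mapper and reducer are LOCAL script paths; hdfs_input and
    hdfs_output are HDFS paths. All values are supplied by the caller.
    """
    return [
        hadoop_bin, "jar", streaming_jar,
        "-file", mapper, "-mapper", mapper,
        "-file", reducer, "-reducer", reducer,
        "-input", hdfs_input, "-output", hdfs_output,
    ]


if __name__ == "__main__":
    # Placeholder values for illustration only; adjust to your installation.
    cmd = build_streaming_command(
        "path/to/hadoop-streaming.jar",
        "./mapper.py", "./reducer.py",
        "input-dir", "output-dir",
    )
    print(shlex.join(cmd))
```

Printing the command first makes it easy to confirm which paths are local and which are on HDFS before submitting the job.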

Answer 2

Basically, with echo you are testing your files locally and not touching HDFS at all. HDFS is a file system abstraction... but that is another topic.

If mapper.py or reducer.py is not in your current local directory, you will hit the problem above, regardless of whether they exist at the same path in HDFS.
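To make the local test concrete, here is a minimal Python sketch of what the echo | mapper | sort | reducer pipeline does, with no HDFS involved. The asker's actual mapper.py and reducer.py are not shown, so the word-count logic below is an assumption based on the test sentence, and the function names mapper, reducer and run_local are hypothetical.

```python
from itertools import groupby


def mapper(lines):
    """Emit one 'word<TAB>1' record per word, as a streaming mapper might."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reducer(sorted_records):
    """Sum the counts of consecutive records that share the same key."""
    pairs = (record.split("\t", 1) for record in sorted_records)
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        yield word, sum(int(count) for _, count in group)


def run_local(lines):
    """Emulate `cat | map | sort -k1,1 | reduce` entirely in memory."""
    mapped = mapper(lines)
    # Sorting by key plays the role of `sort -k1,1` (and of Hadoop's shuffle).
    shuffled = sorted(mapped, key=lambda record: record.split("\t", 1)[0])
    return dict(reducer(shuffled))


if __name__ == "__main__":
    text = "this is a test and this should count the number of words"
    for word, count in run_local([text]).items():
        print(f"{word}\t{count}")
```

The sort step matters because the reducer only merges adjacent records with the same key, which is also why the tutorial's pipeline puts sort between the mapper and the reducer.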

To use local Python files with Hadoop streaming, you need to use the streaming jar (its location depends on your installation); see this post here.

