MapReduce on EMR does not contact RMProxy and gets stuck waiting for the ResourceManager?
I'm running a MapReduce/Hadoop job on EMR with Hadoop 2.7.3, installed straight from AWS; the jar is built with the Maven Shade plugin. The job hangs indefinitely waiting for the ResourceManager, and I can't find anything about it in the log files or online.

In job.waitForCompletion, it gets as far as lines like:
2020-01-25 05:52:41,346 INFO org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl (main): Timeline service address: http://ip-172-31-13-41.us-west-2.compute.internal:8188/ws/v1/timeline/
2020-01-25 05:52:41,356 INFO org.apache.hadoop.yarn.client.RMProxy (main): Connecting to ResourceManager at ip-172-31-13-41.us-west-2.compute.internal/172.31.13.41:8032
Then it just sits there, never making progress; I have to shut down the cluster or kill the job manually.

Interestingly, I can reproduce this step locally by running hadoop jar <arguments>, but I don't know what's causing it.
After 25 minutes or so, unpacking the jar fails and the job produces output of the form:
AM Container for appattempt_1580058321574_0005_000001 exited with exitCode: -1000
For more detailed output, check application tracking page:http://192.168.2.21:8088/cluster/app/application_1580058321574_0005Then, click on links to logs of each attempt.
Diagnostics: /Users/gbronner/hadoopdata/yarn/local/usercache/gbronner/appcache/application_1580058321574_0005/filecache/11_tmp/tmp_job.jar (Is a directory)
java.io.FileNotFoundException: /Users/gbronner/hadoopdata/yarn/local/usercache/gbronner/appcache/application_1580058321574_0005/filecache/11_tmp/tmp_job.jar (Is a directory)
at java.util.zip.ZipFile.open(Native Method)
at java.util.zip.ZipFile.<init>(ZipFile.java:225)
at java.util.zip.ZipFile.<init>(ZipFile.java:155)
at java.util.jar.JarFile.<init>(JarFile.java:166)
at java.util.jar.JarFile.<init>(JarFile.java:130)
at org.apache.hadoop.util.RunJar.unJar(RunJar.java:94)
at org.apache.hadoop.yarn.util.FSDownload.unpack(FSDownload.java:297)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:364)
at org.apache.hadoop.yarn.util.FSDownload.call(FSDownload.java:62)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Failing this attempt
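The "(Is a directory)" diagnostic comes straight from the JDK: RunJar.unJar opens the localized resource with java.util.zip.ZipFile, and on Linux passing a directory path to that native open fails with exactly this FileNotFoundException. A minimal JDK-only reproduction (the tmp_job.jar name is just illustrative, mirroring the path YARN localization produced as a directory):

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.nio.file.Files;
import java.util.jar.JarFile;

public class JarDirRepro {
    // Try to open a directory as if it were a jar file and return the
    // resulting exception message (or "opened" if it somehow succeeds).
    static String openDirAsJar() throws Exception {
        // Create a directory where a jar file is expected.
        File dir = Files.createTempDirectory("tmp_job.jar").toFile();
        dir.deleteOnExit();
        try (JarFile jf = new JarFile(dir)) { // same call path as RunJar.unJar -> ZipFile.open
            return "opened";
        } catch (FileNotFoundException e) {
            // On Linux the native open reports EISDIR as "... (Is a directory)"
            return e.getMessage();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(openDirAsJar());
    }
}
```

So the stack trace means YARN's localizer ended up with a directory at the path where it expected the job jar, and the unpack step then failed.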
This happens both on AWS EMR and locally. I've never seen this error before, and I'm using EMR straight out of the box.

Any ideas why this is happening? A bad jar? It may be related to another unanswered question here.
After exhaustively trying hundreds of experiments, the offending line appears to be

job.setJar()

Why, I don't know. It works fine under IntelliJ, but crashes reliably when launched with the hadoop command, both locally and on EMR.
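For what it's worth, the common alternative to hard-coding a jar path with job.setJar(...) is to let Hadoop locate the jar from a class it contains via job.setJarByClass(...). A minimal driver sketch under that assumption (the Driver class name and the use of args[0]/args[1] for input/output paths are placeholders, not from the original post):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class Driver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "example-job");

        // Instead of job.setJar("/path/to/job.jar"), let Hadoop infer the
        // jar from whichever classpath entry contains the Driver class.
        // This works the same under an IDE and under `hadoop jar`.
        job.setJarByClass(Driver.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

When launched with hadoop jar myjob.jar Driver <in> <out>, the jar named on the command line is the one that gets localized, so no separate setJar() path is needed.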