Shuffle failed on empty file. EOFException: Unexpected end of input stream

I'm trying to run a copy of a data-processing pipeline, which works fine on a cluster, on my local machine, with Hadoop and HBase in standalone mode. The pipeline consists of several MapReduce jobs that start one after another. One of those jobs has a mapper that writes nothing to its output (this depends on the input, but in my test it writes nothing), yet it does have a reducer. I get this exception while that job is running:

16:42:19,322 [INFO] [localfetcher#13] o.a.h.i.c.CodecPool: Got brand-new decompressor [.gz] 
16:42:19,322 [INFO] [localfetcher#13] o.a.h.m.t.r.LocalFetcher: localfetcher#13 about to shuffle output of map attempt_local509755465_0013_m_000000_0 decomp: 2 len: 6 to MEMORY
16:42:19,326 [WARN] [Thread-4749] o.a.h.m.LocalJobRunner: job_local509755465_0013 java.lang.Exception: org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in localfetcher#13
  at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462) ~[hadoop-mapreduce-client-common-2.5.1.jar:?]
  at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:529) [hadoop-mapreduce-client-common-2.5.1.jar:?]
Caused by: org.apache.hadoop.mapreduce.task.reduce.Shuffle$ShuffleError: error in shuffle in localfetcher#13
  at org.apache.hadoop.mapreduce.task.reduce.Shuffle.run(Shuffle.java:134) ~[hadoop-mapreduce-client-core-2.7.3.jar:?]
  at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:376) ~[hadoop-mapreduce-client-core-2.7.3.jar:?]
  at org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:319) ~[hadoop-mapreduce-client-common-2.5.1.jar:?]
  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_181]
  at java.util.concurrent.FutureTask.run$$$capture(FutureTask.java:266) ~[?:1.8.0_181]
  at java.util.concurrent.FutureTask.run(FutureTask.java) ~[?:1.8.0_181]
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_181]
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_181]
  at java.lang.Thread.run(Thread.java:748) ~[?:1.8.0_181]
Caused by: java.io.EOFException: Unexpected end of input stream
  at org.apache.hadoop.io.compress.DecompressorStream.decompress(DecompressorStream.java:145) ~[hadoop-common-2.7.3.jar:?]
  at org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:85) ~[hadoop-common-2.7.3.jar:?]
  at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:199) ~[hadoop-common-2.7.3.jar:?]
  at org.apache.hadoop.mapreduce.task.reduce.InMemoryMapOutput.shuffle(InMemoryMapOutput.java:97) ~[hadoop-mapreduce-client-core-2.7.3.jar:?]
  at org.apache.hadoop.mapreduce.task.reduce.LocalFetcher.copyMapOutput(LocalFetcher.java:157) ~[hadoop-mapreduce-client-core-2.7.3.jar:?]
  at org.apache.hadoop.mapreduce.task.reduce.LocalFetcher.doCopy(LocalFetcher.java:102) ~[hadoop-mapreduce-client-core-2.7.3.jar:?]
  at org.apache.hadoop.mapreduce.task.reduce.LocalFetcher.run(LocalFetcher.java:85) ~[hadoop-mapreduce-client-core-2.7.3.jar:?]

I checked the files produced by the mapper. I expected them to be empty, since the mapper writes nothing, but they contain strange content:

File: /tmp/hadoop-egorkiruhin/mapred/local/localRunner/egorkiruhin/jobcache/job_local509755465_0013/attempt_local509755465_0013_m_000000_0/output/file.out

ÿÿÿÿ^@^@

File: /tmp/hadoop-egorkiruhin/mapred/local/localRunner/egorkiruhin/jobcache/job_local509755465_0013/attempt_local509755465_0013_m_000000_0/output/file.out.index

^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^B^@^@^@^@^@^@^@^F^@^@^@^@dTG<93>

I couldn't find an explanation for this problem, but I worked around it by turning off compression of the map output:

config.set("mapreduce.map.output.compress", "false");
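
For reference, the property has to be set on the job's `Configuration` before the `Job` instance is created. A minimal sketch of the driver change, assuming the standard `org.apache.hadoop.mapreduce` API (the job name here is a placeholder for your own pipeline step):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

Configuration config = new Configuration();
// Work around the local-mode shuffle EOFException by not compressing
// the intermediate map output (trades some disk/IO for stability here).
config.set("mapreduce.map.output.compress", "false");

Job job = Job.getInstance(config, "pipeline-step"); // name is a placeholder
// ... set mapper/reducer classes and input/output paths as usual, then:
// job.waitForCompletion(true);
```

Note that `mapreduce.map.output.compress` only affects the intermediate map-to-reduce data, not the job's final output (which is controlled by `mapreduce.output.fileoutputformat.compress`), so disabling it should not change the pipeline's results.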