Hadoop streaming "GC overhead limit exceeded"
I am running this command:
hadoop jar hadoop-streaming.jar -D stream.tmpdir=/tmp -input "<input dir>" -output "<output dir>" -mapper "grep 20151026" -reducer "wc -l"
where <input dir> is a directory containing many Avro files.
It fails with this error:
Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded
    at org.apache.hadoop.hdfs.protocol.DatanodeID.updateXferAddrAndInvalidateHashCode(DatanodeID.java:287)
    at org.apache.hadoop.hdfs.protocol.DatanodeID.<init>(DatanodeID.java:91)
    at org.apache.hadoop.hdfs.protocol.DatanodeInfo.<init>(DatanodeInfo.java:136)
    at org.apache.hadoop.hdfs.protocol.DatanodeInfo.<init>(DatanodeInfo.java:122)
    at org.apache.hadoop.hdfs.protocolPB.PBHelper.convert(PBHelper.java:633)
    at org.apache.hadoop.hdfs.protocolPB.PBHelper.convert(PBHelper.java:793)
    at org.apache.hadoop.hdfs.protocolPB.PBHelper.convertLocatedBlock(PBHelper.java:1252)
    at org.apache.hadoop.hdfs.protocolPB.PBHelper.convert(PBHelper.java:1270)
    at org.apache.hadoop.hdfs.protocolPB.PBHelper.convert(PBHelper.java:1413)
    at org.apache.hadoop.hdfs.protocolPB.PBHelper.convert(PBHelper.java:1524)
    at org.apache.hadoop.hdfs.protocolPB.PBHelper.convert(PBHelper.java:1533)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getListing(ClientNamenodeProtocolTranslatorPB.java:557)
    at sun.reflect.GeneratedMethodAccessor3.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:601)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
    at com.sun.proxy.$Proxy15.getListing(Unknown Source)
    at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:1969)
    at org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.hasNextNoFilter(DistributedFileSystem.java:888)
    at org.apache.hadoop.hdfs.DistributedFileSystem$DirListingIterator.hasNext(DistributedFileSystem.java:863)
    at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:267)
    at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:228)
    at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:313)
    at org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:624)
    at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:616)
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:492)
    at org.apache.hadoop.mapreduce.Job.run(Job.java:1296)
    at org.apache.hadoop.mapreduce.Job.run(Job.java:1293)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
How can this problem be solved?
It took a while, but I found the solution here.
The stack trace shows that the out-of-memory error happens in the job client itself, while it lists the input directory and computes splits (FileInputFormat.listStatus / getSplits), so it is the client JVM that needs more heap. Prepending HADOOP_CLIENT_OPTS="-Xmx1024M" to the command solved the problem.
The final command line is:
HADOOP_CLIENT_OPTS="-Xmx1024M" hadoop jar hadoop-streaming.jar -D stream.tmpdir=/tmp -input "<input dir>" -output "<output dir>" -mapper "grep 20151026" -reducer "wc -l"
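If you run several jobs against the same large input, prepending the variable every time gets tedious. As a minimal sketch (assuming a bash-compatible shell), the same setting can be exported once so every subsequent hadoop client command in that shell gets the larger heap; HADOOP_CLIENT_OPTS only affects the local job-submission JVM, not the map and reduce tasks:

# Give the local hadoop client JVM (the one listing the input dir and writing splits) a 1 GB heap.
export HADOOP_CLIENT_OPTS="-Xmx1024M"
hadoop jar hadoop-streaming.jar -D stream.tmpdir=/tmp -input "<input dir>" -output "<output dir>" -mapper "grep 20151026" -reducer "wc -l"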