Unable to use SnappyCodec with hadoop jar: NullPointerException
I am trying to use Hadoop's compression libraries from a simple Java program, but I cannot use the Snappy codec: execution fails with a NullPointerException in SnappyCodec.createCompressor.
Note that I am not getting the typical java.lang.UnsatisfiedLinkError caused by the LD_LIBRARY_PATH and JAVA_LIBRARY_PATH environment variables not being set. Snappy is properly installed with CDH: running hadoop checknative
reports it as available, and Snappy decompression works when I run hdfs dfs -text
on a snappy file.
$ hadoop jar SnappyTool-0.0.1-SNAPSHOT.jar com.mycorp.SnappyCompressor
Exception in thread "main" java.lang.NullPointerException
at org.apache.hadoop.io.compress.SnappyCodec.createCompressor(SnappyCodec.java:145)
at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:152)
at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:165)
at com.mycorp.SnappyCompressor.main(SnappyCompressor.java:19)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
$
$ hadoop checknative | grep snappy 2>/dev/null
snappy: true /opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/lib/hadoop/lib/native/libsnappy.so.1
$ ls /opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/lib/hadoop/lib/native/
libhadoop.a       libhadoop.so.1.0.0  libnativetask.a         libsnappy.so
libhadooppipes.a  libhadooputils.a    libnativetask.so        libsnappy.so.1
libhadoop.so      libhdfs.a           libnativetask.so.1.0.0  libsnappy.so.1.1.4
$ export LD_LIBRARY_PATH=/opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/lib/hadoop/lib/native/
$ java -Djava.library.path=/opt/cloudera/parcels/CDH-5.5.1-1.cdh5.5.1.p0.11/lib/hadoop/lib/native/ -cp `hadoop classpath`:SnappyTool-0.0.1-SNAPSHOT.jar com.mycorp.SnappyCompressor
Exception in thread "main" java.lang.NullPointerException
at org.apache.hadoop.io.compress.SnappyCodec.createCompressor(SnappyCodec.java:145)
at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:152)
at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:165)
at com.mycorp.SnappyCompressor.main(SnappyCompressor.java:19)
The Java code looks like this; the last line is the culprit:
SnappyCodec.checkNativeCodeLoaded();
CompressionCodec codec = new SnappyCodec();
Compressor comp = CodecPool.getCompressor(codec);
What am I missing?
OK, the problem turned out to be that the CompressionCodec
needs a proper Configuration, as pointed out in this answer. When a SnappyCodec is instantiated directly with new, its internal Configuration is still null, and createCompressor dereferences it, hence the NullPointerException.
A simple way to obtain a configured Snappy compressor is:
Configuration conf = new Configuration();
CompressionCodecFactory ccf = new CompressionCodecFactory(conf);
CompressionCodec codec = ccf.getCodecByClassName(SnappyCodec.class.getName());
Compressor comp = codec.createCompressor();
The resulting jar can then be run with the command line used in the original question.
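For completeness, here is a minimal standalone sketch of the fix in context, writing a small snappy-compressed file. The class name, output file name, and payload are illustrative (not from the original post), and it assumes the Hadoop client libraries and native snappy are available, e.g. via `hadoop classpath` and the library paths shown above.

```java
import java.io.FileOutputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.CompressionCodecFactory;
import org.apache.hadoop.io.compress.CompressionOutputStream;
import org.apache.hadoop.io.compress.Compressor;
import org.apache.hadoop.io.compress.SnappyCodec;

public class SnappyCompressorExample {
    public static void main(String[] args) throws Exception {
        // The factory injects the Configuration into the codec, so
        // createCompressor() no longer dereferences a null conf.
        Configuration conf = new Configuration();
        CompressionCodecFactory ccf = new CompressionCodecFactory(conf);
        CompressionCodec codec =
                ccf.getCodecByClassName(SnappyCodec.class.getName());
        Compressor comp = codec.createCompressor();

        // Wrap a plain file stream in the codec's compressing stream.
        try (OutputStream raw = new FileOutputStream("out.snappy");
             CompressionOutputStream out = codec.createOutputStream(raw, comp)) {
            out.write("hello snappy".getBytes(StandardCharsets.UTF_8));
        }
    }
}
```

Going through CompressionCodecFactory (or calling codec.setConf(conf) yourself before use) is what hadoop jar and hdfs dfs -text effectively do, which is why those worked while the direct new SnappyCodec() did not.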