Native snappy library not available: this version of libhadoop was built without snappy support
I got the above error when using MLUtils saveAsLibSVMFile. I tried various approaches like the ones below, but nothing worked.
/* Attempt 1: switch Spark's internal compression codec from snappy to LZF */
/*
conf.set("spark.io.compression.codec","org.apache.spark.io.LZFCompressionCodec")
*/
/* Attempt 2: put the snappy jar and the Hadoop native libraries on the driver and executor paths */
/*
conf.set("spark.executor.extraClassPath","/usr/hdp/current/hadoop-client/lib/snappy-java-*.jar")
conf.set("spark.driver.extraClassPath","/usr/hdp/current/hadoop-client/lib/snappy-java-*.jar")
conf.set("spark.executor.extraLibraryPath","/usr/hdp/2.3.4.0-3485/hadoop/lib/native")
conf.set("spark.driver.extraLibraryPath","/usr/hdp/2.3.4.0-3485/hadoop/lib/native")
*/
In the end, only two approaches fixed it; both are given in the answer below.
One way is to use a different Hadoop compression codec, as shown below:
sc.hadoopConfiguration.set("mapreduce.output.fileoutputformat.compress", "true")
sc.hadoopConfiguration.set("mapreduce.output.fileoutputformat.compress.type", CompressionType.BLOCK.toString)
sc.hadoopConfiguration.set("mapreduce.output.fileoutputformat.compress.codec", "org.apache.hadoop.io.compress.BZip2Codec")
sc.hadoopConfiguration.set("mapreduce.map.output.compress", "true")
sc.hadoopConfiguration.set("mapreduce.map.output.compress.codec", "org.apache.hadoop.io.compress.BZip2Codec")
The second way is to pass --driver-library-path /usr/hdp/<whatever is your current version>/hadoop/lib/native/ as an argument to my spark-submit job on the command line.
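For example, the full invocation might look like this (the class name and jar are placeholders for your own application):

spark-submit \
  --class SaveLibSVMExample \
  --driver-library-path /usr/hdp/<whatever is your current version>/hadoop/lib/native/ \
  my-app.jar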