Spark complaining java.library.path missing in AWS EMR
We have an AWS EMR cluster on which we run some Spark jobs. The jobs are submitted from Docker containers on an EC2 instance.
Every containerized Spark job fails with the error below. I have tried adding LD_LIBRARY_PATH in both spark-env and yarn-env, but the error still appears. Because of this issue we cannot read any CSV files.
2022-03-31 11:57:48,605 ERROR lzo.GPLNativeCodeLoader: Could not load native gpl library
java.lang.UnsatisfiedLinkError: no gplcompression in java.library.path
at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1860)
at java.lang.Runtime.loadLibrary0(Runtime.java:871)
at java.lang.System.loadLibrary(System.java:1124)
at com.hadoop.compression.lzo.GPLNativeCodeLoader.<clinit>(GPLNativeCodeLoader.java:32)
at com.hadoop.compression.lzo.LzoCodec.<clinit>(LzoCodec.java:71)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:2574)
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2539)
at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:132)
at org.apache.hadoop.io.compress.CompressionCodecFactory.<init>(CompressionCodecFactory.java:180)
at org.apache.spark.sql.execution.datasources.CodecStreams$.$anonfun$getCompressionCodec(CodecStreams.scala:68)
at scala.Option.flatMap(Option.scala:271)
at org.apache.spark.sql.execution.datasources.CodecStreams$.getCompressionCodec(CodecStreams.scala:67)
at org.apache.spark.sql.execution.datasources.CodecStreams$.createOutputStream(CodecStreams.scala:83)
at org.apache.spark.sql.execution.datasources.CodecStreams$.createOutputStreamWriter(CodecStreams.scala:92)
at org.apache.spark.sql.execution.datasources.csv.CsvOutputWriter.<init>(CsvOutputWriter.scala:38)
at org.apache.spark.sql.execution.datasources.csv.CSVFileFormat$$anon.newInstance(CSVFileFormat.scala:84)
at org.apache.spark.sql.execution.datasources.SingleDirectoryDataWriter.newOutputWriter(FileFormatDataWriter.scala:126)
at org.apache.spark.sql.execution.datasources.SingleDirectoryDataWriter.<init>(FileFormatDataWriter.scala:111)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeTask(FileFormatWriter.scala:264)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$write(FileFormatWriter.scala:205)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:127)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run(Executor.scala:446)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:449)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
Can anyone help?
Got this working after adding the following:
LD_LIBRARY_PATH=/usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:/docker/usr/lib/hadoop/lib/native:/docker/usr/lib/hadoop-lzo/lib/native
LD_PRELOAD=/lib64/librt.so.1
new SparkConf()
  .setExecutorEnv("LD_LIBRARY_PATH", LD_LIBRARY_PATH)
  .setExecutorEnv("LD_PRELOAD", LD_PRELOAD)
Also removed the property below from core-site.xml:
<property>
<name>io.compression.codec.lzo.class</name>
<value>com.hadoop.compression.lzo.LzoCodec</value>
</property>
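If the job is launched with spark-submit rather than a programmatic SparkConf, the same environment variables can also be passed through Spark's documented spark.executorEnv.* and spark.yarn.appMasterEnv.* configuration keys. A sketch, assuming the same paths as above (your-job.jar is a placeholder):

```shell
# Sketch: equivalent of the SparkConf setup above, at submit time.
# spark.executorEnv.* sets env vars on executors; spark.yarn.appMasterEnv.*
# sets them on the YARN application master (driver in cluster mode).
spark-submit \
  --conf "spark.executorEnv.LD_LIBRARY_PATH=/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native" \
  --conf "spark.executorEnv.LD_PRELOAD=/lib64/librt.so.1" \
  --conf "spark.yarn.appMasterEnv.LD_LIBRARY_PATH=/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native" \
  --conf "spark.yarn.appMasterEnv.LD_PRELOAD=/lib64/librt.so.1" \
  your-job.jar
```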
https://knowledge.informatica.com/s/article/577809?language=en_US