Spark complaining java.library.path missing in AWS EMR

We have an AWS EMR cluster on which we run some Spark jobs. The jobs are submitted from Docker containers on an EC2 instance. Every container running a Spark job reports the error below. I have tried adding LD_LIBRARY_PATH in both spark-env and yarn-env, but the error still shows up. Because of this we cannot read any CSVs.

2022-03-31 11:57:48,605 ERROR lzo.GPLNativeCodeLoader: Could not load native gpl library
java.lang.UnsatisfiedLinkError: no gplcompression in java.library.path
    at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1860)
    at java.lang.Runtime.loadLibrary0(Runtime.java:871)
    at java.lang.System.loadLibrary(System.java:1124)
    at com.hadoop.compression.lzo.GPLNativeCodeLoader.<clinit>(GPLNativeCodeLoader.java:32)
    at com.hadoop.compression.lzo.LzoCodec.<clinit>(LzoCodec.java:71)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:348)
    at org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:2574)
    at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2539)
    at org.apache.hadoop.io.compress.CompressionCodecFactory.getCodecClasses(CompressionCodecFactory.java:132)
    at org.apache.hadoop.io.compress.CompressionCodecFactory.<init>(CompressionCodecFactory.java:180)
    at org.apache.spark.sql.execution.datasources.CodecStreams$.$anonfun$getCompressionCodec(CodecStreams.scala:68)
    at scala.Option.flatMap(Option.scala:271)
    at org.apache.spark.sql.execution.datasources.CodecStreams$.getCompressionCodec(CodecStreams.scala:67)
    at org.apache.spark.sql.execution.datasources.CodecStreams$.createOutputStream(CodecStreams.scala:83)
    at org.apache.spark.sql.execution.datasources.CodecStreams$.createOutputStreamWriter(CodecStreams.scala:92)
    at org.apache.spark.sql.execution.datasources.csv.CsvOutputWriter.<init>(CsvOutputWriter.scala:38)
    at org.apache.spark.sql.execution.datasources.csv.CSVFileFormat$$anon.newInstance(CSVFileFormat.scala:84)
    at org.apache.spark.sql.execution.datasources.SingleDirectoryDataWriter.newOutputWriter(FileFormatDataWriter.scala:126)
    at org.apache.spark.sql.execution.datasources.SingleDirectoryDataWriter.<init>(FileFormatDataWriter.scala:111)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeTask(FileFormatWriter.scala:264)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$write(FileFormatWriter.scala:205)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
    at org.apache.spark.scheduler.Task.run(Task.scala:127)
    at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run(Executor.scala:446)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1377)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:449)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:750)

Can anyone help?

Got this working after adding the following:

    LD_LIBRARY_PATH=/usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:/docker/usr/lib/hadoop/lib/native:/docker/usr/lib/hadoop-lzo/lib/native
    LD_PRELOAD=/lib64/librt.so.1

    new SparkConf().setExecutorEnv("LD_LIBRARY_PATH", LD_LIBRARY_PATH)
                   .setExecutorEnv("LD_PRELOAD", LD_PRELOAD)
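
For anyone who wants the full picture, here is a minimal, self-contained sketch of how those two settings plug into a Spark application. The paths are the values above; the S3 input path is just a placeholder:

    import org.apache.spark.SparkConf
    import org.apache.spark.sql.SparkSession

    object LzoEnvFix {
      // Same values as the environment variables above.
      val LD_LIBRARY_PATH: String =
        "/usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/lib/native:" +
          "/usr/lib/hadoop-lzo/lib/native:/docker/usr/lib/hadoop/lib/native:" +
          "/docker/usr/lib/hadoop-lzo/lib/native"
      val LD_PRELOAD: String = "/lib64/librt.so.1"

      def main(args: Array[String]): Unit = {
        // setExecutorEnv("X", ...) is shorthand for setting spark.executorEnv.X,
        // so each executor process sees the native hadoop-lzo libraries.
        val conf = new SparkConf()
          .setExecutorEnv("LD_LIBRARY_PATH", LD_LIBRARY_PATH)
          .setExecutorEnv("LD_PRELOAD", LD_PRELOAD)

        val spark = SparkSession.builder().config(conf).getOrCreate()
        // Placeholder path: point this at your actual data.
        spark.read.option("header", "true").csv("s3://my-bucket/input/").show()
        spark.stop()
      }
    }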

Also removed the below property from core-site.xml:

<property>
    <name>io.compression.codec.lzo.class</name>
    <value>com.hadoop.compression.lzo.LzoCodec</value>
</property>
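
Side note: if editing core-site.xml on each node is inconvenient, the codec list can in principle be overridden per job through Spark's spark.hadoop.* passthrough, so that CompressionCodecFactory never tries to load the LZO classes at all. A sketch, assuming a typical default codec list (check your cluster's core-site.xml for the real one):

    import org.apache.spark.sql.SparkSession

    // Override io.compression.codecs for this job only; the list below is an
    // assumed default minus the LZO entries.
    val spark = SparkSession.builder()
      .config("spark.hadoop.io.compression.codecs",
        "org.apache.hadoop.io.compress.DefaultCodec," +
          "org.apache.hadoop.io.compress.GzipCodec," +
          "org.apache.hadoop.io.compress.BZip2Codec," +
          "org.apache.hadoop.io.compress.SnappyCodec")
      .getOrCreate()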

https://knowledge.informatica.com/s/article/577809?language=en_US