Why does spark-submit throw a NoSuchMethodError when I run an uber jar made through the Maven Shade plugin?

I have an Apache Beam project that works fine when I run it directly. However, when I build it with mvn clean package, it produces an uber jar via the Maven Shade plugin.

I then pass this uber jar to spark-submit, and when I run the spark-submit command it throws this exception:

Exception in thread "main" java.lang.NoSuchMethodError: com.amazonaws.util.StringUtils.trim(Ljava/lang/String;)Ljava/lang/String;
    at com.amazonaws.auth.profile.internal.AwsProfileNameLoader.getEnvProfileName(AwsProfileNameLoader.java:72)
    at com.amazonaws.auth.profile.internal.AwsProfileNameLoader.loadProfileName(AwsProfileNameLoader.java:54)
    at com.amazonaws.regions.AwsProfileRegionProvider.<init>(AwsProfileRegionProvider.java:40)
    at com.amazonaws.regions.DefaultAwsRegionProviderChain.<init>(DefaultAwsRegionProviderChain.java:23)
    at com.amazonaws.client.builder.AwsClientBuilder.<clinit>(AwsClientBuilder.java:60)
    at org.apache.beam.sdk.io.aws.s3.DefaultS3ClientBuilderFactory.createBuilder(DefaultS3ClientBuilderFactory.java:39)
    at org.apache.beam.sdk.io.aws.s3.S3FileSystemConfiguration.getBuilder(S3FileSystemConfiguration.java:102)
    at org.apache.beam.sdk.io.aws.s3.S3FileSystemConfiguration.fromS3Options(S3FileSystemConfiguration.java:94)
    at org.apache.beam.sdk.io.aws.s3.DefaultS3FileSystemSchemeRegistrar.fromOptions(DefaultS3FileSystemSchemeRegistrar.java:39)
    at org.apache.beam.sdk.io.aws.s3.S3FileSystemRegistrar.lambda$fromOptions$0(S3FileSystemRegistrar.java:52)
    at java.util.stream.ReferencePipeline.accept(ReferencePipeline.java:269)
    at java.util.Iterator.forEachRemaining(Iterator.java:116)
    at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482)
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472)
    at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
    at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:566)
    at org.apache.beam.sdk.io.aws.s3.S3FileSystemRegistrar.fromOptions(S3FileSystemRegistrar.java:55)
    at org.apache.beam.sdk.io.FileSystems.verifySchemesAreUnique(FileSystems.java:550)
    at org.apache.beam.sdk.io.FileSystems.setDefaultPipelineOptions(FileSystems.java:540)
    at org.apache.beam.sdk.PipelineRunner.fromOptions(PipelineRunner.java:47)
    at org.apache.beam.sdk.Pipeline.create(Pipeline.java:155)
    at org.propellyr.Main.main(Main.java:128)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)
    at org.apache.spark.deploy.SparkSubmit.doRunMain(SparkSubmit.scala:161)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
    at org.apache.spark.deploy.SparkSubmit$$anon.doSubmit(SparkSubmit.scala:920)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
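
A NoSuchMethodError at runtime usually means a different (older) copy of the class, here com.amazonaws.util.StringUtils without the trim method, was loaded than the one the code was compiled against. As a quick check of which jar a class is actually served from, a minimal diagnostic sketch like the following can be run on the same classpath (the class name comes from the stack trace above; the WhichJar wrapper is hypothetical):

public class WhichJar {
    public static void main(String[] args) throws Exception {
        // The class named in the stack trace above.
        Class<?> clazz = Class.forName("com.amazonaws.util.StringUtils");
        // getCodeSource() may be null for classes loaded by the bootstrap classloader.
        java.security.CodeSource src = clazz.getProtectionDomain().getCodeSource();
        System.out.println(src == null ? "<bootstrap>" : src.getLocation());
    }
}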

The pom.xml file:

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>org.propellyr</groupId>
    <artifactId>beam-etl</artifactId>
    <version>1.0-SNAPSHOT</version>

    <properties>
        <maven.compiler.source>8</maven.compiler.source>
        <maven.compiler.target>8</maven.compiler.target>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <project.reporting.outputEncoding>UTF-8</project.reporting.outputEncoding>
        <beam.version>2.38.0</beam.version>

        <bigquery.version>v2-rev20211129-1.32.1</bigquery.version>
        <google-api-client.version>1.32.1</google-api-client.version>
        <guava.version>31.0.1-jre</guava.version>
        <hamcrest.version>2.1</hamcrest.version>
        <jackson.version>2.13.0</jackson.version>
        <joda.version>2.10.10</joda.version>

        <libraries-bom.version>24.4.0</libraries-bom.version>
        <maven-compiler-plugin.version>3.7.0</maven-compiler-plugin.version>
        <maven-exec-plugin.version>1.6.0</maven-exec-plugin.version>
        <maven-jar-plugin.version>3.0.2</maven-jar-plugin.version>
        <maven-shade-plugin.version>3.1.0</maven-shade-plugin.version>
        <mockito.version>3.7.7</mockito.version>
        <pubsub.version>v1-rev20211130-1.32.1</pubsub.version>
        <spark.version>2.4.6</spark.version>
        <hadoop.version>2.10.1</hadoop.version>
        <maven-surefire-plugin.version>3.0.0-M5</maven-surefire-plugin.version>
        <nemo.version>0.1</nemo.version>
        <flink.artifact.name>beam-runners-flink-1.14</flink.artifact.name>
    </properties>

    <repositories>
        <repository>
            <id>apache.snapshots</id>
            <name>Apache Development Snapshot Repository</name>
            <url>https://repository.apache.org/content/repositories/snapshots/</url>
            <releases>
                <enabled>false</enabled>
            </releases>
            <snapshots>
                <enabled>true</enabled>
            </snapshots>
        </repository>
    </repositories>

    <build>
        <plugins>
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>${maven-compiler-plugin.version}</version>
                <configuration>
                    <source>8</source>
                    <target>8</target>
                </configuration>
            </plugin>

            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-surefire-plugin</artifactId>
                <version>${maven-surefire-plugin.version}</version>
                <configuration>
                    <parallel>all</parallel>
                    <threadCount>4</threadCount>
                    <redirectTestOutputToFile>true</redirectTestOutputToFile>
                </configuration>
            </plugin>

            <!-- Ensure that the Maven jar plugin runs before the Maven
              shade plugin by listing the plugin higher within the file. -->
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-jar-plugin</artifactId>
                <version>${maven-jar-plugin.version}</version>
            </plugin>

            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-shade-plugin</artifactId>
                <version>${maven-shade-plugin.version}</version>
                <configuration>
                    <artifactSet>
                        <excludes>
                            <exclude>META-INF.versions.11.module-info</exclude>
                        </excludes>
                    </artifactSet>
                    <createDependencyReducedPom>false</createDependencyReducedPom>
                    <filters>
                        <filter>
                            <artifact>*:*</artifact>
                            <excludes>
                                <exclude>module-info.class</exclude>
                                <exclude>META-INF/*.SF</exclude>
                                <exclude>META-INF/*.DSA</exclude>
                                <exclude>META-INF/*.RSA</exclude>
                            </excludes>
                        </filter>
                        <filter>

                        </filter>
                    </filters>
                </configuration>
                <executions>
                    <execution>
                        <phase>package</phase>
                        <goals>
                            <goal>shade</goal>
                        </goals>
                        <configuration>
                            <shadedArtifactAttached>true</shadedArtifactAttached>
                            <shadedClassifierName>shaded</shadedClassifierName>
                            <transformers>
                                <transformer
                                        implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
                            </transformers>
                        </configuration>
                    </execution>
                </executions>
            </plugin>
        </plugins>

        <pluginManagement>
            <plugins>
                <plugin>
                    <groupId>org.codehaus.mojo</groupId>
                    <artifactId>exec-maven-plugin</artifactId>
                    <version>${maven-exec-plugin.version}</version>
                    <configuration>
                        <cleanupDaemonThreads>false</cleanupDaemonThreads>
                    </configuration>
                </plugin>
            </plugins>
        </pluginManagement>
    </build>

    <profiles>
        <profile>
            <id>direct-runner</id>
            <activation>
                <activeByDefault>false</activeByDefault>
            </activation>
            <!-- Makes the DirectRunner available when running a pipeline. -->
            <dependencies>
                <dependency>
                    <groupId>org.apache.beam</groupId>
                    <artifactId>beam-runners-direct-java</artifactId>
                    <version>${beam.version}</version>
                    <scope>runtime</scope>
                </dependency>
            </dependencies>
        </profile>

        <!--        <profile>-->
        <!--            <id>flink-runner</id>-->
        <!--            &lt;!&ndash; Makes the FlinkRunner available when running a pipeline. &ndash;&gt;-->
        <!--            <dependencies>-->
        <!--                <dependency>-->
        <!--                    <groupId>org.apache.beam</groupId>-->
        <!--                    &lt;!&ndash; Please see the Flink Runner page for an up-to-date list-->
        <!--                         of supported Flink versions and their artifact names:-->
        <!--                         https://beam.apache.org/documentation/runners/flink/ &ndash;&gt;-->
        <!--                    <artifactId>${flink.artifact.name}</artifactId>-->
        <!--                    <version>${beam.version}</version>-->
        <!--                    <scope>runtime</scope>-->
        <!--                </dependency>-->
        <!--            </dependencies>-->
        <!--        </profile>-->

        <profile>
            <id>spark-runner</id>
            <activation>
                <activeByDefault>true</activeByDefault>
            </activation>
            <!-- Makes the SparkRunner available when running a pipeline. Additionally,
                 overrides some Spark dependencies to Beam-compatible versions. -->
            <properties>
                <netty.version>4.1.17.Final</netty.version>
            </properties>
            <dependencies>
                <dependency>
                    <groupId>org.apache.beam</groupId>
                    <artifactId>beam-runners-spark</artifactId>
                    <version>${beam.version}</version>
                    <scope>runtime</scope>
                    <exclusions>
                        <exclusion>
                            <groupId>org.slf4j</groupId>
                            <artifactId>jul-to-slf4j</artifactId>
                        </exclusion>
                    </exclusions>
                </dependency>

                <dependency>
                    <groupId>org.apache.spark</groupId>
                    <artifactId>spark-core_2.11</artifactId>
                    <version>${spark.version}</version>
                </dependency>
                <dependency>
                    <groupId>org.apache.spark</groupId>
                    <artifactId>spark-hive_2.12</artifactId>
                    <version>${spark.version}</version>
                </dependency>

                <dependency>
                    <groupId>org.apache.spark</groupId>
                    <artifactId>spark-avro_2.13</artifactId>
                    <version>3.2.1</version>
                </dependency>

                <dependency>
                    <groupId>org.apache.avro</groupId>
                    <artifactId>avro</artifactId>
                    <version>1.11.0</version>
                </dependency>

                <dependency>
                    <groupId>org.apache.spark</groupId>
                    <artifactId>spark-streaming_2.11</artifactId>
                    <version>${spark.version}</version>
                </dependency>
            </dependencies>
        </profile>
    </profiles>

    <dependencies>

        <dependency>
            <groupId>org.apache.beam</groupId>
            <artifactId>beam-sdks-java-core</artifactId>
            <version>${beam.version}</version>
            <!--            <exclusions>-->
            <!--                <exclusion>-->
            <!--                    <groupId>org.apache.parquet</groupId>-->
            <!--                    <artifactId>parquet-column</artifactId>-->
            <!--                </exclusion>-->
            <!--            </exclusions>-->
        </dependency>

        <dependency>
            <groupId>org.apache.beam</groupId>
            <artifactId>beam-runners-direct-java</artifactId>
            <version>${beam.version}</version>
        </dependency>

        <dependency>
            <groupId>org.apache.beam</groupId>
            <artifactId>beam-sdks-java-io-parquet</artifactId>
            <version>2.38.0</version>
            <exclusions>
                <exclusion>
                    <groupId>commons-beanutils</groupId>
                    <artifactId>commons-beanutils-core</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>commons-beanutils</groupId>
                    <artifactId>commons-beanutils</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>com.google.guava</groupId>
                    <artifactId>guava</artifactId>
                </exclusion>
                <!--                <exclusion>-->
                <!--                    <groupId>org.apache.parquet</groupId>-->
                <!--                    <artifactId>parquet-avro</artifactId>-->
                <!--                </exclusion>-->
<!--                <exclusion>-->
<!--                    <groupId>org.apache.parquet</groupId>-->
<!--                    <artifactId>parquet-hadoop</artifactId>-->
<!--                </exclusion>-->
            </exclusions>
        </dependency>

        <!-- https://mvnrepository.com/artifact/org.slf4j/slf4j-reload4j -->
        <dependency>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-reload4j</artifactId>
            <version>1.7.36</version>
            <scope>test</scope>
        </dependency>

        <dependency>
            <groupId>org.apache.parquet</groupId>
            <artifactId>parquet-hadoop</artifactId>
            <version>1.8.1</version>
        </dependency>


        <dependency>
            <groupId>tech.allegro.schema.json2avro</groupId>
            <artifactId>converter</artifactId>
            <version>0.2.13</version>
        </dependency>

        <dependency>
            <groupId>org.apache.beam</groupId>
            <artifactId>beam-sdks-java-io-amazon-web-services</artifactId>
            <scope>compile</scope>
            <version>2.38.0</version>
        </dependency>

        <!-- https://mvnrepository.com/artifact/com.google.code.gson/gson -->
        <dependency>
            <groupId>com.google.code.gson</groupId>
            <artifactId>gson</artifactId>
            <version>2.9.0</version>
        </dependency>
        <!-- https://mvnrepository.com/artifact/com.fasterxml.jackson.core/jackson-core -->
        <dependency>
            <groupId>com.fasterxml.jackson.core</groupId>
            <artifactId>jackson-core</artifactId>
            <version>2.13.2</version>
        </dependency>


    </dependencies>

    <dependencyManagement>
        <dependencies>
            <dependency>
                <groupId>com.google.guava</groupId>
                <artifactId>guava</artifactId>
                <version>${guava.version}</version>  <!-- "-jre" for Java 8 or higher -->

            </dependency>
            <!-- GCP libraries BOM sets the version for google http client -->
            <dependency>
                <groupId>com.google.cloud</groupId>
                <artifactId>libraries-bom</artifactId>
                <version>${libraries-bom.version}</version>
                <type>pom</type>
                <scope>import</scope>
            </dependency>
        </dependencies>
    </dependencyManagement>

</project>

I would like to know how to resolve these kinds of dependency-resolution errors. I have tried Maven's exclude and include tags, and I used mvn dependency:tree to look for overlapping dependencies, but the same error persists.
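
For reference, filtering the dependency tree down to the conflicting group makes overlaps much easier to spot. Assuming the conflict is in the AWS SDK, as the stack trace suggests, something like:

mvn dependency:tree -Dincludes=com.amazonaws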

As @Gregoire pointed out, the problem is caused by the jar files already present in the Spark environment: classes from Spark's own jars are always loaded before classes from the shaded jar. You can check which jars were loaded from the Spark UI.
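
If the older classes cannot be removed from the Spark environment, Spark also has experimental switches that make the user jar take precedence over Spark's own classpath. A sketch of the submit command, assuming the shaded jar name produced by the pom above (untested here):

spark-submit \
  --class org.propellyr.Main \
  --conf spark.driver.userClassPathFirst=true \
  --conf spark.executor.userClassPathFirst=true \
  target/beam-etl-1.0-SNAPSHOT-shaded.jar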

A Spark JAR file lets you package a project into a single file so it can be run on a Spark cluster.

Shading is an extension of the uber JAR idea, usually limited to the case where the JAR is a library to be used inside another application/library.
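
This is also why relocation is the usual build-time fix for this kind of clash: the Maven Shade plugin can rewrite the packages bundled in the uber jar so they no longer share names with the copies on Spark's classpath. A sketch to add inside the shade plugin's <configuration> above (the org.propellyr.shaded prefix is an arbitrary choice):

<relocations>
    <relocation>
        <!-- Rewrites the bundled AWS SDK classes (and references to them)
             under a new package so Spark's own copy cannot shadow them. -->
        <pattern>com.amazonaws</pattern>
        <shadedPattern>org.propellyr.shaded.com.amazonaws</shadedPattern>
    </relocation>
</relocations>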