Packaging a Spring Boot jar for spark-submit using Gradle

When I run a Spring Boot jar built by Gradle on Spark, I get a ClassNotFoundException for the main class:

spark2-submit --class com.test.DriverMain test.jar ...

I am using org.springframework.boot:spring-boot-gradle-plugin:2.2.0.RELEASE

I also tried dropping the --class reference to the main class name. That got past the ClassNotFoundException, but I then ran into many ClassCastExceptions at runtime:

java.lang.ClassCastException: cannot assign instance of java.lang.invoke.SerializedLambda to field org.apache.spark.sql.functions$$anonfun.f of type org.apache.spark.sql.api.java.UDF1 in instance of org.apache.spark.sql.functions$$anonfun

The Spark documentation (https://spark.apache.org/docs/latest/submitting-applications.html) links to sbt and Maven plugins for packaging a jar that can be submitted to Spark, but there is none for Gradle.

The jar produced by spring-boot-gradle-plugin has the following structure:

test.jar
- BOOT-INF
  - lib
     - ... jars dependencies 
  - classes
     - com \ test \ ...
- org
  - springframework \ boot \loader \ ...
- META-INF
  - MANIFEST.MF

The expected structure is something like:

test.jar
- com \ test \ ...
- jar dependencies in package and classes format like org \ springframework \ data \ jpa \ ...
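
The difference between the two layouts can be checked directly. The following is a minimal sketch, assuming the ClassNotFoundException comes from the main class sitting under BOOT-INF/classes rather than at the archive root, where a loader that treats the jar as a plain classpath entry would look for it. The class name JarLayoutCheck and its helper methods are hypothetical and are not part of Spark or Spring Boot.

```java
import java.io.IOException;
import java.util.List;
import java.util.stream.Collectors;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper: reports where a class file sits inside a jar.
class JarLayoutCheck {

    // Returns every entry whose path ends with the class file for className,
    // whether at the archive root or nested (for example under BOOT-INF/classes/).
    static List<String> locate(List<String> entryNames, String className) {
        String suffix = className.replace('.', '/') + ".class";
        return entryNames.stream()
                .filter(name -> name.equals(suffix) || name.endsWith("/" + suffix))
                .collect(Collectors.toList());
    }

    // True only when the class file sits at the archive root, which is where
    // a loader using the jar as a plain classpath entry resolves it.
    static boolean loadableFromRoot(List<String> entryNames, String className) {
        return entryNames.contains(className.replace('.', '/') + ".class");
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: JarLayoutCheck <jar> <fully.qualified.ClassName>");
            return;
        }
        try (ZipFile zip = new ZipFile(args[0])) {
            List<String> names = zip.stream()
                    .map(ZipEntry::getName)
                    .collect(Collectors.toList());
            System.out.println("found at: " + locate(names, args[1]));
            System.out.println("loadable from archive root: " + loadableFromRoot(names, args[1]));
        }
    }
}
```

Run against the Spring Boot jar with com.test.DriverMain as the second argument, this should report the class only under BOOT-INF/classes; run against a flat jar, it should report it at the root.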

Here is a working Gradle snippet:

    buildscript {
        repositories repos
        dependencies {
            classpath 'com.github.jengelman.gradle.plugins:shadow:5.2.0'
        }
    }
    ...
    apply plugin: 'com.github.johnrengelman.shadow'
    
    // this way I won't be needing any spring boot gradle related plugin for packaging and dependency mgt
    dependencies {
        implementation platform("org.springframework.boot:spring-boot-dependencies:2.2.0.RELEASE")
        ...
    }
    // set to false, else it will be packaged twice, one for the shaded jar, one with the normal jar
    jar {
        enabled = false
    }
    // creates the spring boot shaded jar
    import com.github.jengelman.gradle.plugins.shadow.transformers.PropertiesFileTransformer
    shadowJar {
        zip64 true
        mergeServiceFiles()
        append 'META-INF/spring.handlers'
        append 'META-INF/spring.schemas'
        append 'META-INF/spring.tooling'
        transform(PropertiesFileTransformer) {
            paths = ['META-INF/spring.factories' ]
            mergeStrategy = "append"
        }
        archiveFileName = "test-${version}.jar"
    }
    // shaded jar will be built whenever jar is being invoked
    jar.dependsOn(shadowJar)

Reference: https://github.com/spring-projects/spring-boot/issues/1828#issuecomment-231104288
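
The PropertiesFileTransformer block in the snippet matters because several Spring jars each ship their own META-INF/spring.factories, and a flat jar can hold only one file at that path. The following is a rough sketch of what an "append" merge amounts to, assuming values for the same key are joined with a comma. It illustrates the idea only and is not the Shadow plugin's actual code; the class name FactoriesMergeSketch and the com.example values are hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of an "append" merge of properties files.
class FactoriesMergeSketch {

    // Merges the key/value pairs of several files in order. When a key
    // appears more than once, its values are joined with a comma
    // instead of the later value replacing the earlier one.
    static Map<String, String> appendMerge(List<Map<String, String>> files) {
        Map<String, String> merged = new LinkedHashMap<>();
        for (Map<String, String> file : files) {
            for (Map.Entry<String, String> entry : file.entrySet()) {
                merged.merge(entry.getKey(), entry.getValue(), (a, b) -> a + "," + b);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        String key = "org.springframework.boot.autoconfigure.EnableAutoConfiguration";

        // Two dependency jars, each declaring an auto-configuration (made-up class names).
        Map<String, String> jarA = new LinkedHashMap<>();
        jarA.put(key, "com.example.a.AConfig");
        Map<String, String> jarB = new LinkedHashMap<>();
        jarB.put(key, "com.example.b.BConfig");

        System.out.println(appendMerge(Arrays.asList(jarA, jarB)));
    }
}
```

If only one of the two files survived packaging, the other jar's entries would be lost, which is the situation the merge configuration is meant to avoid.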