ClassNotFoundException using Spark library through Bazel

I'm trying to run a "hello world" Spark server built with Bazel, but I'm getting this error:

$ bazel run //:app
INFO: Analysed target //:app (0 packages loaded).
INFO: Found 1 target...
Target //:app up-to-date:
  bazel-bin/app.jar
  bazel-bin/app
INFO: Elapsed time: 0.201s, Critical Path: 0.00s
INFO: 0 processes.
INFO: Build completed successfully, 1 total action
INFO: Build completed successfully, 1 total action
Exception in thread "main" java.lang.NoClassDefFoundError: org/slf4j/LoggerFactory
        at spark.Service.<clinit>(Service.java:56)
        at spark.Spark$SingletonHolder.<clinit>(Spark.java:51)
        at spark.Spark.getInstance(Spark.java:55)
        at spark.Spark.<clinit>(Spark.java:61)
        at io.app.server.Main.main(Main.java:7)
Caused by: java.lang.ClassNotFoundException: org.slf4j.LoggerFactory
        at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:338)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        ... 5 more

BUILD:

java_binary(
    name = "app",
    main_class = "io.app.server.Main",
    srcs = ["src/main/java/io/app/server/Main.java"],
    deps = [
        "@org_slf4j_slf4j_simple//jar",
        "@com_sparkjava_spark_core//jar",
    ]
)

The same error occurs even if I don't include slf4j, and it shouldn't be a required dependency of spark anyway.

WORKSPACE:

maven_jar(
    name = "com_sparkjava_spark_core",
    artifact = "com.sparkjava:spark-core:2.7.2"
)

maven_jar(
    name = "org_slf4j_slf4j_simple",
    artifact = "org.slf4j:slf4j-simple:1.7.21"
)

And finally, src/main/java/io/app/server/Main.java:

package io.app.server;

import static spark.Spark.*;

public class Main {
  public static void main(String[] args) {
    port(3000);
    get("/", (req, res) -> "Hello World");
  }
}

Any idea what I'm doing wrong here?

Found what I was missing. It seems that maven_jar does not automatically fetch the "transitive dependencies" that the library itself declares; see this:

Bazel only reads dependencies listed in your WORKSPACE file. If your project (A) depends on another project (B) which list a dependency on a third project (C) in its WORKSPACE file, you'll have to add both B and C to your project's WORKSPACE file. This requirement can balloon the WORKSPACE file size, but hopefully limits the chances of having one library include C at version 1.0 and another include C at 2.0.

Large WORKSPACE files can be generated using the tool generate_workspace. For details, see Generate external dependencies from Maven projects.
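
In this case the missing class, org.slf4j.LoggerFactory, lives in slf4j-api, which spark-core declares as a Maven dependency but maven_jar never downloads. A minimal sketch of the manual workaround, assuming slf4j-api 1.7.25 is the version spark-core 2.7.2 expects (check its POM), is to declare that jar yourself and add it to deps:

# WORKSPACE: maven_jar only fetches the single jar it names,
# so the transitive dependency has to be declared explicitly.
maven_jar(
    name = "org_slf4j_slf4j_api",
    artifact = "org.slf4j:slf4j-api:1.7.25",  # assumed version; check spark-core's POM
)

# BUILD: add the jar to the binary's classpath as well.
deps = [
    "@com_sparkjava_spark_core//jar",
    "@org_slf4j_slf4j_api//jar",
    "@org_slf4j_slf4j_simple//jar",
]

This only fixes the first missing class, though; spark-core also pulls in Jetty and other jars that would need the same treatment, which is why generating the dependency list is the more practical route.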

So the solution seems to be to write a pom.xml and use generate_workspace.

Edit: generate_workspace seems to have been deprecated; use bazel_deps instead.

Another solution could be to use maven_install from rules_jvm_external:

# git_repository has to be loaded before it can be used in the WORKSPACE.
load("@bazel_tools//tools/build_defs/repo:git.bzl", "git_repository")

git_repository(
    name = "rules_jvm_external",
    commit = "22b463c485f31b240888c89d17e67c460d7e68c0",
    remote = "https://github.com/bazelbuild/rules_jvm_external.git",
)

load("@rules_jvm_external//:defs.bzl", "maven_install")

maven_install(
    artifacts = [
        "org.apache.spark:spark-core_2.12:3.1.2",
        "org.apache.spark:spark-sql_2.12:3.1.2",
    ],
    repositories = [
        "https://repo.maven.apache.org/maven2/",
    ],
)
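
Unlike maven_jar, maven_install resolves transitive dependencies on its own, so only the direct dependencies need to be listed. As a sketch of how the original setup would look with it (using the sparkjava coordinates from the question rather than the Apache Spark artifacts above), the BUILD file then references the jars through the generated @maven repository, where each coordinate becomes a target with dots and dashes replaced by underscores:

# WORKSPACE: slf4j-api and the other transitive jars are fetched automatically.
maven_install(
    artifacts = [
        "com.sparkjava:spark-core:2.7.2",
        "org.slf4j:slf4j-simple:1.7.21",
    ],
    repositories = [
        "https://repo.maven.apache.org/maven2/",
    ],
)

# BUILD: targets live in the @maven repository, named group_artifact.
java_binary(
    name = "app",
    main_class = "io.app.server.Main",
    srcs = ["src/main/java/io/app/server/Main.java"],
    deps = [
        "@maven//:com_sparkjava_spark_core",
        "@maven//:org_slf4j_slf4j_simple",
    ],
)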