How to build Spark 1.6.1 with SBT on Windows using Hadoop profiles?

Question

How can I activate the Hadoop and YARN profiles when building Spark with SBT on Windows (8-10)?

>sbt package

The command above works, but the following fails to activate the profile:

>sbt -Pyarn package

I am asking because mvn is particularly slow compared to SBT. I have experience building Spark on Linux with both SBT and Maven.

Answer 1

You have to use the ./build/sbt script that is bundled with the Spark source distribution. It calls another script, sbt-launch-lib.bash, which performs some profile-related magic:

enableProfile () {
  dlog "[enableProfile] arg = '$1'"
  maven_profiles=( "${maven_profiles[@]}" "$1" )
  export SBT_MAVEN_PROFILES="${maven_profiles[@]}"
}
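To make the mechanism concrete, here is a minimal, self-contained sketch of how profile flags could accumulate into SBT_MAVEN_PROFILES. The enableProfile function mirrors the one quoted above. The dlog stub and the collect_profiles loop are hypothetical stand-ins, because the answer does not show how the real script parses its arguments, and hadoop-2.6 is only an example profile name.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the launcher's debug logger: prints only when
# the (assumed) variable "debug" is set.
dlog () { if [ -n "${debug:-}" ]; then echo "$@" >&2; fi; }

maven_profiles=()

# Same shape as the function quoted in the answer: append the argument to the
# array and re-export the array as one space-separated string.
enableProfile () {
  dlog "[enableProfile] arg = '$1'"
  maven_profiles=( "${maven_profiles[@]}" "$1" )
  export SBT_MAVEN_PROFILES="${maven_profiles[@]}"
}

# Assumed, simplified argument loop (not copied from Spark): every argument
# that starts with -P is handed to enableProfile, everything else is ignored.
collect_profiles () {
  local arg
  for arg in "$@"; do
    case "$arg" in
      -P*) enableProfile "$arg" ;;
    esac
  done
}

collect_profiles -Pyarn -Phadoop-2.6 package
echo "SBT_MAVEN_PROFILES=$SBT_MAVEN_PROFILES"
# prints: SBT_MAVEN_PROFILES=-Pyarn -Phadoop-2.6
```

The point is that the profiles never reach SBT as command-line options; they travel through an environment variable, which is why a plain sbt -Pyarn package has nothing to pick them up.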

On the other side, the project definition SparkBuild extends PomBuild, which makes it possible to use the Maven project, including its profiles:

override val profiles = {
  val profiles = Properties.envOrNone("SBT_MAVEN_PROFILES") match {
    ...
  }
  profiles
}
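The body of the match is elided above and is not reconstructed here. Purely as an illustration of what consuming such a variable involves, the sketch below splits a space-separated value into profile names in plain shell. It assumes the tokens may carry a leading -P; the sample value and the profile_names variable are hypothetical, and this is not the Scala logic from SparkBuild.

```shell
#!/bin/sh
# Hypothetical sample value; in a real build the launcher script would set it.
SBT_MAVEN_PROFILES="-Pyarn -Phadoop-2.6"

profile_names=""
# The expansion is left unquoted on purpose so the shell splits on whitespace.
for token in $SBT_MAVEN_PROFILES; do
  name="${token#-P}"   # drop a leading -P if the token has one
  # Append the name, inserting a space only when the list is non-empty.
  profile_names="${profile_names}${profile_names:+ }${name}"
done

echo "profiles: $profile_names"
# prints: profiles: yarn hadoop-2.6
```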

So it should work if you run the following (using Cygwin):

sh build/sbt -Pyarn package

However, it did not work out of the box for me, because the path discovery for sbt-launch-lib.bash was incorrect. So I replaced one line in build\sbt:

. "$(dirname "$(realpath "$0")")"/sbt-launch-lib.bash

with

. "$(dirname "$(realpath "$0")")"/build/sbt-launch-lib.bash
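If you would rather not hard-code either location, one option is to probe both candidates and source whichever exists. The following is only a sketch under that assumption: find_launch_lib is a hypothetical helper, not part of Spark, and the demo builds a throwaway directory instead of touching a real checkout.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first existing candidate path for
# sbt-launch-lib.bash under the given directory, or return non-zero.
find_launch_lib () {
  local base_dir="$1"
  local candidate
  for candidate in \
      "$base_dir/sbt-launch-lib.bash" \
      "$base_dir/build/sbt-launch-lib.bash"; do
    if [ -f "$candidate" ]; then
      printf '%s\n' "$candidate"
      return 0
    fi
  done
  return 1
}

# Demo on a temporary directory that mimics the layout hit in the answer,
# where the library is only found under build/.
demo_root="$(mktemp -d)"
mkdir -p "$demo_root/build"
: > "$demo_root/build/sbt-launch-lib.bash"

found="$(find_launch_lib "$demo_root")"
echo "found: $found"
```

Inside the launcher itself, the call would look like . "$(find_launch_lib "$(dirname "$(realpath "$0")")")", which keeps the original line's use of realpath and dirname and only changes where the library is looked up.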

Tags: build, profiles, sbt, apache-spark