Submitting Spark jobs over Livy using curl

I am submitting Spark jobs to a Livy (0.6.0) session via curl.

The job is a big jar file that extends the Job interface, like this:

In practice, I run this code with the following curl command:

curl -X POST -d '{"kind": "spark","files":["/config.json"],"jars":["/myjar.jar"],"driverMemory":"512M","executorMemory":"512M"}' -H "Content-Type: application/json" localhost:8998/sessions/

As for the code, it is exactly like in the answer shown above:

package com.mycompany.test
import org.apache.livy.{Job, JobContext}
import org.apache.spark._
import org.apache.livy.scalaapi._

object Test extends Job[Boolean] {
  override def call(jc: JobContext): Boolean = {
    val sc = jc.sc
    // Print every entry of the Spark configuration
    sc.getConf.getAll.foreach(println)
    true
  }
}

The error is a Java NullPointerException, shown below:

Exception in thread "main" java.lang.NullPointerException
    at org.apache.livy.rsc.driver.JobWrapper.cancel(JobWrapper.java:90)
    at org.apache.livy.rsc.driver.RSCDriver.shutdown(RSCDriver.java:127)
    at org.apache.livy.rsc.driver.RSCDriver.run(RSCDriver.java:356)
    at org.apache.livy.rsc.driver.RSCDriverBootstrapper.main(RSCDriverBootstrapper.java:93)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:849)
    at org.apache.spark.deploy.SparkSubmit.doRunMain(SparkSubmit.scala:167)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:195)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
    at org.apache.spark.deploy.SparkSubmit$$anon.doSubmit(SparkSubmit.scala:924)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:933)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

This is unexpected, as the intended output is simply that the job inside the jar starts running.

I have used the Livy REST APIs, and there are two ways to submit Spark jobs. Refer to the REST API docs and you will get a fair understanding of Livy REST requests:

1. Batches (/batches):
You submit a request and get a job ID back. With that job ID you can poll the status of the Spark job (see the polling sketch after this list). Here you can choose between executing an uber jar and a code file, but I have never used the latter.

2. Sessions (/sessions and /sessions/{sessionId}/statements):
You submit your Spark job as code, without creating an uber jar. Here you first create a Session and then execute one or more Statements (the actual code) inside that session.
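
A minimal sketch of the polling mentioned in option 1, assuming a batch was created with ID 0 (the ID comes back in the POST response; both GET endpoints are part of the standard Livy REST API):

// Full batch info, including log lines and state
curl localhost:8998/batches/0

// Just the state, e.g. "starting", "running", "success" or "dead"
curl localhost:8998/batches/0/state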

For both approaches, if you look at the documentation, it explains the respective REST requests and request bodies/parameters well.

Examples/samples are here and here.

The corrections to your code are:

Batches

curl \
  -X POST \
  -d '{
    "kind": "spark",
    "files": [
      "<use-absolute-path>"
    ],
    "file": "absolute-path-to-your-application-jar",
    "className": "fully-qualified-spark-class-name",
    "driverMemory": "512M",
    "executorMemory": "512M",
    "conf": {<any-other-configs-as-key-val>}
  }' \
  -H "Content-Type: application/json" \
  localhost:8998/batches/
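
For reference, a filled-in sketch using the paths from your question. It assumes /myjar.jar and /config.json are the correct absolute paths and that com.mycompany.test.Test is the class to launch; note that batch mode runs the jar spark-submit style, so that class needs a regular main method rather than the Job interface:

curl \
  -X POST \
  -d '{
    "files": ["/config.json"],
    "file": "/myjar.jar",
    "className": "com.mycompany.test.Test",
    "driverMemory": "512M",
    "executorMemory": "512M"
  }' \
  -H "Content-Type: application/json" \
  localhost:8998/batches/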

Sessions and Statements

// Create a session
curl \
  -X POST \
  -d '{
    "kind": "spark",
    "files": [
      "<use-absolute-path>"
    ],
    "driverMemory": "512M",
    "executorMemory": "512M",
    "conf": {<any-other-configs-as-key-val>}
  }' \
  -H "Content-Type: application/json" \
  localhost:8998/sessions/

// Run code/statement in session created above
curl \
  -X POST \
  -d '{
    "kind": "spark",
    "code": "spark-code"
  }' \
  -H "Content-Type: application/json" \
  localhost:8998/sessions/{sessionId}/statements
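
As a sketch of what "spark-code" can look like for your case: in a spark-kind session, sc is predefined, so the body of your call method can be sent directly as a statement (the session and statement IDs 0 below are only illustrations):

// Same logic as the jar's call() method, sent as a statement
curl \
  -X POST \
  -d '{"kind": "spark", "code": "sc.getConf.getAll.foreach(println)"}' \
  -H "Content-Type: application/json" \
  localhost:8998/sessions/0/statements

// Poll the statement for its state and output
curl localhost:8998/sessions/0/statements/0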

As @yegeniy mentioned above, the issue comes from LIVY-636. You will need to build the jar without the Scala library, and everything will work smoothly.
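
For example, if the uber jar is built with sbt-assembly, a minimal sketch of that workaround in build.sbt looks like this (assuming a pre-1.x sbt-assembly that supports the assemblyOption copy syntax):

// Keep scala-library out of the uber jar so it does not clash with Livy's
assemblyOption in assembly :=
  (assemblyOption in assembly).value.copy(includeScala = false)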