Submitting Spark jobs over Livy using curl
I am submitting a Spark job to a Livy (0.6.0) session via curl.
The job is an uber jar that implements Livy's Job interface, like this:
The session is created by running this curl command:
curl -X POST -d '{"kind": "spark","files":["/config.json"],"jars":["/myjar.jar"],"driverMemory":"512M","executorMemory":"512M"}' -H "Content-Type: application/json" localhost:8998/sessions/
The code itself is exactly as shown below:
package com.mycompany.test

import org.apache.livy.{Job, JobContext}
import org.apache.spark._
import org.apache.livy.scalaapi._

object Test extends Job[Boolean] {
  override def call(jc: JobContext): Boolean = {
    val sc = jc.sc
    sc.getConf.getAll.foreach(println)
    true
  }
}
The error is a Java NullPointerException, shown below:
Exception in thread "main" java.lang.NullPointerException
at org.apache.livy.rsc.driver.JobWrapper.cancel(JobWrapper.java:90)
at org.apache.livy.rsc.driver.RSCDriver.shutdown(RSCDriver.java:127)
at org.apache.livy.rsc.driver.RSCDriver.run(RSCDriver.java:356)
at org.apache.livy.rsc.driver.RSCDriverBootstrapper.main(RSCDriverBootstrapper.java:93)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:849)
at org.apache.spark.deploy.SparkSubmit.doRunMain(SparkSubmit.scala:167)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:195)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
at org.apache.spark.deploy.SparkSubmit$$anon.doSubmit(SparkSubmit.scala:924)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:933)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
The expected output, however, is for the job inside the jar to be launched and run.
I have used the Livy REST APIs, and there are two ways to submit a Spark job. Please refer to the REST API docs; they give a fair understanding of Livy REST requests:
1. Batches (/batches):
You submit the request and receive a job ID. With that job ID you can poll the status of the Spark job. Here you have the option of running either an uber jar or a code file, though I have never used the latter.
2. Sessions (/sessions and /sessions/{sessionId}/statements):
You submit the Spark job as code, without creating an uber jar. Here you first create a Session and then execute Statement(s), the actual code, within that session.
For both approaches, the docs explain the corresponding REST requests and request body/parameters well.
The corrections to your requests are:
Batch
curl \
-X POST \
-d '{
"kind": "spark",
"files": [
"<use-absolute-path>"
],
"file": "absolute-path-to-your-application-jar",
"className": "fully-qualified-spark-class-name",
"driverMemory": "512M",
"executorMemory": "512M",
"conf": {<any-other-configs-as-key-val>}
}' \
-H "Content-Type: application/json" \
localhost:8998/batches/
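After submitting the batch, you can poll its state with `GET /batches/{batchId}/state` until it reaches a terminal state. A minimal sketch in Python (assuming Livy on localhost:8998; the batch id comes from the `"id"` field of the POST /batches response):

```python
# Sketch: poll a Livy batch until it reaches a terminal state.
# Assumes Livy runs at localhost:8998.
import json
import time
import urllib.request

LIVY_URL = "http://localhost:8998"

def batch_state_url(batch_id: int) -> str:
    # GET /batches/{batchId}/state returns {"id": ..., "state": "..."}
    return f"{LIVY_URL}/batches/{batch_id}/state"

def poll_batch(batch_id: int, interval_s: float = 5.0) -> str:
    """Block until the batch finishes; return its final state."""
    while True:
        with urllib.request.urlopen(batch_state_url(batch_id)) as resp:
            state = json.load(resp)["state"]
        if state in ("success", "dead", "killed"):
            return state
        time.sleep(interval_s)
```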
Session and statements
// Create a session
curl \
-X POST \
-d '{
"kind": "spark",
"files": [
"<use-absolute-path>"
],
"driverMemory": "512M",
"executorMemory": "512M",
"conf": {<any-other-configs-as-key-val>}
}' \
-H "Content-Type: application/json" \
localhost:8998/sessions/
// Run code/statement in session created above
curl \
-X POST \
-d '{
"kind": "spark",
"code": "spark-code"
}' \
-H "Content-Type: application/json" \
localhost:8998/sessions/{sessionId}/statements
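One practical detail with the statements endpoint: the Scala snippet has to be embedded as a JSON string in the `"code"` field, and hand-escaping its quotes is a common source of errors. A small sketch (the Scala snippet here is illustrative) that lets a JSON encoder do the quoting:

```python
# Sketch: build the statement payload with a JSON encoder instead of
# hand-escaping the Scala code inside the "code" field.
import json

scala_code = """val total = sc.parallelize(1 to 100).map(_ * 2).sum()
println(s"total = $total")"""

payload = json.dumps({"kind": "spark", "code": scala_code})
# `payload` can now be passed to curl via -d, with quotes already escaped.
```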
As @yegeniy mentioned above, the issue comes from LIVY-636. You will need to build the jar without the Scala library, and everything will work smoothly.
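LIVY-636 describes a scala-library clash between the uber jar and Livy's own classpath. If you build with sbt-assembly, one way to keep the Scala library out of the fat jar is shown below (a sketch; the exact setting shape depends on your sbt-assembly version, and older versions use `.copy(includeScala = false)` instead):

```scala
// build.sbt (sketch): keep scala-library out of the assembled uber jar so the
// job uses the Scala runtime already on Livy's classpath.
// Requires the sbt-assembly plugin in project/plugins.sbt.
assembly / assemblyOption := (assembly / assemblyOption).value
  .withIncludeScala(false)
```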