How to submit a job via REST API?

I am using DataStax Enterprise 4.8.3. I am trying to implement a Quartz-based application to submit Spark jobs remotely. During my research I came across the following links:

  1. Apache Spark Hidden REST API
  2. Spark feature - Provide a stable application submission gateway in standalone cluster mode

To verify the theory, I tried running the following curl command on my 2-node cluster (as described in link #1 above):

curl -X POST http://spark-master-ip:6066/v1/submissions/create --header "Content-Type:application/json;charset=UTF-8" --data '{
  "action" : "CreateSubmissionRequest",
  "appArgs" : [ "myAppArgument1" ],
  "appResource" : "file:/home/local/sparkjob.jar",
  "clientSparkVersion" : "1.4.2",
  "environmentVariables" : {
    "SPARK_ENV_LOADED" : "1"
  },
  "mainClass" : "com.spark.job.Launcher",
  "sparkProperties" : {
    "spark.jars" : "file:/home/local/sparkjob.jar",
    "spark.driver.supervise" : "false",
    "spark.app.name" : "MyJob",
    "spark.eventLog.enabled" : "true",
    "spark.submit.deployMode" : "cluster",
    "spark.master" : "spark://spark-master-ip:6066"
  }
}'

But when I execute this command, I get an HTML response containing the following text:

This Page Cannot Be Displayed
The system cannot communicate with the external server (spark-master-ip).
The Internet server may be busy, may be permanently down, or may be unreachable because of network problems.
Please check the spelling of the Internet address entered.
If it is correct, try this request later.

If you have questions, please contact your organization's network administrator and provide the codes shown below.

Date: Fri, 11 Dec 2015 13:19:15 GMT
Username: 
Source IP: spark-master-ip
URL: POST http://spark-master-ip/v1/submissions/create
Category: Uncategorized URLs
Reason: UNKNOWN
Notification: GATEWAY_TIMEOUT
  • Check that you have started both the Spark master and the workers (using start-all.sh).

  • Check the log file for a message like:

INFO rest.StandaloneRestServer: Started REST server for submitting applications on port 6066

  • Check that the started process is really listening on port 6066 (using netstat).

It should look something like this:

root@x:~# netstat -apn | grep 11572 | grep LISTEN
tcp6       0      0 :::8080                 :::*                    LISTEN      11572/java      
tcp6       0      0 10.0.0.9:6066           :::*                    LISTEN      11572/java      
tcp6       0      0 10.0.0.9:7077           :::*                    LISTEN      11572/java      

Then replace "spark-master-ip" in your script with the IP address you see in the netstat output (the example above shows "10.0.0.9").
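To automate the last check from the client side, a quick TCP probe can confirm that the REST port accepts connections at all (the gateway-timeout page in the question suggests the request never reached the master). A minimal sketch; the host and the default port 6066 are taken from the question and may differ in your setup:

```python
import socket

def rest_port_open(host, port=6066, timeout=3.0):
    """Return True if something accepts TCP connections on host:port.

    Client-side counterpart of the netstat check above; it does not
    verify that the listener is actually the Spark REST server.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

If this returns False for your master host, the submission request cannot possibly succeed, regardless of the JSON payload.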

Using Spark 2.4.3, we found that the REST API is disabled by default. When it is disabled, calls to port 6066 fail with exactly the error you have shown.

We found that the REST API has to be enabled by adding the following entry to your spark-defaults.conf file:

spark.master.rest.enabled true

After adding this entry, we restarted the Spark instances on our machines and the REST API came up.
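Once the REST server is enabled, the curl call from the question can also be issued programmatically, which is what a Quartz-style scheduler would do. A sketch using only the Python standard library; the host, jar path, and class names are the placeholders from the question, not real values:

```python
import json
from urllib import request

def build_submission(master_host, jar_path, main_class, app_args):
    """Build the CreateSubmissionRequest body from the question's curl call."""
    return {
        "action": "CreateSubmissionRequest",
        "appArgs": app_args,
        "appResource": jar_path,
        "clientSparkVersion": "1.4.2",
        "environmentVariables": {"SPARK_ENV_LOADED": "1"},
        "mainClass": main_class,
        "sparkProperties": {
            "spark.jars": jar_path,
            "spark.driver.supervise": "false",
            "spark.app.name": "MyJob",
            "spark.eventLog.enabled": "true",
            "spark.submit.deployMode": "cluster",
            "spark.master": "spark://%s:6066" % master_host,
        },
    }

def submit(master_host, payload):
    """POST the request to the master's REST port and return the parsed reply."""
    req = request.Request(
        "http://%s:6066/v1/submissions/create" % master_host,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json;charset=UTF-8"},
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

A successful reply is a JSON document whose `success` field is true and which carries the `submissionId` of the launched driver.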