How to get cluster information to call the REST API (from the driver)?

I want to use the Spark REST API to get metrics and publish them to CloudWatch. But the REST API URL looks like this:

 val url = "http://<host>:4040/api/v1/applications/<app-name>/stages"

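It works if I supply the master host and application ID by hand, but how can I use it inside a job and determine the master host and application name dynamically? Is there any way to get this information?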

Using Spark 2.1.

Tried:

import org.apache.spark.sql.SparkSession

val id = spark.sparkContext.applicationId
val url = spark.sparkContext.uiWebUrl.get

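// Assumed imports (not shown in the original attempt)
import org.json4s._
import org.json4s.jackson.JsonMethods.parse
import scala.io.Source.fromURL

// Fields to extract from each stage entry returned by the REST API
case class SparkStage(name: String, shuffleWriteBytes: Long, memoryBytesSpilled: Long, diskBytesSpilled: Long)
val path = url + "/api/v1/applications/" + id + "/stages"

implicit val formats = DefaultFormats
val json = fromURL(path).mkString
val stages: List[SparkStage] = parse(json).extract[List[SparkStage]]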

I get:

java.io.IOException: Server returned HTTP response code: 500 for URL: http://112.21.2.151:4040/api/v1/applications/application_1515337161733_0001
  at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1876)
  at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1474)
  at java.net.URL.openStream(URL.java:1045)
  at scala.io.Source$.fromURL(Source.scala:141)
  at scala.io.Source$.fromURL(Source.scala:131)
  ... 64 elided

If you know the host, you can query the applications endpoint:

http://localhost:4040/api/v1/applications

and parse the result to get the application ID.
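As a rough sketch of that approach, the snippet below fetches the applications endpoint and pulls out every top-level "id" value. The object name ApplicationIds and its methods are hypothetical helpers, not part of Spark, and the regex is a shortcut that assumes the usual response shape; a JSON library such as json4s would be more robust.

```scala
import scala.io.Source

// Hypothetical helper, not part of Spark: lists application IDs from the REST API.
object ApplicationIds {
  // Matches "id" : "<value>" pairs; "attemptId" is not matched because the
  // opening quote must come directly before `id`.
  private val IdField = "\"id\"\\s*:\\s*\"([^\"]+)\"".r

  // Extracts all application IDs from the JSON text of /api/v1/applications.
  def parse(json: String): List[String] =
    IdField.findAllMatchIn(json).map(_.group(1)).toList

  // Fetches the endpoint from a known host (the default port 4040 is an assumption).
  def fetch(host: String, port: Int = 4040): List[String] = {
    val source = Source.fromURL(s"http://$host:$port/api/v1/applications")
    try parse(source.mkString) finally source.close()
  }
}
```

For example, ApplicationIds.fetch("localhost") would query the URL shown above, provided a Spark UI is actually listening there.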

To get the applicationId and host from within the application, use the respective SparkContext methods:

val spark: SparkSession

spark.sparkContext.applicationId
spark.sparkContext.uiWebUrl
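Putting the two values together might look like the sketch below. SparkRestUrls and stagesUrl are hypothetical names introduced for illustration. Since uiWebUrl is an Option[String] (it is empty when the UI is disabled), the helper returns an Option rather than calling .get.

```scala
// Hypothetical helper, not part of Spark: builds the stages endpoint URL
// from values the driver already knows.
object SparkRestUrls {
  def stagesUrl(uiWebUrl: Option[String], applicationId: String): Option[String] =
    uiWebUrl.map { base =>
      // Drop a trailing slash so the path is not doubled up.
      s"${base.stripSuffix("/")}/api/v1/applications/$applicationId/stages"
    }
}

// In a driver with a live SparkSession named `spark`:
// val url = SparkRestUrls.stagesUrl(
//   spark.sparkContext.uiWebUrl,
//   spark.sparkContext.applicationId)
```

The resulting Option can then be pattern matched before fetching, so a disabled UI is handled explicitly instead of failing with a NoSuchElementException.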