Making spark use /etc/hosts file for binding in YARN cluster mode
I am setting up a Spark cluster on machines that each have two network interfaces, one public and one private. The /etc/hosts file on every machine in the cluster lists the internal IPs of all the other machines, in the form
internal_ip FQDN
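For example, a hosts file following this pattern might look like the following (the addresses and names here are placeholders):
10.0.0.11 node1.cluster.internal
10.0.0.12 node2.cluster.internal
10.0.0.13 node3.cluster.internal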
However, when I request a SparkContext via pyspark in YARN client mode (pyspark --master yarn --deploy-mode client), Akka binds to the public IP and the connection times out:
15/11/07 23:29:23 INFO Remoting: Starting remoting
15/11/07 23:29:23 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkYarnAM@public_ip:44015]
15/11/07 23:29:23 INFO util.Utils: Successfully started service 'sparkYarnAM' on port 44015.
15/11/07 23:29:23 INFO yarn.ApplicationMaster: Waiting for Spark driver to be reachable.
15/11/07 23:31:30 ERROR yarn.ApplicationMaster: Failed to connect to driver at yarn_driver_public_ip:48875, retrying ...
15/11/07 23:31:30 ERROR yarn.ApplicationMaster: Uncaught exception:
org.apache.spark.SparkException: Failed to connect to driver!
at org.apache.spark.deploy.yarn.ApplicationMaster.waitForSparkDriver(ApplicationMaster.scala:427)
at org.apache.spark.deploy.yarn.ApplicationMaster.runExecutorLauncher(ApplicationMaster.scala:293)
at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:149)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$main.apply$mcV$sp(ApplicationMaster.scala:574)
at org.apache.spark.deploy.SparkHadoopUtil$$anon.run(SparkHadoopUtil.scala:66)
at org.apache.spark.deploy.SparkHadoopUtil$$anon.run(SparkHadoopUtil.scala:65)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:65)
at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:572)
at org.apache.spark.deploy.yarn.ExecutorLauncher$.main(ApplicationMaster.scala:599)
at org.apache.spark.deploy.yarn.ExecutorLauncher.main(ApplicationMaster.scala)
15/11/07 23:31:30 INFO yarn.ApplicationMaster: Final app status: FAILED, exitCode: 10, (reason: Uncaught exception: org.apache.spark.SparkException: Failed to connect to driver!)
15/11/07 23:31:30 INFO yarn.ApplicationMaster: Unregistering ApplicationMaster with FAILED (diag message: Uncaught exception: org.apache.spark.SparkException: Failed to connect to driver!)
15/11/07 23:31:30 INFO yarn.ApplicationMaster: Deleting staging directory .sparkStaging/application_1446960366742_0002
As the log shows, the private IP is ignored entirely. How can I make YARN and Spark use the private IP addresses specified in the hosts file?
The cluster was provisioned with Ambari (HDP 2.4).
+1 to the question.
Spark uses Akka for communication, so this is more an Akka question than a Spark one:
If you need to bind your network interface to a different address, use the akka.remote.netty.tcp.bind-hostname and akka.remote.netty.tcp.bind-port settings.
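For reference, a minimal sketch of those settings in an Akka application.conf might look like this (hostnames, addresses, and ports are placeholders; note that the separate bind-hostname/bind-port settings only exist in Akka 2.4+, while Spark of this era bundles an older Akka, which is consistent with the next answer treating this as a Spark issue):

akka.remote.netty.tcp {
  hostname = "public.example.com"   # address advertised to remote peers
  port = 2552
  bind-hostname = "10.0.0.11"       # local interface the socket actually binds to
  bind-port = 2552
}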
This is currently an open issue in Spark; the only way to make Spark bind to the correct interface is to use a custom name server. Spark essentially performs a hostname lookup and binds Akka to whatever IP address that lookup returns. The workaround is to create a custom bind zone and run a name server, so that the lookup resolves to the private IP.
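The answer doesn't spell out the setup, but a minimal sketch using dnsmasq might look like this (the file paths, addresses, and names are illustrative assumptions, not the answerer's exact configuration):

# /etc/dnsmasq.conf on each node
no-resolv                        # do not read /etc/resolv.conf for upstream servers
server=8.8.8.8                   # forward all other queries to a real resolver
addn-hosts=/etc/hosts.cluster    # serve cluster FQDN -> private-IP mappings from this file

# /etc/hosts.cluster
10.0.0.11 node1.cluster.internal
10.0.0.12 node2.cluster.internal

Each node then points its own resolver at the local dnsmasq (nameserver 127.0.0.1 in /etc/resolv.conf), so the hostname lookup Spark performs returns the private address.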