Spark 作业与 yarn-client 一起正常工作,但与 yarn-cluster 完全不工作
Spark job is working properly with yarn-client, but not working at all with yarn-cluster
我在用 yarn 提交 spark 作业 jar 时遇到问题。当我用 --master yarn-client
提交时,它运行良好并给我预期的结果
命令如下;
./spark-submit --class main.MainClass --master yarn-client --driver-memory 4g --executor-memory 4g --num-executors 4 --executor-cores 2 job.jar 其他选项
但是提交到集群模式时同样不起作用;命令如下;
./spark-submit --class main.MainClass --master yarn --deploy-mode cluster --driver-memory 4g --executor-memory 4g - -num-executors 4 --executor-cores 2 job.jar 其他选项
我的yarn-site.xml如下;
<property>
<name>yarn.scheduler.minimum-allocation-mb</name>
<value>128</value>
<description>Minimum limit of memory to allocate to each container request at the Resource Manager.</description>
</property>
<property>
<name>yarn.scheduler.maximum-allocation-mb</name>
<value>20048</value>
<description>Maximum limit of memory to allocate to each container request at the Resource Manager.</description>
</property>
<property>
<name>yarn.scheduler.minimum-allocation-vcores</name>
<value>1</value>
<description>The minimum allocation for every container request at the RM, in terms of virtual CPU cores. Requests lower than this won't take effect, and the specified value will get allocated the minimum.</description>
</property>
<property>
<name>yarn.scheduler.maximum-allocation-vcores</name>
<value>2</value>
<description>The maximum allocation for every container request at the RM, in terms of virtual CPU cores. Requests higher than this won't take effect, and will get capped to this value.</description>
</property>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>24096</value>
<description>Physical memory, in MB, to be made available to running containers</description>
</property>
<property>
<name>yarn.nodemanager.resource.cpu-vcores</name>
<value>4</value>
<description>Number of CPU cores that can be allocated for containers.</description>
</property>
<property>
<name>yarn.nodemanager.pmem-check-enabled</name>
<value>false</value>
</property>
<property>
<name>yarn.nodemanager.vmem-check-enabled</name>
<value>false</value>
</property>
我的 yarn stderr 日志是
17/03/23 03:30:44 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@3315fed4{/static,null,AVAILABLE}
17/03/23 03:30:44 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@3e430b9a{/,null,AVAILABLE}
17/03/23 03:30:44 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@77184f65{/api,null,AVAILABLE}
17/03/23 03:30:44 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@643f7b84{/stages/stage/kill,null,AVAILABLE}
17/03/23 03:30:44 INFO server.ServerConnector: Started ServerConnector@27614db2{HTTP/1.1}{0.0.0.0:37212}
17/03/23 03:30:44 INFO server.Server: Started @7799ms
17/03/23 03:30:44 INFO util.Utils: Successfully started service 'SparkUI' on port 37212.
17/03/23 03:30:44 INFO ui.SparkUI: Bound SparkUI to 0.0.0.0, and started at http://50.31.66.56:37212
17/03/23 03:30:44 INFO cluster.YarnClusterScheduler: Created YarnClusterScheduler
17/03/23 03:30:44 INFO cluster.SchedulerExtensionServices: Starting Yarn extension services with app application_1490254182417_0001 and attemptId Some(appattempt_1490254182417_0001_000001)
17/03/23 03:30:44 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 45469.
17/03/23 03:30:44 INFO netty.NettyBlockTransferService: Server created on 50.31.66.56:45469
17/03/23 03:30:44 INFO storage.BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 50.31.66.56, 45469)
17/03/23 03:30:44 INFO storage.BlockManagerMasterEndpoint: Registering block manager 50.31.66.56:45469 with 2004.6 MB RAM, BlockManagerId(driver, 50.31.66.56, 45469)
17/03/23 03:30:44 INFO storage.BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 50.31.66.56, 45469)
17/03/23 03:30:44 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@60245f4e{/metrics/json,null,AVAILABLE}
17/03/23 03:30:49 INFO scheduler.EventLoggingListener: Logging events to hdfs://mecku-1:54310/spark/application_1490254182417_0001_1
17/03/23 03:30:49 INFO cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: ApplicationMaster registered as NettyRpcEndpointRef(spark://YarnAM@50.31.66.56:50465)
17/03/23 03:30:49 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8030
17/03/23 03:30:49 INFO yarn.YarnRMClient: Registering the ApplicationMaster
17/03/23 03:30:49 INFO yarn.YarnAllocator: Will request 4 executor containers, each with 2 cores and 4505 MB memory including 409 MB overhead
17/03/23 03:30:49 INFO yarn.YarnAllocator: Canceled 0 container requests (locality no longer needed)
17/03/23 03:30:49 INFO yarn.YarnAllocator: Submitted container request (host: Any, capability: <memory:4505, vCores:2>)
17/03/23 03:30:49 INFO yarn.YarnAllocator: Submitted container request (host: Any, capability: <memory:4505, vCores:2>)
17/03/23 03:30:49 INFO yarn.YarnAllocator: Submitted container request (host: Any, capability: <memory:4505, vCores:2>)
17/03/23 03:30:49 INFO yarn.YarnAllocator: Submitted container request (host: Any, capability: <memory:4505, vCores:2>)
17/03/23 03:30:49 INFO yarn.ApplicationMaster: Started progress reporter thread with (heartbeat : 3000, initial allocation : 200) intervals
17/03/23 03:30:49 INFO yarn.ApplicationMaster: Unregistering ApplicationMaster with SUCCEEDED
17/03/23 03:30:49 INFO impl.AMRMClientImpl: Waiting for application to be successfully unregistered.
17/03/23 03:30:49 INFO yarn.ApplicationMaster: Deleting staging directory hdfs://localhost:54310/user/root/.sparkStaging/application_1490254182417_0001
17/03/23 03:30:49 INFO storage.DiskBlockManager: Shutdown hook called
17/03/23 03:30:49 INFO util.ShutdownHookManager: Shutdown hook called
17/03/23 03:30:49 INFO util.ShutdownHookManager: Deleting directory /tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1490254182417_0001/spark-d77de654-4040-4b43-8155-efb155008b4b
17/03/23 03:30:49 INFO util.ShutdownHookManager: Deleting directory /tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1490254182417_0001/spark-d77de654-4040-4b43-8155-efb155008b4b/userFiles-d71596df-df26-4b88-b51e-f0b962daf84a
17/03/23 03:30:40 INFO yarn.ApplicationMaster: ApplicationAttemptId: appattempt_1490254182417_0001_000001
17/03/23 03:30:40 信息 spark.SecurityManager:将视图 acls 更改为:root
23 年 3 月 17 日 03:30:40 信息 spark.SecurityManager:将修改 acls 更改为:ro
但是我的 spark 作业不是 运行,正如您所看到的,这里没有显示任何错误。这个问题背后有什么想法吗?
也许,您的从节点不工作。
你应该在命令下检查你的节点,
sudo -u yarn yarn node -list
如果找不到所有节点,您应该修复节点设置。
比如selinux off(勾选getenforce
),每个节点的yarn-site.xml和core-site.xml.
我在用 yarn 提交 spark 作业 jar 时遇到问题。当我用 --master yarn-client
提交时,它运行良好并给我预期的结果命令如下;
./spark-submit --class main.MainClass --master yarn-client --driver-memory 4g --executor-memory 4g --num-executors 4 --executor-cores 2 job.jar 其他选项
但是提交到集群模式时同样不起作用;命令如下;
./spark-submit --class main.MainClass --master yarn --deploy-mode cluster --driver-memory 4g --executor-memory 4g - -num-executors 4 --executor-cores 2 job.jar 其他选项
我的yarn-site.xml如下;
<property>
<name>yarn.scheduler.minimum-allocation-mb</name>
<value>128</value>
<description>Minimum limit of memory to allocate to each container request at the Resource Manager.</description>
</property>
<property>
<name>yarn.scheduler.maximum-allocation-mb</name>
<value>20048</value>
<description>Maximum limit of memory to allocate to each container request at the Resource Manager.</description>
</property>
<property>
<name>yarn.scheduler.minimum-allocation-vcores</name>
<value>1</value>
<description>The minimum allocation for every container request at the RM, in terms of virtual CPU cores. Requests lower than this won't take effect, and the specified value will get allocated the minimum.</description>
</property>
<property>
<name>yarn.scheduler.maximum-allocation-vcores</name>
<value>2</value>
<description>The maximum allocation for every container request at the RM, in terms of virtual CPU cores. Requests higher than this won't take effect, and will get capped to this value.</description>
</property>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>24096</value>
<description>Physical memory, in MB, to be made available to running containers</description>
</property>
<property>
<name>yarn.nodemanager.resource.cpu-vcores</name>
<value>4</value>
<description>Number of CPU cores that can be allocated for containers.</description>
</property>
<property>
<name>yarn.nodemanager.pmem-check-enabled</name>
<value>false</value>
</property>
<property>
<name>yarn.nodemanager.vmem-check-enabled</name>
<value>false</value>
</property>
我的 yarn stderr 日志是
17/03/23 03:30:44 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@3315fed4{/static,null,AVAILABLE}
17/03/23 03:30:44 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@3e430b9a{/,null,AVAILABLE}
17/03/23 03:30:44 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@77184f65{/api,null,AVAILABLE}
17/03/23 03:30:44 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@643f7b84{/stages/stage/kill,null,AVAILABLE}
17/03/23 03:30:44 INFO server.ServerConnector: Started ServerConnector@27614db2{HTTP/1.1}{0.0.0.0:37212}
17/03/23 03:30:44 INFO server.Server: Started @7799ms
17/03/23 03:30:44 INFO util.Utils: Successfully started service 'SparkUI' on port 37212.
17/03/23 03:30:44 INFO ui.SparkUI: Bound SparkUI to 0.0.0.0, and started at http://50.31.66.56:37212
17/03/23 03:30:44 INFO cluster.YarnClusterScheduler: Created YarnClusterScheduler
17/03/23 03:30:44 INFO cluster.SchedulerExtensionServices: Starting Yarn extension services with app application_1490254182417_0001 and attemptId Some(appattempt_1490254182417_0001_000001)
17/03/23 03:30:44 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 45469.
17/03/23 03:30:44 INFO netty.NettyBlockTransferService: Server created on 50.31.66.56:45469
17/03/23 03:30:44 INFO storage.BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 50.31.66.56, 45469)
17/03/23 03:30:44 INFO storage.BlockManagerMasterEndpoint: Registering block manager 50.31.66.56:45469 with 2004.6 MB RAM, BlockManagerId(driver, 50.31.66.56, 45469)
17/03/23 03:30:44 INFO storage.BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 50.31.66.56, 45469)
17/03/23 03:30:44 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@60245f4e{/metrics/json,null,AVAILABLE}
17/03/23 03:30:49 INFO scheduler.EventLoggingListener: Logging events to hdfs://mecku-1:54310/spark/application_1490254182417_0001_1
17/03/23 03:30:49 INFO cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: ApplicationMaster registered as NettyRpcEndpointRef(spark://YarnAM@50.31.66.56:50465)
17/03/23 03:30:49 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8030
17/03/23 03:30:49 INFO yarn.YarnRMClient: Registering the ApplicationMaster
17/03/23 03:30:49 INFO yarn.YarnAllocator: Will request 4 executor containers, each with 2 cores and 4505 MB memory including 409 MB overhead
17/03/23 03:30:49 INFO yarn.YarnAllocator: Canceled 0 container requests (locality no longer needed)
17/03/23 03:30:49 INFO yarn.YarnAllocator: Submitted container request (host: Any, capability: <memory:4505, vCores:2>)
17/03/23 03:30:49 INFO yarn.YarnAllocator: Submitted container request (host: Any, capability: <memory:4505, vCores:2>)
17/03/23 03:30:49 INFO yarn.YarnAllocator: Submitted container request (host: Any, capability: <memory:4505, vCores:2>)
17/03/23 03:30:49 INFO yarn.YarnAllocator: Submitted container request (host: Any, capability: <memory:4505, vCores:2>)
17/03/23 03:30:49 INFO yarn.ApplicationMaster: Started progress reporter thread with (heartbeat : 3000, initial allocation : 200) intervals
17/03/23 03:30:49 INFO yarn.ApplicationMaster: Unregistering ApplicationMaster with SUCCEEDED
17/03/23 03:30:49 INFO impl.AMRMClientImpl: Waiting for application to be successfully unregistered.
17/03/23 03:30:49 INFO yarn.ApplicationMaster: Deleting staging directory hdfs://localhost:54310/user/root/.sparkStaging/application_1490254182417_0001
17/03/23 03:30:49 INFO storage.DiskBlockManager: Shutdown hook called
17/03/23 03:30:49 INFO util.ShutdownHookManager: Shutdown hook called
17/03/23 03:30:49 INFO util.ShutdownHookManager: Deleting directory /tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1490254182417_0001/spark-d77de654-4040-4b43-8155-efb155008b4b
17/03/23 03:30:49 INFO util.ShutdownHookManager: Deleting directory /tmp/hadoop-root/nm-local-dir/usercache/root/appcache/application_1490254182417_0001/spark-d77de654-4040-4b43-8155-efb155008b4b/userFiles-d71596df-df26-4b88-b51e-f0b962daf84a
17/03/23 03:30:40 INFO yarn.ApplicationMaster: ApplicationAttemptId: appattempt_1490254182417_0001_000001
17/03/23 03:30:40 信息 spark.SecurityManager:将视图 acls 更改为:root 23 年 3 月 17 日 03:30:40 信息 spark.SecurityManager:将修改 acls 更改为:ro
但是我的 spark 作业不是 运行,正如您所看到的,这里没有显示任何错误。这个问题背后有什么想法吗?
也许,您的从节点不工作。 你应该在命令下检查你的节点,
sudo -u yarn yarn node -list
如果找不到所有节点,您应该修复节点设置。
比如selinux off(勾选getenforce
),每个节点的yarn-site.xml和core-site.xml.