MRv2 / YARN Features

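I am trying to get my head around the practical use of the new API, and while reading around the internet I found conflicting answers to the same questions I am working on.

The questions I would like answered are:

1) Which of the MRv2/YARN daemons is responsible for launching application containers and monitoring application resource usage?

2) Which two issues is MRv2/YARN designed to address?

I will try to make this thread educational and constructive for other readers by citing the sources and the actual material from my search, so I hope it does not look like I am providing too much information when I could simply ask the questions and keep the post short.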

For the first question, reading through the documentation, I found three main sources to rely on:

From the Hadoop documentation:

ApplicationMaster<-->NodeManager Launch containers. Communicate with NodeManagers by using NMClientAsync objects, handling container events by NMClientAsync.CallbackHandler

The ApplicationMaster communicates with YARN cluster, and handles application execution. It performs operations in an asynchronous fashion. During application launch time, the main tasks of the ApplicationMaster are:

a) communicating with the ResourceManager to negotiate and allocate resources for future containers, and

b) after container allocation, communicating YARN NodeManagers (NMs) to launch application containers on them.
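To make steps (a) and (b) concrete, here is a minimal toy model in plain Python. It does not use the Hadoop client API; the class names mirror the daemons, but every method (`allocate`, `start_container`, `run`) and every number is a hypothetical stand-in chosen for illustration. The point it tries to show is that the ApplicationMaster drives the process while the NodeManager is the component that actually starts each container.

```python
from dataclasses import dataclass, field


@dataclass
class NodeManager:
    """Toy per-node daemon: the one that actually starts containers."""
    host: str
    free_mb: int
    running: list = field(default_factory=list)

    def start_container(self, container_id: str, mb: int) -> None:
        self.free_mb -= mb
        self.running.append(container_id)


@dataclass
class ResourceManager:
    """Toy cluster-wide scheduler: it only decides where containers may run."""
    nodes: list

    def allocate(self, count: int, mb: int) -> list:
        grants = []
        # Track what has been promised so far without touching the nodes.
        budget = {node.host: node.free_mb for node in self.nodes}
        for node in self.nodes:
            while len(grants) < count and budget[node.host] >= mb:
                budget[node.host] -= mb
                grants.append((f"container_{len(grants) + 1}", node, mb))
        return grants


class ApplicationMaster:
    """Toy per-application coordinator following steps (a) and (b)."""

    def __init__(self, rm: ResourceManager):
        self.rm = rm

    def run(self, tasks: int, mb_per_task: int) -> list:
        grants = self.rm.allocate(tasks, mb_per_task)   # step (a): negotiate
        for container_id, node, mb in grants:           # step (b): ask NMs to launch
            node.start_container(container_id, mb)
        return [(container_id, node.host) for container_id, node, _ in grants]


nodes = [NodeManager("node1", 2048), NodeManager("node2", 1024)]
placements = ApplicationMaster(ResourceManager(nodes)).run(tasks=3, mb_per_task=1024)
print(placements)
```

In this sketch the ApplicationMaster never starts anything itself: it only turns grants from the ResourceManager into launch requests to NodeManagers.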

From the Hortonworks documentation:

The ApplicationMaster is, in effect, an instance of a framework-specific library and is responsible for negotiating resources from the ResourceManager and working with the NodeManager(s) to execute and monitor the containers and their resource consumption. It has the responsibility of negotiating appropriate resource containers from the ResourceManager, tracking their status and monitoring progress.

From the Cloudera documentation:

MRv2 daemons -

ResourceManager – one per cluster – Starts ApplicationMasters, allocates resources on slave nodes

ApplicationMaster – one per job – Requests resources, manages individual Map and Reduce tasks

NodeManager – one per slave node – Manages resources on individual slave nodes

JobHistory – one per cluster – Archives jobs’ metrics and metadata

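Back to the question (which daemon is responsible for launching application containers and monitoring application resource usage), I ask myself:

Is it the NodeManager? Is it the ApplicationMaster?

As I understand it, the ApplicationMaster is what makes the NodeManager actually do the work, so this is like asking who is responsible for lifting a box off the ground: the hands that did the actual lifting, or the mind that controls the body and makes the hands lift...

I guess it is a tricky question, but there must be only one answer.

For the second question, reading online, I found different answers in many sources, hence the confusion, but my main sources are: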

From the Cloudera documentation:

MapReduce v2 (“MRv2”) – Built on top of YARN (Yet Another Resource Negotiator)

– Uses ResourceManager/NodeManager architecture

– Increases scalability of cluster

– Node resources can be used for any type of task

– Improves cluster utilization

– Support for non-MR jobs
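The "node resources can be used for any type of task" and "improves cluster utilization" points can be illustrated with a little arithmetic. The sketch below assumes an MRv1 node configured with fixed map and reduce slots and compares it with a YARN node that hands out generic containers. All numbers (8 map slots, 4 reduce slots, 1 GB per task, a map-heavy workload) are hypothetical, and the model ignores CPU and scheduler details.

```python
def mrv1_running(pending_map, pending_reduce, map_slots, reduce_slots):
    """Fixed slots: a map task can never occupy an idle reduce slot."""
    return min(pending_map, map_slots) + min(pending_reduce, reduce_slots)


def yarn_running(pending_map, pending_reduce, node_mb, container_mb):
    """Generic containers: any pending task can use any free memory."""
    capacity = node_mb // container_mb
    return min(pending_map + pending_reduce, capacity)


# Hypothetical node: 12 GB of memory, 1 GB per task.
NODE_MB = 12 * 1024
TASK_MB = 1024

# Hypothetical map-heavy moment: 12 map tasks waiting, no reduce tasks yet.
mrv1_busy = mrv1_running(12, 0, map_slots=8, reduce_slots=4)
yarn_busy = yarn_running(12, 0, node_mb=NODE_MB, container_mb=TASK_MB)

print(f"MRv1 tasks running: {mrv1_busy} of 12 slots")
print(f"YARN tasks running: {yarn_busy} of 12 containers")
```

Under these assumptions the slot-based node leaves its four reduce slots idle, while the container-based node can give all of its memory to map tasks.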

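Back to the question (which two issues is MRv2/YARN designed to address?): I know MRv2 made several changes, such as relieving the resource pressure on the JobTracker (in MRv1 the maximum number of nodes in a cluster was around 4,000; in MRv2 it is more than twice that), and I also know it provides the ability to run frameworks other than MapReduce, such as MPI.

From the documentation: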

The Application Master provides much of the functionality of the traditional ResourceManager so that the entire system can scale more dramatically. In tests, we’ve already successfully simulated 10,000 node clusters composed of modern hardware without significant issue.

And:

Moving all application framework specific code into the ApplicationMaster generalizes the system so that we can now support multiple frameworks such as MapReduce, MPI and Graph Processing.

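But I also think it addressed the fact that the NameNode was a single point of failure, since the new version offers a standby NameNode through high-availability mode. (I may be confusing features of the old and new APIs with features of MRv1 versus MRv2, which may be the very reason for my question.)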

Prior to Hadoop 2.0.0, the NameNode was a single point of failure (SPOF) in an HDFS cluster. Each cluster had a single NameNode, and if that machine or process became unavailable, the cluster as a whole would be unavailable until the NameNode was either restarted or brought up on a separate machine.

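So if you had to choose 2 of these 3, which two are the issues MRv2/YARN is designed to address?

-Resource pressure on the JobTracker

-Ability to run frameworks other than MapReduce, such as MPI.

-Single point of failure in the NameNode.

Thanks in advance! :D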

Which of the MRv2/YARN daemons is the one responsible for launching application containers and monitoring application resource usage.

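The ResourceManager (RM) is responsible for launching the ApplicationMaster (AM) specific to a job. Once the AM is launched, it is the AM's responsibility to negotiate, allocate, and monitor the job's resources (containers).

I suggest reading "Anatomy of a MapReduce Job Run" in Chapter 6 of Hadoop: The Definitive Guide for an in-depth look at how job resources are allocated in MR1 and MR2.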

Which two issues MRv2/YARN is designed to address?

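YARN tries to separate the functions of the JobTracker in MR1 (which was the bottleneck for scaling) into their own abstractions:

  • Cluster resource management - the ResourceManager
  • Application life-cycle management - an ApplicationMaster specific to each application/job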

So if you would have to choose 2 of the 3, which ones would be the 2 that serve as the two issues MRv2/YARN is designed to address?

-Resource pressure on the JobTracker

-Ability to run frameworks other than MapReduce, such as MPI.

-Single point of failure in the NameNode.

Of your three options, I would choose 1 and 2.

According to Cloudera: http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/cdh_ig_mapreduce_to_yarn_migrate.html#concept_z1p_gmy_xl_unique_2

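The TaskTracker has been replaced with the NodeManager, a YARN service that manages resources and deployment on a host. It is responsible for launching containers, each of which can house a map or reduce task.

So it is the NodeManager that launches the containers for the mapred tasks.

The ApplicationMaster container is launched by the ResourceManager, though.

Just to clarify the above: "The ApplicationMaster container is launched by ResourceManager though" means that the ResourceManager instructs a NodeManager to launch the ApplicationMaster container. The actual launch of the ApplicationMaster container is also done by the NodeManager.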
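The launch order described in these answers can be traced with a small simulation. This is only a sketch in plain Python, not the YARN API: the method names (`submit_application`, `allocate`, `run_tasks`, `launch`), the round-robin placement, and the choice of the first node for the ApplicationMaster are all assumptions made for illustration.

```python
events = []


class NodeManager:
    """Toy NodeManager: every container start goes through it."""

    def __init__(self, host):
        self.host = host

    def launch(self, requested_by, what):
        events.append(f"{requested_by} -> NodeManager@{self.host}: launch {what}")
        events.append(f"NodeManager@{self.host}: started {what}")


class ResourceManager:
    """Toy ResourceManager: decides placement, never starts containers itself."""

    def __init__(self, node_managers):
        self.node_managers = node_managers

    def submit_application(self, app_id):
        # Assumption: the first node hosts the ApplicationMaster container.
        am_node = self.node_managers[0]
        am_node.launch("ResourceManager", f"ApplicationMaster container for {app_id}")
        return ApplicationMaster(app_id, self)

    def allocate(self, count):
        # Assumption: simple round-robin placement across nodes.
        return [self.node_managers[i % len(self.node_managers)] for i in range(count)]


class ApplicationMaster:
    """Toy ApplicationMaster: asks NodeManagers to start task containers."""

    def __init__(self, app_id, rm):
        self.app_id = app_id
        self.rm = rm

    def run_tasks(self, count):
        for index, node in enumerate(self.rm.allocate(count)):
            node.launch("ApplicationMaster", f"task container {index} for {self.app_id}")


rm = ResourceManager([NodeManager("node1"), NodeManager("node2")])
am = rm.submit_application("app_1")
am.run_tasks(2)
print("\n".join(events))
```

In the printed trace, every "started" line belongs to a NodeManager; the ResourceManager and the ApplicationMaster only appear as requesters.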