YARN is one of the main components of Apache Hadoop. It is used to manage system resources and schedule tasks. Learn more about its definition, purpose, features and components.
Within the open source distributed processing framework Apache Hadoop, YARN is one of the main components. This resource management and job scheduling technology allocates system resources to the different applications running in a Hadoop cluster. It also schedules the execution of tasks on the different cluster nodes.
YARN: What is it?
The initials YARN stand for “Yet Another Resource Negotiator”, a name humorously coined by its developers. The technology became a sub-project of Apache Hadoop in 2012 and was added as a key feature of Hadoop with version 2.0, released in 2013.
Before the addition of YARN, Hadoop could only run MapReduce applications, so YARN greatly increased the potential use cases of the framework. By decoupling resource management and scheduling from MapReduce’s data processing component, YARN allowed Hadoop to support more applications and different types of processing.
For example, Hadoop clusters can now run real-time analytics, streaming data and interactive query applications on Apache Spark while MapReduce jobs are running.
YARN: What are the features?
YARN combines a central resource manager with containers, per-application coordinators and node-level agents that monitor processing operations on the individual cluster nodes. It can allocate resources to applications dynamically according to their needs.
This component of Hadoop also offers several scheduling methods: FIFO Scheduler, Fair Scheduler and Capacity Scheduler. In addition, the Reservation System feature allows users to reserve cluster resources in advance to ensure that important processing tasks are completed smoothly.
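To make the difference between scheduling methods more concrete, here is a minimal sketch in Python. It is not YARN code: the functions `fifo_allocate` and `fair_allocate`, the application names and the abstract "resource units" are all assumptions made for illustration. The real schedulers work with queues, memory and CPU, and many configuration options that this toy model ignores.

```python
def fifo_allocate(capacity, requests):
    """Grant requests strictly in arrival order until capacity runs out."""
    grants = {}
    remaining = capacity
    for app, wanted in requests:
        granted = min(wanted, remaining)
        grants[app] = granted
        remaining -= granted
    return grants


def fair_allocate(capacity, requests):
    """Share capacity evenly; unused shares go back to apps that still want more."""
    demand = dict(requests)
    grants = {app: 0 for app in demand}
    remaining = capacity
    active = [app for app in demand if demand[app] > 0]
    while remaining > 0 and active:
        # Equal share of what is left, at least one unit per round
        share = max(remaining // len(active), 1)
        for app in list(active):
            if remaining == 0:
                break
            give = min(share, demand[app] - grants[app], remaining)
            grants[app] += give
            remaining -= give
            if grants[app] == demand[app]:
                active.remove(app)
    return grants


# Hypothetical workload: three applications competing for 10 resource units
requests = [("etl", 8), ("adhoc", 4), ("report", 4)]
print(fifo_allocate(10, requests))  # {'etl': 8, 'adhoc': 2, 'report': 0}
print(fair_allocate(10, requests))  # {'etl': 4, 'adhoc': 3, 'report': 3}
```

With first-in, first-out ordering the last application gets nothing until earlier ones finish, while fair sharing gives every application a slice of the cluster right away.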
Another notable feature, added with Hadoop 3.0, is YARN Federation. It increases the number of nodes that a single YARN implementation can support by connecting different subclusters, each with its own resource manager.
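The idea of spreading applications over several subclusters can be sketched as follows. This is only an illustration under stated assumptions: the `SubCluster` class, the `route` function and the "most free nodes" policy are hypothetical and are not taken from the YARN Federation implementation, whose routing policies are configurable.

```python
class SubCluster:
    """A subcluster with its own resource manager, reduced to a free-node counter."""

    def __init__(self, name, free_nodes):
        self.name = name
        self.free_nodes = free_nodes
        self.apps = []


def route(subclusters, app_id):
    """Send the application to the subcluster with the most free nodes (illustrative policy)."""
    target = max(subclusters, key=lambda sc: sc.free_nodes)
    target.apps.append(app_id)
    target.free_nodes -= 1
    return target.name


# Hypothetical federation of two subclusters
federation = [SubCluster("east", 2), SubCluster("west", 3)]
placements = [route(federation, f"app-{i}") for i in range(3)]
print(placements)  # ['west', 'east', 'west']
```

From the point of view of the submitting user, the subclusters behave like one large cluster, even though each application is handled by one subcluster's resource manager.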
What are the main components of YARN?
YARN consists of several main components. The global resource manager (ResourceManager) accepts the jobs submitted by users, schedules them and allocates resources to them.
On each node, there is a NodeManager whose role is to monitor the node's resource usage and report it to the ResourceManager. There is also an ApplicationMaster, created for each application, which is responsible for negotiating resources with the ResourceManager and working with the NodeManagers to execute and monitor tasks.
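The interplay between these three components can be sketched with a toy model. The class names below mirror the YARN component names, but the methods (`allocate`, `start_container`, `request`) and the memory-only accounting are assumptions made for this example and do not correspond to the Hadoop API.

```python
from dataclasses import dataclass, field


@dataclass
class NodeManager:
    name: str
    total_mb: int
    allocated_mb: int = 0
    running: list = field(default_factory=list)

    def free_mb(self):
        return self.total_mb - self.allocated_mb

    def start_container(self, app_id, mb):
        # The node runs the container on behalf of the application
        self.running.append((app_id, mb))


@dataclass
class ResourceManager:
    nodes: list

    def allocate(self, mb):
        """Reserve memory on the first node with enough room, or return None."""
        for node in self.nodes:
            if node.free_mb() >= mb:
                node.allocated_mb += mb
                return node
        return None


@dataclass
class ApplicationMaster:
    app_id: str
    rm: ResourceManager

    def request(self, count, mb):
        """Negotiate up to `count` containers and start each one on its node."""
        started = 0
        for _ in range(count):
            node = self.rm.allocate(mb)
            if node is None:
                break  # the cluster has no room left for this request
            node.start_container(self.app_id, mb)
            started += 1
        return started


# Hypothetical cluster: two nodes with 4 GB and 2 GB of memory
rm = ResourceManager([NodeManager("node-1", 4096), NodeManager("node-2", 2048)])
am = ApplicationMaster("app-1", rm)
print(am.request(count=4, mb=2048))  # 3 (the fourth container does not fit)
```

The point of the sketch is the division of labor: the ResourceManager decides where resources are granted, the ApplicationMaster asks for them on behalf of one application, and the NodeManager runs the resulting containers.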
Finally, resource containers are controlled by the NodeManagers and hold the resources allocated to the individual applications. Typically, YARN containers are set up on nodes and scheduled to perform tasks only if resources are available to do so. In Hadoop 3.0, however, it is possible to create “opportunistic containers” that can be queued at a NodeManager until resources are released. This concept helps to optimize the use of resources.
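The difference between regular and opportunistic containers can be illustrated with a small simulation. The `Node` class, its methods and the status strings are hypothetical names chosen for this sketch. It also leaves out parts of the real behavior, for example the way YARN may preempt opportunistic containers to make room for regular ones.

```python
from collections import deque


class Node:
    def __init__(self, capacity_mb):
        self.capacity_mb = capacity_mb
        self.running = {}      # container id -> memory in MB
        self.queued = deque()  # opportunistic containers waiting for resources

    def free_mb(self):
        return self.capacity_mb - sum(self.running.values())

    def submit(self, cid, mb, opportunistic=False):
        if mb <= self.free_mb():
            self.running[cid] = mb
            return "RUNNING"
        if opportunistic:
            # Wait on the node instead of being turned away
            self.queued.append((cid, mb))
            return "QUEUED"
        return "NOT_SCHEDULED"

    def finish(self, cid):
        del self.running[cid]
        # Start queued containers in arrival order while the next one fits
        while self.queued and self.queued[0][1] <= self.free_mb():
            qid, mb = self.queued.popleft()
            self.running[qid] = mb


node = Node(capacity_mb=4096)
print(node.submit("c1", 4096))                      # RUNNING
print(node.submit("c2", 1024))                      # NOT_SCHEDULED
print(node.submit("c3", 2048, opportunistic=True))  # QUEUED
node.finish("c1")
print(sorted(node.running))                         # ['c3']
```

As soon as the first container finishes, the queued container starts without a new scheduling round, which is how the node stays busy.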