Yarn and Its Failures
Chennammal.S-21AIA17
YARN
YARN, or Yet Another Resource Negotiator, is a cluster resource
management framework for large-scale data processing.
It was introduced in Hadoop 2.0 and is a core component of Apache Hadoop 2.0 and later.
YARN provides a unified resource management and scheduling layer
for all distributed applications, including batch processing, stream
processing, interactive processing, and graph processing.
In the Hadoop 1.x architecture, the JobTracker carried the responsibility of job scheduling and monitoring as well as managing resources across the cluster.
The TaskTracker executed MapReduce tasks on the worker (slave) nodes.
This design resulted in a scalability bottleneck because of the single JobTracker.
Apart from this limitation, computational resources were used inefficiently.
To overcome these issues, YARN was introduced in Hadoop 2.0 by Yahoo and Hortonworks.
YARN gave Hadoop the ability to run non-MapReduce jobs within the Hadoop framework.
Hadoop 1.0 Architecture
YARN Architecture
YARN has a two-tier architecture:
• ResourceManager: The ResourceManager is the global resource manager
for the cluster. It is responsible for allocating resources to applications and
tracking their overall status.
• NodeManager: The NodeManager is a daemon that runs on each node in
the cluster. It is responsible for managing the resources on the node and
executing tasks for applications.
When an application is submitted to YARN, the ResourceManager creates an
ApplicationMaster container. The ApplicationMaster is responsible for
negotiating resources from the ResourceManager and scheduling tasks to
the NodeManagers. The NodeManagers execute the tasks and report their
progress to the ApplicationMaster. The ApplicationMaster monitors the
progress of the tasks and restarts any tasks that fail.
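As a minimal, hedged sketch of the client side of this flow (the class name, launch command, queue, and resource sizes below are illustrative placeholders, not a standard example), an application can be handed to the ResourceManager with the YARN Java client API roughly as follows:

import java.util.Collections;
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.client.api.YarnClientApplication;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class SubmitToYarn {                                   // illustrative class name
    public static void main(String[] args) throws Exception {
        YarnClient yarnClient = YarnClient.createYarnClient();
        yarnClient.init(new YarnConfiguration());             // reads yarn-site.xml from the classpath
        yarnClient.start();

        // Ask the ResourceManager to create a new application.
        YarnClientApplication app = yarnClient.createApplication();
        ApplicationSubmissionContext appContext = app.getApplicationSubmissionContext();
        appContext.setApplicationName("demo-app");            // illustrative name

        // Describe how to launch the ApplicationMaster container.
        ContainerLaunchContext amContainer = ContainerLaunchContext.newInstance(
                Collections.emptyMap(),                        // local resources (jars, files)
                Collections.emptyMap(),                        // environment variables
                Collections.singletonList("echo hello-from-the-AM"),  // placeholder launch command
                null, null, null);
        appContext.setAMContainerSpec(amContainer);
        appContext.setResource(Resource.newInstance(1024, 1)); // 1 GB, 1 vcore for the AM (illustrative)
        appContext.setQueue("default");

        // Hand the application over; the ResourceManager allocates a container
        // for the ApplicationMaster and starts it on a NodeManager.
        System.out.println("Submitted " + yarnClient.submitApplication(appContext));
        yarnClient.stop();
    }
}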
Components of YARN
Client
Resource Manager
1. Scheduler
2. Applications Manager
Node Manager
1. Application Master
2. Container
Hadoop YARN Architecture
Client: Submits MapReduce jobs to the cluster.
Resource Manager: It is the master daemon of YARN and is
responsible for resource assignment and management among all
the applications. Whenever it receives a processing request, it
forwards it to the corresponding node manager and allocates
resources for the completion of the request accordingly. It has two
major components:
1. Scheduler: It allocates cluster resources to running applications based on
their resource requirements and the resources available. It is a pure scheduler,
meaning it does not perform other tasks such as monitoring or tracking, and it
does not guarantee a restart if a task fails. The YARN scheduler supports
pluggable policies such as the Capacity Scheduler and the Fair Scheduler to
partition the cluster resources (a small configuration sketch follows this list).
2. Applications Manager: It is responsible for accepting application submissions and
negotiating the first container, the one that runs the Application Master. It also restarts the
Application Master container if it fails.
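A minimal sketch of how the scheduler plugin is selected (assuming yarn-site.xml is on the classpath; the property name and the two class names are the ones shipped with Apache Hadoop, where the Capacity Scheduler is the stock default, and the class name WhichScheduler is illustrative):

import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class WhichScheduler {
    public static void main(String[] args) {
        YarnConfiguration conf = new YarnConfiguration();
        String scheduler = conf.get(
                "yarn.resourcemanager.scheduler.class",
                "org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler");
        System.out.println("Configured scheduler: " + scheduler);
        // To switch plugins, set the same property in yarn-site.xml to
        // ...scheduler.fair.FairScheduler and restart the ResourceManager.
    }
}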
Node Manager: It takes care of an individual node in the Hadoop cluster and manages
the applications and workflow on that particular node. Its primary job is to keep in step
with the Resource Manager: it registers with the Resource Manager and sends
heartbeats with the health status of the node. It monitors resource usage,
performs log management, and kills containers when directed to by the
Resource Manager. It is also responsible for creating container processes
and starting them at the request of the Application Master.
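As a hedged illustration, the per-node health and resource information that NodeManagers heartbeat to the ResourceManager can be observed from a client with the YARN Java API (the class name NodeHealthCheck is illustrative):

import java.util.List;
import org.apache.hadoop.yarn.api.records.NodeReport;
import org.apache.hadoop.yarn.api.records.NodeState;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class NodeHealthCheck {
    public static void main(String[] args) throws Exception {
        YarnClient yarnClient = YarnClient.createYarnClient();
        yarnClient.init(new YarnConfiguration());
        yarnClient.start();

        // One report per NodeManager currently in the RUNNING state.
        List<NodeReport> nodes = yarnClient.getNodeReports(NodeState.RUNNING);
        for (NodeReport node : nodes) {
            System.out.println(node.getNodeId()
                    + " health: " + node.getHealthReport()
                    + " used: "   + node.getUsedResource()
                    + " total: "  + node.getCapability());
        }
        yarnClient.stop();
    }
}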
• Application Master: An application is a single job submitted to a
framework. The Application Master is responsible for negotiating
resources with the Resource Manager and for tracking the status and
monitoring the progress of that single application. Once containers are
granted, it asks the Node Manager to launch them by sending a
Container Launch Context (CLC), which includes everything the
application needs to run. Once the application is started, it sends
health reports to the Resource Manager from time to time.
• Container: It is a collection of physical resources such as RAM, CPU
cores, and disk on a single node. Containers are launched using a
Container Launch Context (CLC), a record that contains information
such as environment variables, security tokens, dependencies, etc.
(A sketch of how an Application Master requests and launches a
container follows this list.)
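A hedged sketch of what an Application Master does with the YARN client libraries: register with the ResourceManager, request a container, and hand the NodeManager a Container Launch Context describing what to run. The class name, container size, and launch command are illustrative placeholders, and error handling is omitted.

import java.util.Collections;
import java.util.List;
import org.apache.hadoop.yarn.api.records.Container;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.api.records.FinalApplicationStatus;
import org.apache.hadoop.yarn.api.records.Priority;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.AMRMClient;
import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;
import org.apache.hadoop.yarn.client.api.NMClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class SimpleAppMaster {                                // illustrative class name
    public static void main(String[] args) throws Exception {
        YarnConfiguration conf = new YarnConfiguration();

        AMRMClient<ContainerRequest> rmClient = AMRMClient.createAMRMClient();
        rmClient.init(conf);
        rmClient.start();
        NMClient nmClient = NMClient.createNMClient();
        nmClient.init(conf);
        nmClient.start();

        // Register this ApplicationMaster with the ResourceManager.
        rmClient.registerApplicationMaster("", 0, "");

        // Ask the Scheduler for one container: 512 MB, 1 vcore (illustrative sizes).
        rmClient.addContainerRequest(new ContainerRequest(
                Resource.newInstance(512, 1), null, null, Priority.newInstance(0)));

        // Poll the ResourceManager until the container is allocated, then launch it
        // on its NodeManager with a Container Launch Context.
        boolean launched = false;
        while (!launched) {
            List<Container> allocated = rmClient.allocate(0.0f).getAllocatedContainers();
            for (Container container : allocated) {
                ContainerLaunchContext clc = ContainerLaunchContext.newInstance(
                        Collections.emptyMap(), Collections.emptyMap(),
                        Collections.singletonList("echo work-goes-here"),  // placeholder command
                        null, null, null);
                nmClient.startContainer(container, clc);
                launched = true;
            }
            Thread.sleep(1000);
        }

        // Tell the ResourceManager the application has finished.
        rmClient.unregisterApplicationMaster(FinalApplicationStatus.SUCCEEDED, "", "");
    }
}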
YARN EXECUTION OVERVIEW
Client: For submitting MapReduce jobs.
Resource Manager: To manage the use of resources across the cluster.
Node Manager: For launching and monitoring the compute containers on
machines in the cluster.
MapReduce Application Master: Coordinates the tasks running the MapReduce
job. The application master and the MapReduce tasks run in containers
that are scheduled by the resource manager and managed by the node
managers (a client-side monitoring sketch appears after this overview).
The JobTracker and TaskTracker were used in previous versions of Hadoop
and were responsible for resource handling and progress tracking.
Hadoop 2.0 introduces the ResourceManager and NodeManager to overcome
the shortcomings of the JobTracker and TaskTracker.
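A minimal monitoring sketch, assuming an application has already been submitted and the two numeric parts of its ApplicationId (cluster timestamp and sequence number, e.g. from "application_1700000000000_0001") are known; the class name is illustrative. The client polls the ResourceManager until the application reaches a terminal state:

import java.util.EnumSet;
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.api.records.YarnApplicationState;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class WaitForCompletion {
    public static void main(String[] args) throws Exception {
        // Rebuild the ApplicationId from its cluster timestamp and sequence number.
        ApplicationId appId = ApplicationId.newInstance(
                Long.parseLong(args[0]), Integer.parseInt(args[1]));

        YarnClient yarnClient = YarnClient.createYarnClient();
        yarnClient.init(new YarnConfiguration());
        yarnClient.start();

        EnumSet<YarnApplicationState> done = EnumSet.of(
                YarnApplicationState.FINISHED,
                YarnApplicationState.FAILED,
                YarnApplicationState.KILLED);

        ApplicationReport report = yarnClient.getApplicationReport(appId);
        while (!done.contains(report.getYarnApplicationState())) {
            System.out.println("State: " + report.getYarnApplicationState()
                    + ", progress: " + report.getProgress());
            Thread.sleep(2000);
            report = yarnClient.getApplicationReport(appId);
        }
        System.out.println("Final status: " + report.getFinalApplicationStatus());
        yarnClient.stop();
    }
}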
Advantages of YARN
• Flexibility: YARN offers flexibility to run various types of distributed processing
systems such as Apache Spark, Apache Flink, Apache Storm, and others.
• Resource Management: It allows administrators to allocate and monitor the
resources required by each application in a cluster, such as CPU, memory, and
disk space.
• Scalability: YARN is designed to be highly scalable and can handle thousands of
nodes in a cluster.
• Improved Performance: YARN improves cluster utilization and throughput by
replacing fixed map and reduce slots with a centralized resource management layer.
• Security: YARN supports robust security features such as Kerberos
authentication, service-level ACLs, and secure data transmission, helping
ensure that data stored and processed on the Hadoop cluster is protected.
Disadvantages of YARN
• Complexity: It requires additional configurations and settings, which can
be difficult for users who are not familiar with YARN.
• Overhead: YARN introduces additional overhead, which can slow down
the performance of the Hadoop cluster.
• Latency: YARN introduces additional latency in the Hadoop ecosystem.
This latency can be caused by resource allocation, application scheduling,
and communication between components.
• Single Point of Failure: If the ResourceManager fails, the entire cluster stops
scheduling work. To avoid this, administrators need to set up ResourceManager
high availability with a standby instance (a configuration sketch follows this list).
• Limited Support: YARN has limited support for non-Java programming
languages. Although it supports multiple processing engines, some
engines have limited language support.
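A hedged sketch of the ResourceManager high-availability settings (normally placed in yarn-site.xml; shown here programmatically from a client for illustration). The cluster id, host names, and ZooKeeper quorum are illustrative assumptions.

import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class HaConfiguredClient {
    public static void main(String[] args) {
        YarnConfiguration conf = new YarnConfiguration();
        conf.setBoolean("yarn.resourcemanager.ha.enabled", true);
        conf.set("yarn.resourcemanager.cluster-id", "demo-cluster");       // illustrative
        conf.set("yarn.resourcemanager.ha.rm-ids", "rm1,rm2");
        conf.set("yarn.resourcemanager.hostname.rm1", "rm1.example.com");  // illustrative
        conf.set("yarn.resourcemanager.hostname.rm2", "rm2.example.com");  // illustrative
        conf.set("yarn.resourcemanager.zk-address", "zk1:2181,zk2:2181,zk3:2181"); // illustrative

        // A client built from this configuration fails over between rm1 and rm2.
        YarnClient yarnClient = YarnClient.createYarnClient();
        yarnClient.init(conf);
        yarnClient.start();
        System.out.println("YarnClient started against an HA ResourceManager pair.");
        yarnClient.stop();
    }
}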
YARN FAILURES
• Identifying Common Failures
• There are several common failures that can occur when working with
YARN, including issues with resource allocation, configuration errors, and
job scheduling problems. It is important to identify these failures early
in order to minimize their impact on your workflow.
• Debugging Techniques
• Start by reviewing the YARN logs to identify any error messages or warnings
related to the failure; these logs provide valuable information about the
underlying issue and help guide your debugging efforts.
• Check the configuration settings for YARN and the Hadoop cluster to ensure
they are properly set up; failures can often be traced back to misconfigured
settings or incorrect parameter values.
• Use debugging tools such as breakpoints and stack traces to identify the
root cause of the failure. These tools help you pinpoint the exact location
in the code where the error occurred and provide insight into how to fix it
(a small diagnostics sketch follows).
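A hedged starting point for debugging a failed application from Java (the same information is available from the ResourceManager web UI): fetch the application report and print its diagnostics string, which usually contains the failure message. The class name and argument handling are illustrative.

import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class PrintDiagnostics {
    public static void main(String[] args) throws Exception {
        YarnClient yarnClient = YarnClient.createYarnClient();
        yarnClient.init(new YarnConfiguration());
        yarnClient.start();

        // Cluster timestamp and sequence number of the failed application (illustrative input).
        ApplicationId appId = ApplicationId.newInstance(
                Long.parseLong(args[0]), Integer.parseInt(args[1]));
        ApplicationReport report = yarnClient.getApplicationReport(appId);

        System.out.println("State:       " + report.getYarnApplicationState());
        System.out.println("Diagnostics: " + report.getDiagnostics());
        System.out.println("Tracking UI: " + report.getTrackingUrl());
        yarnClient.stop();
    }
}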
• Case Study 1: Spark Application Failure
• A company was running a Spark application on YARN, but it kept failing
with a cryptic error message. After examining the YARN logs, they
discovered that the application was requesting more memory than was
available on the cluster. By adjusting the memory settings and re-running
the application, they were able to complete the job successfully
(an illustrative configuration sketch follows).
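A hedged illustration of the kind of fix described above, assuming a Spark-on-YARN job configured from Java. The class name and memory values are examples, not recommendations, and the requested sizes must fit within the cluster's container limits (yarn.scheduler.maximum-allocation-mb).

import org.apache.spark.SparkConf;

public class TunedSparkConf {
    public static SparkConf build() {
        return new SparkConf()
                .setAppName("tuned-spark-on-yarn")            // illustrative name
                .set("spark.executor.memory", "4g")           // executor heap (illustrative)
                .set("spark.executor.memoryOverhead", "512m") // off-heap headroom per executor
                .set("spark.executor.instances", "4")         // number of executors requested
                .set("spark.yarn.am.memory", "1g");           // ApplicationMaster memory (client mode)
    }
}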
• Case Study 2: Node Manager Failure
• Another company was experiencing intermittent failures with their YARN
cluster. After investigating, they found that the Node Manager on one of
the nodes was crashing due to a memory leak. By increasing the memory
allocation for the Node Manager and monitoring it more closely, they
were able to prevent further failures.
Best Practices