Batch - YARN and MapReduce Flashcards
YARN (Yet Another Resource Negotiator)
Resource management system designed to handle distributed computing
YARN APIs
Used to request and work with cluster resources; the calls are made by the framework itself (e.g., MapReduce), not by user code
Fundamental idea of YARN
Split the functions of resource management and job scheduling/monitoring into separate daemons.
What makes the YARN Scheduler a “pure scheduler”?
Doesn’t monitor application/job status
Doesn’t restart application/job on failure
Application
Single job or DAG of jobs
Applications Manager job
Accept job submissions, negotiate the first container for executing the Application Master Process (AMP), and provide the service for restarting the AMP if it fails
FIFO scheduler
No configuration necessary, but a poor fit for shared clusters: a large job blocks every job queued behind it
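A toy sketch of why FIFO hurts on a shared cluster. It assumes each job occupies the whole cluster until it finishes, which is a simplification; `fifo_wait_times` and the job names are hypothetical, not part of any YARN API.

```python
def fifo_wait_times(jobs):
    """jobs: list of (name, runtime_minutes) in submission order.

    Returns {name: minutes spent waiting before the job starts},
    assuming jobs run strictly one after another.
    """
    waits = {}
    clock = 0
    for name, runtime in jobs:
        waits[name] = clock   # job starts only when all earlier jobs are done
        clock += runtime
    return waits


# A 2-minute query submitted after a 2-hour job waits the full 2 hours.
print(fifo_wait_times([("large_etl", 120), ("small_query", 2)]))
# {'large_etl': 0, 'small_query': 120}
```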
Capacity Scheduler
Fixed share of cluster capacity reserved for each queue, so jobs in one queue can start without waiting behind jobs in another
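A minimal sketch of fixed capacity shares. In Hadoop's Capacity Scheduler the shares are configured per queue; the function `queue_capacities`, the queue names, and the percentages below are hypothetical, and elasticity (a queue borrowing idle capacity) is not modelled.

```python
def queue_capacities(total_containers, shares):
    """shares: {queue_name: percent_of_cluster}; percentages must sum to 100.

    Returns the number of containers reserved for each queue.
    """
    if sum(shares.values()) != 100:
        raise ValueError("queue shares must sum to 100 percent")
    return {queue: total_containers * pct // 100 for queue, pct in shares.items()}


# The split stays the same no matter how busy either queue is.
print(queue_capacities(100, {"prod": 70, "dev": 30}))
# {'prod': 70, 'dev': 30}
```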
Fair Scheduler
Balances available resources between running jobs
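A rough sketch of the balancing idea, assuming the simplest case: every running job gets an equal share, and the shares are recomputed whenever a job starts or finishes. `fair_shares` is a hypothetical helper, not the Fair Scheduler's actual algorithm, which also supports queues and weights.

```python
def fair_shares(total_containers, running_jobs):
    """Split containers as evenly as possible between the running jobs.

    Any remainder is handed out one container at a time, in list order.
    """
    if not running_jobs:
        return {}
    base, extra = divmod(total_containers, len(running_jobs))
    return {
        job: base + (1 if i < extra else 0)
        for i, job in enumerate(running_jobs)
    }


print(fair_shares(100, ["job_a"]))            # {'job_a': 100}
print(fair_shares(100, ["job_a", "job_b"]))   # {'job_a': 50, 'job_b': 50}
```

When `job_b` arrives, `job_a` drops from the whole cluster to half of it; when `job_b` finishes, `job_a` gets everything back.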
Resource Manager (RM) (def) (2)
Ultimate authority for allocating containers.
1. Accept job submissions from the client
2. Set up the ApplicationMaster (with an initial container)
Node Manager (NM)
A per-machine agent that monitors the resource usage of containers and reports it to the RM
ApplicationMaster (2)
Manage the job lifecycle and request containers from the RM
Upon request from a client, the RM finds an NM to launch ______ in a ___________.
Application Master Process; container
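The submission step can be sketched as a toy model. The classes below are hypothetical stand-ins that only mirror the roles on these cards (they are not the real YARN client API), and "first NM with a free container" is an assumed placement rule chosen for simplicity.

```python
class NodeManager:
    """Toy per-machine agent that tracks free containers."""

    def __init__(self, name, free_containers):
        self.name = name
        self.free_containers = free_containers
        self.running = []

    def launch(self, process_name):
        if self.free_containers == 0:
            raise RuntimeError(f"{self.name} has no free containers")
        self.free_containers -= 1          # one container is now occupied
        self.running.append(process_name)
        return f"{process_name}@{self.name}"


class ResourceManager:
    """Toy RM: accepts a submission and starts the AMP in a container."""

    def __init__(self, node_managers):
        self.node_managers = node_managers

    def submit(self, app_name):
        for nm in self.node_managers:
            if nm.free_containers > 0:
                return nm.launch(f"AMP[{app_name}]")
        raise RuntimeError("no NM has a free container")


rm = ResourceManager([NodeManager("node1", 0), NodeManager("node2", 2)])
print(rm.submit("wordcount"))  # AMP[wordcount]@node2
```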
Container
Slice of a node's computing resources (e.g., memory and CPU) in which a task runs; job status is reported from it to the AMP
Data Locality (YARN)
Ensuring tasks are run as close to the data as possible
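A minimal sketch of "as close as possible", assuming the usual Hadoop preference order: a node that holds the data (node-local), then a node on the same rack (rack-local), then any node (off-rack). `pick_node` and the node and rack names are hypothetical.

```python
def pick_node(block_nodes, free_nodes, rack_of):
    """Choose where to run a task that reads one data block.

    block_nodes: set of nodes holding a replica of the block
    free_nodes:  list of nodes that currently have a free container
    rack_of:     dict mapping node name to rack name
    Returns (node, locality_level), or (None, None) if nothing is free.
    """
    # 1. Prefer a node that already stores the data.
    for node in free_nodes:
        if node in block_nodes:
            return node, "node-local"
    # 2. Otherwise prefer a node on the same rack as a replica.
    block_racks = {rack_of[n] for n in block_nodes}
    for node in free_nodes:
        if rack_of[node] in block_racks:
            return node, "rack-local"
    # 3. Fall back to any free node.
    if free_nodes:
        return free_nodes[0], "off-rack"
    return None, None


rack_of = {"n1": "rack1", "n2": "rack1", "n3": "rack2"}
print(pick_node({"n1"}, ["n3", "n2"], rack_of))  # ('n2', 'rack-local')
```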