Introduction Flashcards
What language is Spark written in?
SCALA which is based on JAVA
Cluster Manager
Acquire’s resources, worker nodes, executors and task required to perform the work.
Partition
Large task are split into chunks to be sent to a different node for processing.
Executors
Executors are contained within each node and perform task to work in parallel with eachother. Each executor uses a seperate Java Virtual Machine
worker node
Any node that can run application code in the cluster.
Task
A unit of work that will be sent to one executor
Job
A Parallel computation consisting of multiple task that gets spawned in response to a Spark action.
Stage
Cluster Managers
Program that controls the how the cluster processes data.
Spark Standalone
A basic built-in cluster manager.
Apache Mesos
A general cluster manager that can also run Hadoop MapReduce and service applications.
Hadoop Yarn
The resource manager used in Hadoop 2
Kubernetes
An open-source service for automating deployment, scaling, and management of containerized applications.
What are the 4 spark core services?
Spark SQL, Spark Streaming, MLIB Machine Learning, GraphX
What type of Dataframe does Spark use?
Spark SQL uses a distributed DataFrame