Book Concepts Flashcards
What are the 4 key characteristics of Spark?
- Speed
- Ease of Use
- Modularity
- Extensibility
Why is Spark faster than its predecessors?
Because it performs its intermediate computations in memory rather than writing them to disk.
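A minimal PySpark sketch of this idea (the dataset is made up): cache() keeps a computed result in executor memory, so later actions reuse it instead of recomputing.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("cache-demo").getOrCreate()

# A hypothetical dataset, just for illustration.
df = spark.range(1_000_000)

# cache() asks Spark to keep the computed result in memory,
# so later actions reuse it instead of recomputing from scratch.
df.cache()

df.count()   # first action: computes and materializes the cache
df.count()   # second action: served from memory

spark.stop()
```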
In-memory calculation is related to which Spark characteristic?
Speed
DAGs are related to which Spark characteristic?
Speed
Reading data from different sources is related to which Spark characteristic?
Extensibility
Spark's core modules are related to which Spark characteristic?
Modularity
RDDs are related to which Spark characteristic?
Ease of use
What is an RDD?
A Resilient Distributed Dataset: an immutable, partitioned collection of data distributed across the cluster.
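A minimal PySpark sketch, assuming a local session, showing an RDD's partitioning and immutability (transformations return a new RDD rather than modifying the original):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("rdd-demo").getOrCreate()
sc = spark.sparkContext

# Distribute a local collection as an RDD split into 4 partitions.
rdd = sc.parallelize(range(10), numSlices=4)
print(rdd.getNumPartitions())   # -> 4

# RDDs are immutable: map() returns a new RDD; `rdd` is untouched.
doubled = rdd.map(lambda x: x * 2)
print(doubled.collect())        # -> [0, 2, 4, ..., 18]

spark.stop()
```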
Which languages does Spark support?
Scala, Java, Python, SQL, and R
Which modules does Spark have?
- Spark SQL
- Spark Streaming
- MLlib
- GraphX
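As a small illustration of one of these modules, a Spark SQL sketch (the table name and rows are invented for the example):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sql-demo").getOrCreate()

# A tiny in-memory DataFrame; names and ages are made up.
df = spark.createDataFrame([("Alice", 34), ("Bob", 41)], ["name", "age"])

# The Spark SQL module lets you query DataFrames with plain SQL.
df.createOrReplaceTempView("people")
spark.sql("SELECT name FROM people WHERE age > 35").show()

spark.stop()
```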
(T or F) Spark can't deal with late data.
False
(T or F) Spark is fault-tolerant.
True
What are the 2 main components of Spark?
- Driver
- Executor
Explain Spark Driver
The component responsible for orchestrating parallel operations on the Spark cluster.
Explain Spark Executor
The component responsible for executing the tasks assigned by the driver.
Which component is responsible for requesting more CPU and memory from the cluster manager?
Driver
Which component creates the DAG?
Driver
Which component communicates with the cluster manager?
Driver
Explain SparkSession
The conduit to all Spark operations and data; the single entry point for a Spark application.
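A minimal sketch of creating one with the builder pattern; the app name and the local master URL are placeholder choices, not requirements:

```python
from pyspark.sql import SparkSession

# The SparkSession is the single entry point to Spark; the builder
# pattern creates a new session or reuses an existing one.
spark = (SparkSession.builder
         .appName("my-app")        # arbitrary application name
         .master("local[*]")       # local mode, all cores (for testing)
         .getOrCreate())

# All operations flow through the session object.
spark.range(5).show()
spark.stop()
```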
Explain Cluster Manager
The component responsible for managing and allocating resources for the cluster nodes.
Explain Spark Architecture
The driver creates a SparkSession, which serves as the interface between Spark and the data. The SparkContext communicates with the cluster manager, which is responsible for allocating resources and managing task execution. Tasks are sent to the workers (executors), which do the actual work.
Which component creates the SparkContext?
Driver
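A short sketch showing that the SparkContext lives inside the SparkSession the driver creates (the app name is arbitrary):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("context-demo").getOrCreate()

# The driver's SparkSession wraps the SparkContext, which is what
# actually talks to the cluster manager.
sc = spark.sparkContext
print(sc.appName)              # -> "context-demo"
print(sc.defaultParallelism)   # parallelism hint from the cluster

spark.stop()
```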
What are the four cluster managers supported by Spark?
- Hadoop YARN
- Mesos
- Kubernetes
- Standalone
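A sketch of how the cluster manager is chosen via the master URL; the hosts and ports below are placeholders, not real endpoints:

```python
from pyspark.sql import SparkSession

# The master URL selects the cluster manager:
#   Standalone:  .master("spark://host:7077")
#   Hadoop YARN: .master("yarn")
#   Mesos:       .master("mesos://host:5050")
#   Kubernetes:  .master("k8s://https://host:6443")
spark = (SparkSession.builder
         .appName("master-demo")
         .master("local[*]")   # local fallback, handy for testing
         .getOrCreate())

spark.stop()
```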
Explain Job
A parallel computation made up of multiple tasks, created in response to a Spark action. Each job is transformed into a DAG, which is Spark's execution plan.
Explain Stage
A smaller part of a job. Jobs are divided into stages based on which operations can run serially or in parallel, typically at shuffle boundaries.
Explain task
A single unit of work that is sent to a Spark executor; each task works on a single partition of data.
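To tie job, stage, and task together, a minimal PySpark sketch (the column name is made up): the action triggers a job; the shuffle introduced by groupBy splits it into stages; each stage runs one task per partition.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("job-demo").getOrCreate()

df = spark.range(100).withColumn("key", F.col("id") % 3)

# Transformations are lazy: nothing runs yet.
result = df.groupBy("key").count()   # groupBy forces a shuffle

# The action below triggers a job; the driver turns its DAG into
# stages (split at the shuffle) and one task per partition.
result.show()

spark.stop()
```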