Introduction to Apache Spark and Scala Programming Flashcards
What is Spark?
Spark is an execution engine that can do fast computations on big datasets.
True or false: Spark offers a big data storage solution.
False.
What does Spark focus on?
Fast computation.
What is Spark’s computational method?
Spark replaces Hadoop’s implementation of MapReduce with its own engine, which keeps intermediate data in memory where possible.
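To make the map and reduce phases concrete, here is a minimal sketch of a word count written with plain Scala collections rather than the Spark API. The object name WordCountSketch and its helper functions are hypothetical and chosen only for illustration. In Spark, the same shape is usually expressed on an RDD with flatMap, map, and reduceByKey.

```scala
object WordCountSketch {
  // Map phase: split each line into words and emit a (word, 1) pair per word.
  def mapPhase(lines: Seq[String]): Seq[(String, Int)] =
    lines
      .flatMap(_.toLowerCase.split("\\s+").toSeq)
      .filter(_.nonEmpty)
      .map(word => (word, 1))

  // Reduce phase: group the pairs by word and sum the counts for each key.
  def reducePhase(pairs: Seq[(String, Int)]): Map[String, Int] =
    pairs.groupBy(_._1).map { case (word, ones) => (word, ones.map(_._2).sum) }

  // Full job: the map phase followed by the reduce phase.
  def wordCount(lines: Seq[String]): Map[String, Int] =
    reducePhase(mapPhase(lines))

  def main(args: Array[String]): Unit = {
    val lines = Seq("spark is fast", "hadoop is reliable")
    println(wordCount(lines))
  }
}
```

This runs on a single machine. It shows only the shape of the computation, not how Spark distributes it across a cluster.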
What storage does Hadoop have and what does Spark have?
Hadoop has the Hadoop Distributed File System (HDFS), while Spark has no storage layer of its own and reads from external systems such as HDFS.
What MapReduce does Hadoop have and what does Spark have?
Hadoop has a built-in MapReduce engine, while Spark has its own built-in engine optimized for in-memory processing.
What speed does Hadoop have and what does Spark have?
Hadoop is considered fast, but Spark is often cited as 10 to 100 times faster, with the largest gains on in-memory workloads.
What resource management does Hadoop have and what does Spark have?
Hadoop has YARN, while Spark ships with its own standalone cluster manager and can also run on YARN.
How is fault tolerance achieved?
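Through Resilient Distributed Datasets (RDDs), which record the lineage of transformations that produced them so that lost partitions can be recomputed.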
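The idea behind RDD fault tolerance is that a dataset remembers how it was derived, so a lost partition can be rebuilt from its parent instead of being restored from a replica. The sketch below is a toy model in plain Scala, not Spark's actual RDD implementation. ToyDataset, Source, and Mapped are hypothetical names used only for illustration.

```scala
object LineageSketch {
  // A toy dataset that can compute any one of its partitions on demand.
  sealed trait ToyDataset {
    def computePartition(index: Int): Seq[Int]
  }

  // Source data, already split into partitions.
  final case class Source(partitions: Vector[Seq[Int]]) extends ToyDataset {
    def computePartition(index: Int): Seq[Int] = partitions(index)
  }

  // A derived dataset: it stores its parent and the function applied to it
  // (its lineage) rather than the resulting data.
  final case class Mapped(parent: ToyDataset, f: Int => Int) extends ToyDataset {
    def computePartition(index: Int): Seq[Int] =
      parent.computePartition(index).map(f)
  }

  def main(args: Array[String]): Unit = {
    val source = Source(Vector(Seq(1, 2), Seq(3, 4)))
    val doubled = Mapped(source, _ * 2)
    // Pretend the computed copy of partition 1 was lost on a failed worker:
    // only that partition is recomputed, by replaying the lineage.
    val recovered = doubled.computePartition(1)
    println(recovered) // List(6, 8)
  }
}
```

Only the lost partition is recomputed. The other partitions are untouched, which is what makes recovery cheap in this model.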
What is Scala?
Scala, short for Scalable Language, is an object-oriented and functional programming language.
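A short sketch of how the two styles combine in one program: the trait and case classes are the object-oriented part, and the pure function over an immutable list is the functional part. All names here (ScalaStyles, Shape, Rectangle, Square, totalArea) are hypothetical and chosen only for illustration.

```scala
object ScalaStyles {
  // Object-oriented side: a trait with an abstract member and classes that implement it.
  trait Shape { def area: Double }

  final case class Rectangle(width: Double, height: Double) extends Shape {
    def area: Double = width * height
  }

  final case class Square(side: Double) extends Shape {
    def area: Double = side * side
  }

  // Functional side: a pure function that uses a higher-order function (map)
  // on an immutable list and mutates nothing.
  def totalArea(shapes: List[Shape]): Double = shapes.map(_.area).sum

  def main(args: Array[String]): Unit = {
    val shapes = List(Rectangle(2.0, 3.0), Square(2.0))
    println(totalArea(shapes))
  }
}
```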
Why does Spark use Scala?
Scala is the preferred language for writing Spark programs because Spark is itself written in Scala and Scala runs on the JVM, which makes interaction with Hadoop easier.
What message broker model does Kafka use?