Apache Spark Flashcards

Question 1

Q

What is Apache Spark?

Answer

A

An open-source, distributed computing system for big data processing

Question 2

Q

How does Spark achieve fast data processing?

Answer

A

By keeping intermediate results in memory where possible, instead of writing them to disk between processing steps
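A toy illustration of why reusing an in-memory intermediate result saves work. This is plain Python, not Spark's API; `expensive_parse` and the sample data are hypothetical stand-ins. In Spark, the comparable step is asking it to keep a dataset in memory with `cache()` or `persist()`.

```python
# Toy illustration only: plain Python, not Spark's API.
calls = {"count": 0}

def expensive_parse(line):
    """Hypothetical stand-in for a costly intermediate step."""
    calls["count"] += 1
    return int(line) * 2

lines = ["1", "2", "3"]

# Without reuse: each downstream job repeats the intermediate step.
total = sum(expensive_parse(x) for x in lines)
largest = max(expensive_parse(x) for x in lines)
calls_without_cache = calls["count"]

# With reuse: the intermediate result stays in memory and both jobs share it.
calls["count"] = 0
parsed = [expensive_parse(x) for x in lines]
total_cached = sum(parsed)
largest_cached = max(parsed)
calls_with_cache = calls["count"]
```

Both approaches give the same totals, but the second runs the costly step once per record instead of once per record per job.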

Question 3

Q

How well does Spark scale?

Answer

A

It can scale from a single machine to thousands of nodes

Question 4

Q

What is Spark Core responsible for?

Answer

A

Handles scheduling, memory management, fault tolerance, and task dispatching

Question 5

Q

What functionality does Spark SQL provide?

Answer

A

Interacts with structured data through SQL queries and integrates with data sources like Hive and Parquet

Question 6

Q

What is the purpose of Spark Streaming?

Answer

A

Enables near-real-time processing of continuously incoming data by dividing the stream into small batches
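Spark Streaming's DStream model treats a live stream as a sequence of small batches. The sketch below imitates that idea in plain Python and is not Spark's API. As a simplification it groups records by count, whereas Spark groups them by time interval. The helper `micro_batches` and the sample events are hypothetical.

```python
from itertools import islice

def micro_batches(stream, batch_size):
    """Yield successive lists of up to batch_size items from an iterable."""
    iterator = iter(stream)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch

# Hypothetical event stream: readings arriving one at a time.
events = [3, 5, 2, 8, 1, 7, 4]

# Each small batch is processed as an ordinary batch job (here, a sum).
batch_sums = [sum(batch) for batch in micro_batches(events, batch_size=3)]
```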

Question 7

Q

What is MLlib?

Answer

A

Spark's machine learning library, providing scalable algorithms and utilities for tasks such as classification, regression, and clustering

Question 8

Q

What does GraphX do?

Answer

A

Provides graph processing for analyzing relationships in data, such as social networks or recommendation systems

Question 9

Q

Which programming languages does Spark support?

Answer

A

Java, Scala, Python, and R

Question 10

Q

What is Spark’s built-in fault tolerance mechanism?

Answer

A

Resilient Distributed Datasets (RDDs) record the lineage of transformations that produced them, so lost partitions can be recomputed instead of relying on data replicas
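RDD fault tolerance rests on lineage: a dataset remembers the transformations that produced it, so a lost partition can be rebuilt from the original input. The toy class below sketches that idea in plain Python. It is not Spark's implementation, and `ToyDataset` and its methods are hypothetical names.

```python
class ToyDataset:
    """Toy stand-in for an RDD: it remembers how to rebuild itself."""

    def __init__(self, source, transforms=()):
        self.source = source                 # durable input, assumed re-readable
        self.transforms = tuple(transforms)  # lineage: ordered functions to apply
        self._partition = None               # in-memory result, which may be lost

    def map(self, fn):
        # A transformation only extends the lineage; nothing is computed yet.
        return ToyDataset(self.source, self.transforms + (fn,))

    def compute(self):
        # Rebuild from the source by replaying the lineage if the data is missing.
        if self._partition is None:
            data = list(self.source)
            for fn in self.transforms:
                data = [fn(x) for x in data]
            self._partition = data
        return self._partition

    def lose_partition(self):
        # Simulate a node failure that drops the in-memory data.
        self._partition = None

ds = ToyDataset([1, 2, 3]).map(lambda x: x + 1).map(lambda x: x * 10)
first = ds.compute()
ds.lose_partition()
recovered = ds.compute()  # recomputed from lineage, not read from a replica
```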

Question 11

Q

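True or False: Apache Spark processes data slower than traditional MapReduce frameworks.

Answer

A

False. Spark is generally faster than MapReduce for many workloads because it keeps intermediate data in memory.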

Question 12

Q

Fill in the blank: Apache Spark is designed to handle _______ data processing and analytics.

Answer

A

large-scale

Question 13

Q

What makes Spark suitable for big data processing?

Answer

A

Its speed, scalability, ease of use, and fault tolerance
