Apache Spark Flashcards

Learn about Apache Spark

1
Q

What is Apache Spark?

A

An open-source, distributed computing system for big data processing

2
Q

How does Spark achieve fast data processing?

A

By keeping intermediate results in memory where possible, instead of writing them to disk between processing steps

3
Q

How far can Spark scale?

A

It can scale from a single machine to thousands of nodes

4
Q

What is Spark Core responsible for?

A

Handles scheduling, memory management, fault tolerance, and task dispatching

5
Q

What functionality does Spark SQL provide?

A

Lets you query structured data with SQL and integrates with data sources such as Hive and Parquet

6
Q

What is the purpose of Spark Streaming?

A

Enables near-real-time processing of continuously arriving data by handling it in small batches (micro-batches)
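
One way to picture stream processing in Spark Streaming is as a sequence of small batches taken from the incoming data. The sketch below is a toy model in plain Python, not Spark's API: `micro_batches` is a hypothetical helper, and it cuts batches by element count, whereas Spark Streaming cuts them by time interval.

```python
# Toy model of micro-batching. Plain Python, not Spark's API:
# micro_batches is a hypothetical helper used only for illustration.
from itertools import islice


def micro_batches(events, batch_size):
    """Group an incoming iterable of events into small fixed-size batches."""
    it = iter(events)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return  # the stream is exhausted
        yield batch


# Running word count, updated once per batch rather than once per event.
counts = {}
stream = ["spark", "flink", "spark", "kafka", "spark"]
for batch in micro_batches(stream, batch_size=2):
    for word in batch:
        counts[word] = counts.get(word, 0) + 1
```

After the loop, `counts` holds the totals accumulated across all batches; the last batch may be smaller than `batch_size`.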

7
Q

What is MLlib?

A

Spark's machine learning library: scalable algorithms and tools for training models on large datasets

8
Q

What does GraphX do?

A

Provides graph-parallel processing for analyzing relationships in data, such as social networks or recommendation graphs

9
Q

Which programming languages does Spark support?

A

Java, Scala, Python, and R

10
Q

What is Spark’s built-in fault tolerance mechanism?

A

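Resilient Distributed Datasets (RDDs) track the lineage of transformations used to build them, so a lost partition can be recomputed on another node rather than restored from a replica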
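
RDD fault tolerance rests on lineage: each RDD remembers which dataset it was derived from and by which transformation, so a lost partition can be rebuilt on demand. The sketch below is a toy model in plain Python, not Spark's API: `ToyRDD` and its methods are hypothetical names, and only a `map`-style transformation is modeled.

```python
# Toy model of lineage-based recovery. Plain Python, not Spark's API:
# ToyRDD, compute and lose are hypothetical names used only for illustration.
class ToyRDD:
    def __init__(self, partitions=None, parent=None, fn=None):
        self.parent = parent  # lineage: the dataset this one was derived from
        self.fn = fn          # lineage: how each element was derived
        # Partitions currently held in memory, keyed by partition index.
        self.cache = dict(enumerate(partitions)) if partitions is not None else {}

    def map(self, fn):
        # Record the transformation instead of computing it right away.
        return ToyRDD(parent=self, fn=fn)

    def compute(self, i):
        # Return partition i, rebuilding it from the parent if it is missing.
        if i not in self.cache:
            if self.parent is None:
                raise RuntimeError("source partition lost; reload it from storage")
            self.cache[i] = [self.fn(x) for x in self.parent.compute(i)]
        return self.cache[i]

    def lose(self, i):
        # Simulate a node failure that drops partition i from memory.
        self.cache.pop(i, None)


source = ToyRDD(partitions=[[1, 2], [3, 4]])
doubled = source.map(lambda x: x * 2)

before = doubled.compute(1)  # first computation of partition 1
doubled.lose(1)              # the partition disappears with its node
after = doubled.compute(1)   # rebuilt from lineage; no replica was kept
```

Here `before` and `after` hold the same values even though the partition was dropped in between, because the recorded transformation is simply replayed against the parent's data.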

11
Q

True or False: Apache Spark processes data slower than traditional MapReduce frameworks.

A

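False. Spark is generally faster, mainly because it keeps intermediate results in memory instead of writing them to disk between steps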

12
Q

Fill in the blank: Apache Spark is designed to handle _______ data processing and analytics.

A

large-scale

13
Q

What makes Spark suitable for big data processing?

A

Its speed, scalability, ease of use, and fault tolerance
