Introduction Flashcards

1
Q

What language is Spark written in?

A

Scala, a language that runs on the Java Virtual Machine (JVM).

2
Q

Cluster Manager

A

Acquires the cluster resources (worker nodes and the executors running on them) that an application needs to run its tasks.

3
Q

Partition

A

A chunk of a large dataset. Spark splits data into partitions so that each one can be sent to a different node and processed in parallel.
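To make the idea concrete, here is a minimal plain-Python sketch of partitioning; it is not the Spark API. The helper `split_into_partitions` is hypothetical. In PySpark you would normally let Spark do the split, for example with `sc.parallelize(data, 3)`, and the exact boundaries Spark picks may differ from this sketch.

```python
# Toy model of partitioning in plain Python (not the Spark API).
# split_into_partitions is a hypothetical helper for illustration only.


def split_into_partitions(data, num_partitions):
    """Split a list into num_partitions contiguous chunks of near-equal size."""
    n = len(data)
    return [
        data[i * n // num_partitions:(i + 1) * n // num_partitions]
        for i in range(num_partitions)
    ]


partitions = split_into_partitions(list(range(10)), 3)
print(partitions)  # [[0, 1, 2], [3, 4, 5], [6, 7, 8, 9]]

# Each chunk could be handed to a different node; here we just sum each one
# separately and then combine the partial results.
partial_sums = [sum(p) for p in partitions]
print(partial_sums, sum(partial_sums))  # [3, 12, 30] 45
```

Because each partial sum only needs its own chunk, the chunks can be processed independently and combined at the end.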

4
Q

Executors

A

Executors are processes launched on the worker nodes that run tasks in parallel with each other. Each executor runs in its own Java Virtual Machine (JVM).
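As a rough analogy only, the sketch below uses a pool of worker threads to stand in for executors: each worker picks up a task (one partition) and processes it concurrently with the others. Real executors are separate JVM processes on worker nodes, not Python threads; `process_partition` and `run_on_executors` are hypothetical names, and `ThreadPoolExecutor` is from Python's standard `concurrent.futures` module.

```python
# Analogy only: threads stand in for executors, which in real Spark are
# separate JVM processes on worker nodes. run_on_executors is hypothetical.
from concurrent.futures import ThreadPoolExecutor


def process_partition(partition):
    """The 'task' each worker runs: square every value in its partition."""
    return [x * x for x in partition]


def run_on_executors(partitions, num_executors):
    """Run process_partition on every partition using a pool of workers."""
    with ThreadPoolExecutor(max_workers=num_executors) as pool:
        # pool.map runs tasks concurrently but returns results in input order.
        return list(pool.map(process_partition, partitions))


results = run_on_executors([[1, 2], [3, 4], [5, 6]], num_executors=2)
print(results)  # [[1, 4], [9, 16], [25, 36]]
```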

5
Q

Worker Node

A

Any node that can run application code in the cluster.

6
Q

Task

A

A unit of work that will be sent to one executor

7
Q

Job

A

A parallel computation consisting of multiple tasks that is spawned in response to a Spark action (such as collect or save).
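In Spark, transformations such as `map` are lazy, and it is an action such as `collect` that triggers a job. The minimal sketch below imitates that behavior; `ToyRDD` is a hypothetical class, not Spark's RDD. It only records transformations, and `collect` starts one "job" that runs one task per partition.

```python
# Toy illustration of "a job is spawned in response to an action".
# ToyRDD is a hypothetical class, not Spark's RDD; it only records
# transformations until an action (collect) forces them to run.


class ToyRDD:
    jobs_run = 0  # counts how many "jobs" have been triggered so far

    def __init__(self, partitions, steps=()):
        self.partitions = partitions  # list of lists, one per partition
        self.steps = tuple(steps)     # pending transformations, in order

    def map(self, fn):
        # Transformation: nothing executes yet, we just remember fn.
        return ToyRDD(self.partitions, self.steps + (fn,))

    def _run_task(self, partition):
        # A task: apply every pending step to one partition's values.
        out = []
        for value in partition:
            for fn in self.steps:
                value = fn(value)
            out.append(value)
        return out

    def collect(self):
        # Action: start one job that runs one task per partition.
        ToyRDD.jobs_run += 1
        results = []
        for partition in self.partitions:
            results.extend(self._run_task(partition))
        return results


rdd = ToyRDD([[1, 2], [3, 4]]).map(lambda x: x + 1).map(lambda x: x * 10)
print(ToyRDD.jobs_run)  # 0, no action yet
print(rdd.collect())    # [20, 30, 40, 50]
print(ToyRDD.jobs_run)  # 1
```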

8
Q

Stage

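A

Each job is divided into smaller sets of tasks called stages that depend on each other (similar to the map and reduce stages in MapReduce).
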
9
Q

Cluster Managers

A

A program that controls how the cluster's resources are allocated to process data.

10
Q

Spark Standalone

A

A basic built-in cluster manager.

11
Q

Apache Mesos

A

A general cluster manager that can also run Hadoop MapReduce and service applications.

12
Q

Hadoop YARN

A

The resource manager introduced in Hadoop 2.

13
Q

Kubernetes

A

An open-source service for automating deployment, scaling, and management of containerized applications.

14
Q

What are the 4 main Spark components built on top of Spark Core?

A

Spark SQL, Spark Streaming, MLlib (machine learning), and GraphX

15
Q

What type of DataFrame does Spark use?

A

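Spark SQL uses a distributed DataFrame: a table of data organized into named columns whose rows are spread across partitions on the cluster.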
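A minimal plain-Python picture of a distributed DataFrame, assuming rows are dicts that share the same column names and are split across partitions. The helpers `filter_rows` and `select_column` and the sample data are hypothetical. In PySpark the same query would be written roughly as `df.filter(df.temp > 20).select("city")`.

```python
# Toy picture of a distributed DataFrame: rows with the same named columns,
# split across partitions. The helper functions are hypothetical, not Spark's API.

partitions = [
    [{"city": "Austin", "temp": 31}, {"city": "Boston", "temp": 18}],
    [{"city": "Denver", "temp": 24}],
    [{"city": "Miami", "temp": 33}, {"city": "Seattle", "temp": 15}],
]


def filter_rows(parts, predicate):
    """Like a DataFrame filter: applied to each partition independently."""
    return [[row for row in part if predicate(row)] for part in parts]


def select_column(parts, name):
    """Like selecting one column, then gathering the values in one place."""
    return [row[name] for part in parts for row in part]


hot = filter_rows(partitions, lambda row: row["temp"] > 20)
print(select_column(hot, "city"))  # ['Austin', 'Denver', 'Miami']
```

Note that the filter never needs to look at more than one partition at a time, which is what lets it run in parallel.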

16
Q

What is Spark Streaming?

A

Spark Streaming offers real-time data processing: it can take input from multiple sources, process it (including with machine learning), and then output the results to different data storage systems.
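Spark Streaming works by dividing the incoming stream into small batches and processing each batch in turn. The sketch below imitates that loop in plain Python under simple assumptions: a list stands in for the live source, another list stands in for the output store, and `micro_batches` and `run_stream` are hypothetical names.

```python
# Toy micro-batch loop in plain Python. A list stands in for the live input
# source and another list for the output store. All names are hypothetical.


def micro_batches(records, batch_size):
    """Yield consecutive batches of at most batch_size records."""
    for start in range(0, len(records), batch_size):
        yield records[start:start + batch_size]


def run_stream(records, batch_size, process, sink):
    """Process each batch as it 'arrives' and append the result to sink."""
    for batch in micro_batches(records, batch_size):
        sink.append(process(batch))
    return sink


clicks = ["home", "home", "cart", "pay", "home"]
output = run_stream(clicks, 2, lambda batch: batch.count("home"), [])
print(output)  # [2, 0, 1]
```

The `process` argument is where per-batch logic, such as scoring records with a trained model, would plug in.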

17
Q

What is Spark MLlib?

A

Spark MLlib provides a set of machine learning tools optimized for parallelized execution, which enables them to process big data.
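One reason machine learning parallelizes well is that many training statistics are sums, so each partition can compute partial sums and the results are combined afterwards. This is a sketch of that general idea, not MLlib's actual algorithm or API: it fits a line through the origin, y = w * x, by least squares, where w = sum(x * y) / sum(x * x). The helpers `partial_stats` and `fit_slope` are hypothetical.

```python
# Sketch of why ML parallelizes well: many training statistics are sums, so
# each partition can compute partial sums and the results are then combined.
# Fits y = w * x by least squares; not MLlib's actual algorithm or API.


def partial_stats(partition):
    """Per-partition sums of x*y and x*x for the points (x, y) it holds."""
    sxy = sum(x * y for x, y in partition)
    sxx = sum(x * x for x, _ in partition)
    return sxy, sxx


def fit_slope(partitions):
    """Combine partial sums; the least-squares slope is sum(xy) / sum(xx)."""
    stats = [partial_stats(p) for p in partitions]  # independent per partition
    total_xy = sum(s[0] for s in stats)
    total_xx = sum(s[1] for s in stats)
    return total_xy / total_xx


points = [(x, 2 * x) for x in range(1, 7)]
partitions = [points[:3], points[3:]]
print(fit_slope(partitions))  # 2.0
```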

18
Q

What is GraphX?

A

GraphX is Spark's library for graph processing: traversing networks, finding paths, and analyzing connections between entities. (Think relationship data such as flight routes or social networks.)
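GraphX's API is Scala-based, so the sketch below stays in plain Python and only illustrates the kind of traversal the card describes. It assumes airports are vertices and flights are directed edges, and uses breadth-first search to find a route with the fewest flights. The function `shortest_route` and the flight data are hypothetical.

```python
# Toy graph traversal in plain Python, not the GraphX API. Airports are
# vertices, flights are directed edges; breadth-first search finds a route
# with the fewest flights.
from collections import deque


def shortest_route(edges, start, goal):
    """Return a fewest-hops path from start to goal, or None if unreachable."""
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, []).append(dst)

    previous = {start: None}  # remembers how each airport was first reached
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk back through 'previous' to rebuild the route.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in graph.get(node, []):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None


flights = [("SFO", "DEN"), ("DEN", "JFK"), ("SFO", "ORD"), ("ORD", "BOS"), ("JFK", "BOS")]
print(shortest_route(flights, "SFO", "BOS"))  # ['SFO', 'ORD', 'BOS']
```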

19
Q

What is Databricks?

A

A commercial, for-profit company founded by two of the creators of Apache Spark. Its platform provides a complete collaborative data engineering and data science environment for developing Spark applications.