SPARK Basics Flashcards

1
Q

Spark Context

A
  • Every Spark application requires a Spark Context
  • Spark Shell provides a preconfigured Spark Context called sc
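
Outside the shell, an application has to obtain a context itself. A minimal sketch, assuming local mode and a made-up application name `FlashcardDemo` (neither comes from the card):

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Assumption: run locally on all cores; "FlashcardDemo" is a hypothetical app name.
val conf = new SparkConf().setAppName("FlashcardDemo").setMaster("local[*]")

// getOrCreate reuses an already running context (such as the shell's sc)
// instead of trying to start a second one in the same JVM.
val sc = SparkContext.getOrCreate(conf)

println(sc.appName)
println(sc.master)
// Call sc.stop() when the application is finished (not inside the Spark Shell).
```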
2
Q

RDDs
(Resilient Distributed Datasets)

A
  • RDD (Resilient Distributed Dataset)
    • Resilient - if data in memory is lost, it can be recreated
    • Distributed - processed across the cluster
    • Dataset - initial data can come from a file or be created programmatically
  • RDDs are the fundamental unit of data in Spark
  • Most Spark programming consists of performing operations on RDDs
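
The three words in the name can be illustrated with a small sketch. It assumes local mode; the application name and the partition count of 4 are arbitrary choices for the example:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Assumption: local mode; "RddBasicsDemo" is a made-up application name.
val sc = SparkContext.getOrCreate(
  new SparkConf().setAppName("RddBasicsDemo").setMaster("local[*]"))

// Dataset: created programmatically from an in-memory range.
// Distributed: split into 4 partitions that can be processed in parallel.
val numbers = sc.parallelize(1 to 10, numSlices = 4)
println(numbers.getNumPartitions)

// Resilient: Spark records how `doubled` was derived from `numbers`,
// so lost partitions can be recomputed from that lineage.
val doubled = numbers.map(_ * 2)
println(doubled.toDebugString)
```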
3
Q

Creating an RDD

A
  • Three ways to create an RDD
    • From a file or set of files
    • From data in memory
    • From another RDD
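
A sketch of all three routes side by side. The temporary input file, its contents, and the application name are invented here so the example is self-contained:

```scala
import java.nio.file.Files
import org.apache.spark.{SparkConf, SparkContext}

// Assumption: local mode; "CreateRddDemo" is a made-up application name.
val sc = SparkContext.getOrCreate(
  new SparkConf().setAppName("CreateRddDemo").setMaster("local[*]"))

// Hypothetical input: a small temp file with three lines.
val path = Files.createTempFile("lines", ".txt")
Files.write(path, "one\ntwo\nthree\n".getBytes("UTF-8"))

// 1. From a file (the file:// URI makes the local filesystem explicit)
val fromFile = sc.textFile(path.toUri.toString)

// 2. From data in memory
val fromMemory = sc.parallelize(Seq(1, 2, 3, 4))

// 3. From another RDD, by applying a transformation
val fromRdd = fromMemory.map(_ * 10)

println(fromFile.count())                  // 3
println(fromRdd.collect().mkString(", "))  // 10, 20, 30, 40
```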
4
Q

A file-based RDD

A

> val mydata = sc.textFile("purplecow.txt")
> mydata.count()

5
Q

RDD Operations

A
  • Transformations: define a new RDD based on the current one(s)
  • Actions: return a value to the driver program
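
The difference shows up in the result types. A small sketch, assuming local mode and made-up sample words:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Assumption: local mode; "OperationsDemo" is a made-up application name.
val sc = SparkContext.getOrCreate(
  new SparkConf().setAppName("OperationsDemo").setMaster("local[*]"))

val words = sc.parallelize(Seq("purple", "cow", "never", "saw"))

// Transformation: the result is another RDD. Spark only records the step here;
// the work is deferred until an action needs the data.
val lengths = words.map(_.length)

// Action: runs the job and hands a plain Scala value back to the driver.
val total: Int = lengths.reduce(_ + _)
println(total)  // 17
```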
6
Q

RDD Operations: ACTIONS

A
  • Some common actions:
    • count() - returns the number of elements
    • take(n) - returns an array of the first n elements
    • collect() - returns an array of all elements
    • saveAsTextFile(file) - saves the elements to text file(s)

Example:

for (line <- mydata.take(2))
  println(line)
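
A sketch that exercises all four actions on a tiny in-memory RDD. The sample data, the application name, and the temporary output directory are assumptions made for the example:

```scala
import java.nio.file.Files
import org.apache.spark.{SparkConf, SparkContext}

// Assumption: local mode; "ActionsDemo" is a made-up application name.
val sc = SparkContext.getOrCreate(
  new SparkConf().setAppName("ActionsDemo").setMaster("local[*]"))

val letters = sc.parallelize(Seq("a", "b", "c", "d"), numSlices = 2)

val n = letters.count()        // number of elements: 4
val firstTwo = letters.take(2) // Array(a, b)
val all = letters.collect()    // Array(a, b, c, d); only safe for small RDDs

// saveAsTextFile expects a directory that does not exist yet and
// writes one part file per partition into it.
val outDir = Files.createTempDirectory("rdd-out").resolve("result")
letters.saveAsTextFile(outDir.toUri.toString)

println(s"$n elements, first two: ${firstTwo.mkString(", ")}")
```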

7
Q

RDD Operations: TRANSFORMATION

A
  • Transformations create a new RDD from an existing one
  • RDDs are immutable
    • Data in an RDD is never changed
    • Transform in sequence to modify the data as needed
  • Some common transformations
    • map(function) - creates a new RDD by performing a function on each record in the base RDD
    • filter(function) - creates a new RDD by including or excluding each record in the base RDD according to a boolean function
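
A minimal sketch of map and filter, assuming local mode. The two sample lines stand in for file contents and are not taken from the card:

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Assumption: local mode; "TransformDemo" is a made-up application name.
val sc = SparkContext.getOrCreate(
  new SparkConf().setAppName("TransformDemo").setMaster("local[*]"))

// Hypothetical sample data.
val lines = sc.parallelize(Seq("I never saw a purple cow", "I never hope to see one"))

// map: apply a function to every record, producing a new RDD.
val upper = lines.map(_.toUpperCase)

// filter: keep only the records for which the boolean function returns true.
val withCow = lines.filter(_.contains("cow"))

// Transformations can be chained; the base RDD `lines` is never modified.
val shoutedCow = lines.filter(_.contains("cow")).map(_.toUpperCase)

upper.collect().foreach(println)
println(withCow.count())   // 1
println(lines.count())     // still 2
```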