minTemperatures Flashcards
basic Spark import
from pyspark import SparkConf, SparkContext
create Spark conf (for my machine)
conf = SparkConf().setMaster("local").setAppName("MinTemperatures")
create Spark context
sc = SparkContext(conf = conf)
split delimited string into fields
fields = line.split(',')
function syntax
def myFunctionName(myInput): ... return (...myOutputTuple...)
convert string to float
temperatureC = float(fields[3])
read text file into RDD
lines = sc.textFile("file:///path/filename.csv")
map an RDD
parsedLines = lines.map(parseLine)   # parseLine is my map function
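A minimal parseLine sketch combining the split, function, and float cards above, assuming the weather CSV columns are stationID, date, entryType, temperature (names are illustrative):
def parseLine(line):
    fields = line.split(',')          # split the CSV row into fields
    stationID = fields[0]
    entryType = fields[2]             # e.g. "TMIN" or "TMAX"
    temperatureC = float(fields[3])   # convert the temperature string to a float
    return (stationID, entryType, temperatureC)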
filter an RDD
minTemps = parsedLines.filter(lambda x: "TMIN" in x[1])
map subset of fields
stationTemps = minTemps.map(lambda x: (x[0], x[2]))
reduce by key with min lambda
minTemps = stationTemps.reduceByKey(lambda x, y: min(x,y))
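A toy illustration of the pairwise reduction, with made-up local data:
pairs = sc.parallelize([("A", 3.0), ("A", 1.5), ("B", 2.0)])
pairs.reduceByKey(lambda x, y: min(x, y)).collect()   # [('A', 1.5), ('B', 2.0)] (order may vary)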
move RDD to collection
results = minTemps.collect()
loop through a collection "results"
for result in results:
…some action…
print the value result[0]
print(result[0])
print tab escape sequence
\t
format a float "result" as a string w/ 2 decimals
"{:.2f}".format(result)
add an int and a string value to a string with format
"{} {}".format(myInt, myStr)
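Putting the collect, loop, and format cards together, a sketch assuming each result is a (stationID, temperature) pair:
for result in results:
    print("{}\t{:.2f}".format(result[0], result[1]))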
split a space delimited line and take only third field (as a string)
line.split()[2]
sort a dictionary
sortedResult = sorted(result.items())
loop through a dictionary
for key, value in sortedResult:
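A toy example with made-up values:
result = {3: 10, 1: 25, 2: 7}
for key, value in sorted(result.items()):
    print("{} {}".format(key, value))   # iterates keys in ascending order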
count values in RDD
result = ratings.countByValue()
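A sketch combining the split card above with countByValue, assuming a whitespace-delimited ratings file where the third field is the rating (path is illustrative):
lines = sc.textFile("file:///path/u.data")
ratings = lines.map(lambda x: x.split()[2])
result = ratings.countByValue()   # dict-like map of rating -> count, returned to the driver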
clean a word so it shows as ASCII
cleanWord = word.encode("ascii", "ignore")
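Note that in Python 3, encode() returns bytes; decode back to get a str:
cleanWord = word.encode("ascii", "ignore").decode()   # e.g. "Café" -> "Caf"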
how do you import regular expressions?
import re
what library is used for natural language processing?
NLTK (Natural Language Toolkit)
what are regular expressions?
a pattern-matching mini-language for searching and transforming text
split "text" by word, allow for unicode
re.compile(r'\W+', re.UNICODE).split(text.lower())
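A quick toy run:
import re
re.compile(r'\W+', re.UNICODE).split("Hello, World! 123".lower())
# ['hello', 'world', '123']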
broadcast a dictionary in Spark
nameDict = sc.broadcast(loadMovieNames())
retrieve a value from a broadcast dictionary
nameDict.value[keyVal]
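A minimal usage sketch; loadMovieNames() and movieCounts are hypothetical (a dict loader and an RDD of (movieID, count) pairs):
nameDict = sc.broadcast(loadMovieNames())   # ship the dict to every executor once
namedCounts = movieCounts.map(lambda x: (nameDict.value[x[0]], x[1]))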