SQL Flashcards by Mark Schmale

Spark SQL is pronounced

Spark Sequel

Spark SQL extends RDDs to a

“DataFrame” object

DataFrames contain _____ objects

Row

DataFrames have a ____, which leads to more efficient storage

schema

DataFrames can run _____ queries

SQL

Parquet is a

popular columnar data storage format

Spark SQL can read and write

Hive, JSON, Parquet

imports for Spark SQL in PySpark

from pyspark.sql import HiveContext, SQLContext, Row

To use Spark SQL, the first thing you create is a

HiveContext

create a HiveContext

hiveContext = HiveContext(sc)

load a JSON file (one object per line) into a DataFrame, inferring its schema

inputData = hiveContext.jsonFile(dataFile)

JSON is pronounced

Jay Sahn

register inputData as a temporary table so SQL queries can refer to it

inputData.registerTempTable("myStructuredStuff")

run a query and make a DataFrame

myResultDataFrame = hiveContext.sql("""SELECT foo FROM bar ORDER BY footer""")

alternative to HiveContext

SQLContext

Difference between HiveContext and SQLContext

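HiveContext is a superset of SQLContext that adds Hive compatibility (the HiveQL parser, Hive UDFs, reading Hive tables). It has heavier dependencies, but at least as of Spark 1.5 it was also a bit ahead of SQLContext in features.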