03 DataFrame Essentials Flashcards

1
Q

Define Spark Session

A

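Entry point to Spark programming (introduced in Spark 2.0)

Databricks starts a SparkSession automatically; in notebooks it is available as the variable spark

Offers a gateway to Spark DataFrames and the other Spark APIs (SQL, catalog, configuration)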
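
In PySpark a session is normally obtained with `SparkSession.builder.appName(...).getOrCreate()`. If a session is already running, that call returns it instead of starting a new one, which is why the same code also works on Databricks, where the platform has already started the session. The sketch below is a plain-Python model of that get-or-create behaviour, so it runs without Spark. `Session` and `get_or_create` are hypothetical names, not Spark's API.

```python
from typing import Optional


class Session:
    """Hypothetical stand-in for a SparkSession; it only remembers its app name."""

    def __init__(self, app_name: str) -> None:
        self.app_name = app_name


# Module-level slot playing the role of Spark's active/default session.
_active_session: Optional[Session] = None


def get_or_create(app_name: str) -> Session:
    """Return the active session if one exists; otherwise create and register one."""
    global _active_session
    if _active_session is None:
        _active_session = Session(app_name)
    return _active_session


# On Databricks the platform creates the session before any notebook code runs.
spark = get_or_create("databricks-notebook")

# A later builder-style call reuses that session instead of starting a second one.
same = get_or_create("my-app")
print(same is spark, same.app_name)  # True databricks-notebook
```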

2
Q

Modes allowed while reading (loading) a file

A

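FAILFAST - the read fails with an exception as soon as a malformed record is encountered

DROPMALFORMED - malformed records are dropped and the remaining records are loaded

PERMISSIVE (default) - malformed fields are set to null and the rest of the record is kept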
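
In PySpark the mode is passed as a reader option, for example `spark.read.option("mode", "DROPMALFORMED").schema(schema).csv(path)`, where `schema` and `path` are placeholders. The sketch below is a plain-Python model of the three behaviours, not Spark's parser. It assumes a hypothetical two-column schema (name: string, age: int), and the `read_csv` helper is illustrative only. Unlike Spark, this model rejects unknown mode names.

```python
from typing import List, Optional, Tuple

Row = Tuple[str, Optional[int]]
VALID_MODES = {"PERMISSIVE", "DROPMALFORMED", "FAILFAST"}


def read_csv(lines: List[str], mode: str = "PERMISSIVE") -> List[Row]:
    """Parse 'name,age' lines against the schema (name: string, age: int)."""
    mode = mode.upper()  # accept 'failfast' as well as 'FAILFAST'
    if mode not in VALID_MODES:
        raise ValueError(f"Unknown mode: {mode}")  # this model rejects unknown modes
    rows: List[Row] = []
    for line in lines:
        name, raw_age = line.split(",", 1)
        try:
            age: Optional[int] = int(raw_age)
        except ValueError:
            if mode == "FAILFAST":
                # Abort the whole read on the first malformed record.
                raise ValueError(f"Malformed record: {line!r}") from None
            if mode == "DROPMALFORMED":
                continue  # skip this record, keep reading the rest
            age = None  # PERMISSIVE: the unparsable field becomes null
        rows.append((name, age))
    return rows


data = ["alice,34", "bob,unknown", "carol,29"]
print(read_csv(data, "PERMISSIVE"))     # [('alice', 34), ('bob', None), ('carol', 29)]
print(read_csv(data, "DROPMALFORMED"))  # [('alice', 34), ('carol', 29)]
try:
    read_csv(data, "FAILFAST")
except ValueError as err:
    print(err)  # Malformed record: 'bob,unknown'
```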

3
Q

Spark Context

A

Manages the connection to the Spark cluster; since Spark 2.0 it is wrapped by the SparkSession and reachable as spark.sparkContext
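
The relationship between the two objects can be pictured with `SparkSession.newSession()`. That call returns a session with its own temporary views and SQL configuration, but it shares the same SparkContext and therefore the same cluster connection. The sketch below models this in plain Python. `Context`, `Session` and `new_session` are hypothetical stand-ins, not Spark classes.

```python
from typing import Dict


class Context:
    """Hypothetical stand-in for SparkContext: owns the connection to the cluster."""

    def __init__(self, master: str) -> None:
        self.master = master  # cluster URL in real Spark; just a label here


class Session:
    """Hypothetical stand-in for SparkSession: wraps a Context and adds per-session state."""

    def __init__(self, context: Context) -> None:
        self.sparkContext = context  # same attribute name as spark.sparkContext in PySpark
        self.temp_views: Dict[str, str] = {}

    def new_session(self) -> "Session":
        # Modeled on SparkSession.newSession(): fresh session state, same underlying context.
        return Session(self.sparkContext)


spark = Session(Context("local[*]"))
other = spark.new_session()
other.temp_views["people"] = "view definition"

print(other.sparkContext is spark.sparkContext)  # True: one shared cluster connection
print("people" in spark.temp_views)              # False: temp views are per session
```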

4
Q

Modes that the DataFrameWriter supports

A
  1. Append - appends the new data to the existing data
  2. Overwrite - replaces the existing data
  3. Error or errorifexists (default) - throws an error if the table or path already exists
  4. Ignore - writes the data if the target does not exist yet; if it already exists, silently does nothing
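
In PySpark the mode is set on the writer before saving, for example `df.write.mode("append").saveAsTable("sales")`. Here `sales` is a hypothetical table name. The sketch below is a plain-Python model of the four save modes. A dict stands in for table storage, and `save` is an illustrative function, not Spark's API.

```python
from typing import Dict, List

Storage = Dict[str, List[int]]


def save(storage: Storage, table: str, rows: List[int], mode: str = "errorifexists") -> None:
    """Write rows to storage[table], following the DataFrameWriter save-mode rules."""
    mode = mode.lower()
    exists = table in storage
    if mode == "append":
        storage.setdefault(table, []).extend(rows)  # add to whatever is already there
    elif mode == "overwrite":
        storage[table] = list(rows)  # replace existing data entirely
    elif mode in ("error", "errorifexists"):
        if exists:
            raise FileExistsError(f"Table {table!r} already exists")
        storage[table] = list(rows)
    elif mode == "ignore":
        if not exists:  # existing data is left untouched and no error is raised
            storage[table] = list(rows)
    else:
        raise ValueError(f"Unknown save mode: {mode!r}")


store: Storage = {"sales": [1, 2]}
save(store, "sales", [3], "append")     # sales -> [1, 2, 3]
save(store, "sales", [9], "overwrite")  # sales -> [9]
save(store, "sales", [7], "ignore")     # sales unchanged -> [9]
save(store, "returns", [5], "ignore")   # new table created -> [5]
try:
    save(store, "sales", [4])           # default mode: errorifexists
except FileExistsError as err:
    print(err)                          # Table 'sales' already exists
print(store)                            # {'sales': [9], 'returns': [5]}
```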
5
Q

What are managed tables?

A
  1. Their data is stored at a predefined warehouse location.
  2. The location is set when the cluster is created (spark.sql.warehouse.dir).
  3. This location is a static configuration - it can't be changed after setup.
  4. Spark creates the table data and the table metadata at the same time.
  5. Spark manages both the table metadata and the table data, so when the table is dropped, both the metadata and the data files are deleted.
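
In PySpark, `df.write.saveAsTable("people")` without a `path` option creates a managed table, and its files go under `spark.sql.warehouse.dir`. The sketch below models that lifecycle in plain Python, assuming a temporary directory plays the warehouse and a dict plays the metastore. The table name `people` and the helpers `create_managed_table` and `drop_managed_table` are hypothetical, not Spark APIs.

```python
import os
import shutil
import tempfile
from typing import Dict, List

# Hypothetical model: a temp dir stands in for spark.sql.warehouse.dir and
# a dict stands in for the metastore (table name -> data location).
WAREHOUSE_DIR = tempfile.mkdtemp(prefix="warehouse-")
metastore: Dict[str, str] = {}


def create_managed_table(name: str, rows: List[str]) -> str:
    """Write data under the warehouse dir and register the metadata together."""
    location = os.path.join(WAREHOUSE_DIR, name)  # location chosen by "Spark", not the user
    os.makedirs(location)
    with open(os.path.join(location, "part-00000.csv"), "w") as f:
        f.write("\n".join(rows))
    metastore[name] = location
    return location


def drop_managed_table(name: str) -> None:
    """Dropping a managed table removes the metadata AND the data files."""
    location = metastore.pop(name)
    shutil.rmtree(location)


loc = create_managed_table("people", ["alice,34", "bob,29"])
print(os.path.exists(loc))                          # True
drop_managed_table("people")
print("people" in metastore, os.path.exists(loc))  # False False
```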
6
Q

What is an external table?

A
  1. Helps share data across storage layers.
  2. To use a table across different storage layers, we can either copy the table from one storage to another, or store it in an external location that all storage layers can access.
  3. When we create an external table, Spark keeps only the metadata in the metastore; the data files stay at the external location we specify. Hence when the table is dropped, only the metadata is dropped, not the table files.
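
In Spark an external table is created by supplying the location yourself. Two ways to do this are `df.write.option("path", "<path>").saveAsTable("people")` and `CREATE TABLE people (...) USING csv LOCATION '<path>'`, where `<path>` is a placeholder. The plain-Python sketch below models only what happens on drop: the metastore entry disappears while the files at the shared location survive. The dict metastore, the temporary "shared storage" directory and the helper names are all hypothetical.

```python
import os
import tempfile
from typing import Dict, List

# Hypothetical model: a dict stands in for the metastore (table name -> data location).
metastore: Dict[str, str] = {}


def create_external_table(name: str, location: str, rows: List[str]) -> None:
    """Register metadata for data kept at a user-chosen location outside the warehouse."""
    os.makedirs(location, exist_ok=True)
    with open(os.path.join(location, "part-00000.csv"), "w") as f:
        f.write("\n".join(rows))
    metastore[name] = location


def drop_external_table(name: str) -> None:
    """Dropping an external table removes only the metadata; the files are left in place."""
    metastore.pop(name)


# A temp dir plays the role of storage shared with other tools and layers.
shared_location = os.path.join(tempfile.mkdtemp(prefix="shared-storage-"), "people")
create_external_table("people", shared_location, ["alice,34", "bob,29"])
drop_external_table("people")

print("people" in metastore)        # False: metadata is gone
print(os.listdir(shared_location))  # ['part-00000.csv']: data files survive
```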