03 Dataframe Essentials Flashcards
1
Q
Define Spark Session
A
Entry point to Spark programming.
Databricks starts a Spark session automatically; in notebooks it is available as spark.
Gateway to the Spark DataFrame and SQL APIs.
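A minimal sketch of obtaining a session outside Databricks, assuming PySpark is installed. The helper name get_spark and the default application name are illustrative, not part of the original card.

```python
def get_spark(app_name: str = "dataframe-essentials"):
    """Return the active SparkSession, creating one if none exists."""
    # Imported inside the function so this sketch can be loaded
    # even on a machine where PySpark is not installed.
    from pyspark.sql import SparkSession

    # getOrCreate() reuses an already running session (as on Databricks)
    # instead of starting a second one.
    return SparkSession.builder.appName(app_name).getOrCreate()
```

On Databricks this helper is normally unnecessary, because the notebook already provides the session.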
2
Q
Modes allowed while reading (loading) a file
A
Failfast - the read fails as soon as a malformed record is encountered
Dropmalformed - drops malformed records and loads the rest
Permissive - loads every record and sets malformed fields to null (the default mode)
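The three read modes can be sketched as a small helper around the DataFrame reader. The function name read_csv, its parameters, and the choice of CSV with a header row are assumptions made for illustration; the schema is passed as a DDL string such as "id INT, name STRING".

```python
READ_MODES = ("PERMISSIVE", "DROPMALFORMED", "FAILFAST")


def read_csv(spark, path: str, schema: str, mode: str = "PERMISSIVE"):
    """Read a CSV file with an explicit schema and a malformed-record mode."""
    mode = mode.upper()
    if mode not in READ_MODES:
        raise ValueError(f"mode must be one of {READ_MODES}, got {mode!r}")
    return (
        spark.read
        .schema(schema)            # records that do not fit this schema are "malformed"
        .option("header", True)    # assumption: the file has a header row
        .option("mode", mode)      # how malformed records are handled
        .csv(path)
    )
```

Because Spark evaluates lazily, a Failfast error typically surfaces when an action such as show or count runs, not when the reader is defined.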
3
Q
Spark Context
A
Manages the connection to the Spark cluster; reachable from a Spark session as spark.sparkContext
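A small sketch of reaching the context from an existing session. The helper name context_summary is hypothetical, and some managed environments may restrict direct access to the context.

```python
def context_summary(spark) -> dict:
    """Collect a few connection details from the session's SparkContext."""
    sc = spark.sparkContext  # the context that backs this session
    return {
        "app_name": sc.appName,  # application name registered with the cluster
        "master": sc.master,     # cluster URL the context is connected to
        "version": sc.version,   # Spark version in use
    }
```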
4
Q
Save modes that DataFrameWriter supports
A
- Append - appends the data to the existing data
- Overwrite - overwrites the existing data
- Error or errorifexists - throws an error if the data or table already exists (the default)
- Ignore - writes the data if the target does not exist; if it already exists, the write is skipped and the existing data is left unchanged
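The save modes above can be sketched with a thin wrapper around the DataFrame writer. The name write_parquet and the choice of Parquet as the output format are assumptions for illustration only.

```python
SAVE_MODES = ("append", "overwrite", "error", "errorifexists", "ignore")


def write_parquet(df, path: str, mode: str = "errorifexists") -> None:
    """Write a DataFrame as Parquet using one of the supported save modes."""
    mode = mode.lower()
    if mode not in SAVE_MODES:
        raise ValueError(f"mode must be one of {SAVE_MODES}, got {mode!r}")
    # mode() decides what happens when data already exists at the target path.
    df.write.mode(mode).parquet(path)
```

For example, calling write_parquet(df, "/tmp/out", "ignore") twice should leave the first write untouched, while "append" would add the rows a second time.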
5
Q
What are managed tables
A
- It is created at a predefined warehouse location.
- The location is defined at the time of cluster creation (the spark.sql.warehouse.dir setting).
- This location is a static configuration: it can’t be changed after setup.
- Spark creates the table data and the metadata at the same time.
- Spark manages both the table metadata and the table data, so when the table is dropped, the data and the metadata are both deleted.
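A minimal sketch of creating a managed table, assuming a session and a DataFrame already exist. The helper name save_managed and the overwrite mode are illustrative choices; on platforms with their own catalog, managed data may live somewhere other than the warehouse directory reported here.

```python
def save_managed(spark, df, table_name: str) -> str:
    """Save df as a managed table and return the configured warehouse directory."""
    # Static setting that tells Spark where managed table data is stored.
    warehouse_dir = spark.conf.get("spark.sql.warehouse.dir")
    # No path is given, so Spark owns both the data files and the metadata.
    df.write.mode("overwrite").saveAsTable(table_name)
    return warehouse_dir
```

Dropping a table created this way removes its data files along with the metadata.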
6
Q
What is an external table
A
- Useful for sharing data across storage layers.
- To use a table across different storage layers, we can either copy it from one storage to another, or keep the data in an external location that all storage layers can access.
- When we create an external table, Spark creates only the metadata, which points to the data files at the external location. Hence, when the table is dropped, only the metadata is dropped and the data files remain.
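A sketch of the external-table behaviour described above, assuming a DataFrame and a storage location the cluster can write to. The helper names save_external and drop_table, and the overwrite mode, are hypothetical.

```python
def save_external(df, table_name: str, location: str) -> None:
    """Save df as an external table whose files live at the given location."""
    # Supplying an explicit path makes the table external:
    # Spark records the metadata but does not own the files.
    df.write.mode("overwrite").option("path", location).saveAsTable(table_name)


def drop_table(spark, table_name: str) -> None:
    """Drop the table; for an external table only the metadata is removed."""
    spark.sql(f"DROP TABLE IF EXISTS {table_name}")
```

After drop_table runs on an external table, the files at the location should still be readable directly, for example with spark.read.parquet(location) when the table was stored as Parquet.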