General Flashcards
In the UI, where can you go to grant SELECT access to a user?
Data Explorer (called Catalog Explorer in newer workspaces)
What is Delta Lake?
An open source storage format that builds on Parquet and adds a transaction log, giving it additional capabilities for reliability, security, and performance
How is the data organized in storage when managing a Delta table?
The table data is stored as one or more Parquet files, and the transaction log is stored as one or more JSON files in the table's _delta_log directory. Each transaction that writes data creates new data files and a new log file
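As a rough illustration of the layout described above, the sketch below builds the path of one commit file in the transaction log. It assumes the usual convention that each commit is a JSON file under _delta_log named with the table version zero-padded to 20 digits. The function name is hypothetical and the table path is only an example.

```python
def commit_file_path(table_path: str, version: int) -> str:
    """Return the assumed path of the JSON log file for a given table version."""
    # Assumption: commit files are named with the version zero-padded to 20 digits.
    return f"{table_path.rstrip('/')}/_delta_log/{version:020d}.json"


# Example table location, used only for illustration.
first_commit = commit_file_path("/mnt/delta/transactions", 0)
print(first_commit)
```

Listing the _delta_log directory of a real table is a quick way to check how many transactions it has recorded.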
What is the underlying technology that makes Auto Loader work?
Structured Streaming
When should you use Auto Loader instead of COPY INTO?
- You want to load from a location that contains files on the order of millions or more. Auto Loader can discover files more efficiently than COPY INTO
- You want file notification mode, which Auto Loader supports
- The data schema evolves frequently
When loading with Auto Loader, how do you deal with an evolving schema?
Set the mergeSchema option, which infers the schema across multiple files and merges the schema of each file
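A minimal sketch of an Auto Loader read with mergeSchema enabled might look like the following. It assumes a Databricks runtime where a SparkSession is passed in as spark, Parquet as the incoming file format, and caller-supplied paths. The function name and both path parameters are hypothetical.

```python
def read_with_auto_loader(spark, source_path: str, schema_path: str):
    """Build a streaming DataFrame that ingests Parquet files with Auto Loader."""
    return (
        spark.readStream
        .format("cloudFiles")                              # Auto Loader source
        .option("cloudFiles.format", "parquet")            # assumed format of the incoming files
        .option("cloudFiles.schemaLocation", schema_path)  # where the tracked schema is stored
        .option("mergeSchema", "true")                     # merge schemas across files
        .load(source_path)
    )
```

Because the result is a Structured Streaming DataFrame, it still needs a writeStream call with a checkpoint location before any data is loaded.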
How can you use MERGE to deduplicate on write?
MERGE INTO target USING source
ON target.key = source.key
WHEN NOT MATCHED THEN INSERT *
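To make the effect of the insert-only merge concrete, here is a plain-Python model of it on lists of row dictionaries. It is not Spark code, and the function name and sample rows are hypothetical. Note that, as in the SQL statement, duplicates inside the source itself are not removed, so the source should be deduplicated first.

```python
def insert_only_merge(target: list, source: list, key: str) -> list:
    """Model MERGE ... WHEN NOT MATCHED THEN INSERT * on lists of row dicts."""
    # Keys are taken from the target as it was before the merge started.
    existing_keys = {row[key] for row in target}
    merged = list(target)
    for row in source:
        # A source row is inserted only if no target row has the same key.
        if row[key] not in existing_keys:
            merged.append(row)
    return merged


target_rows = [{"key": 1, "value": "a"}]
source_rows = [{"key": 1, "value": "duplicate"}, {"key": 2, "value": "b"}]
result = insert_only_merge(target_rows, source_rows, "key")
print(result)
```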
How do you use merge to delete all target rows that have no matches in the source table?
MERGE INTO target USING source
ON target.key = source.key
WHEN NOT MATCHED BY SOURCE THEN DELETE
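The delete clause above can be modeled the same way in plain Python: a target row survives only if its key also appears in the source. This is an illustrative sketch, not Spark code, and the function name and sample rows are hypothetical.

```python
def delete_not_matched_by_source(target: list, source: list, key: str) -> list:
    """Model MERGE ... WHEN NOT MATCHED BY SOURCE THEN DELETE on lists of row dicts."""
    source_keys = {row[key] for row in source}
    # Keep only target rows whose key also appears in the source.
    return [row for row in target if row[key] in source_keys]


remaining = delete_not_matched_by_source(
    [{"key": 1}, {"key": 2}, {"key": 3}],
    [{"key": 2}],
    "key",
)
print(remaining)
```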
Where can you include timeouts in jobs?
In the task settings
How can you automate an alert?
By putting it on a refresh schedule
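How do you grant read capability on a table?
GRANT SELECT ON TABLE customers TO `some@Email.com` (the user also needs USAGE, which is granted on the schema that contains the table rather than on the table itself)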
What type of constraint keeps the bad records and adds them to the target dataset?
CONSTRAINT valid_timestamp EXPECT (timestamp > '2020-01-01')
What type of constraint drops bad records?
CONSTRAINT valid_timestamp EXPECT (timestamp > '2020-01-01') ON VIOLATION DROP ROW
What type of constraint fails when there is a bad record?
CONSTRAINT valid_timestamp EXPECT (timestamp > '2020-01-01') ON VIOLATION FAIL UPDATE
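The three violation policies above can be compared side by side with a small plain-Python model. It does not use the Delta Live Tables API. The function name, the policy strings "warn", "drop", and "fail", and the sample records are all hypothetical, and timestamps are assumed to be ISO date strings so that string comparison orders them correctly.

```python
def apply_expectation(rows, predicate, on_violation="warn"):
    """Model the three violation policies on a list of records.

    "warn" keeps bad records, "drop" removes them,
    "fail" raises an error on the first bad record.
    """
    if on_violation not in ("warn", "drop", "fail"):
        raise ValueError(f"unknown policy: {on_violation!r}")
    kept = []
    for row in rows:
        if predicate(row) or on_violation == "warn":
            kept.append(row)  # valid record, or invalid but retained
        elif on_violation == "fail":
            raise ValueError(f"expectation violated by record: {row!r}")
        # on_violation == "drop": the bad record is skipped
    return kept


records = [{"timestamp": "2019-06-01"}, {"timestamp": "2021-03-15"}]


def valid_timestamp(row):
    return row["timestamp"] > "2020-01-01"


print(len(apply_expectation(records, valid_timestamp, "warn")))
print(len(apply_expectation(records, valid_timestamp, "drop")))
```

In the Delta Live Tables Python API, the same three policies are exposed as the decorators dlt.expect, dlt.expect_or_drop, and dlt.expect_or_fail.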
When creating a database, what is its default location?
Under dbfs:/user/hive/warehouse, in a directory named <database_name>.db
How do you create an external table?
CREATE TABLE transactions (id int, desc string) USING DELTA LOCATION '/mnt/delta/transactions'