2025 Continuous Data Pipelines Flashcards
In general, what are the three ways to load data?
Batches
Micro Batches
Near Real Time
What features does Snowflake provide for building continuous data pipelines?
Continuous data loading (e.g., Snowpipe)
Change data tracking (streams)
Recurring tasks (tasks)
On what objects can streams be created to query change data?
Tables
Views, including Secure Views
Directory Tables
External Tables
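What additional columns appear when querying a stream?
METADATA$ACTION (INSERT or DELETE)
METADATA$ISUPDATE (TRUE if the row is part of an UPDATE, otherwise FALSE)
METADATA$ROW_ID (a unique, immutable ID for the row, used to track changes to it over time)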
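What advances the offset of a stream?
Consuming the stream in a DML statement (e.g., INSERT INTO ... SELECT ... FROM the stream) whose transaction commits. A plain SELECT on the stream does not advance the offset.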
When a record is updated, how is the change reflected in the stream?
As two rows:
One with METADATA$ACTION = DELETE and METADATA$ISUPDATE = TRUE (the old values)
One with METADATA$ACTION = INSERT and METADATA$ISUPDATE = TRUE (the new values)
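The offset and update behavior above can be sketched as a toy, in-memory model. This is a hypothetical illustration, not the Snowflake API: the class `ToyStandardStream`, its `changes()`/`consume()` methods, the `_row` helper, and the `VALUE` column are invented for this example, and row IDs are plain integer keys rather than Snowflake's own IDs. It compares the table at the offset with the table now, reports an update as a DELETE/INSERT pair flagged with ISUPDATE, and only moves the offset when the stream is "consumed".

```python
# Hypothetical toy model of a Snowflake standard stream. This is NOT the
# Snowflake API; it only mimics the semantics described on these cards.


def _row(action, is_update, row_id, value):
    """Build one change row with stream-style metadata columns."""
    return {
        "METADATA$ACTION": action,
        "METADATA$ISUPDATE": is_update,
        "METADATA$ROW_ID": row_id,  # here: our own integer key, not Snowflake's ID
        "VALUE": value,
    }


class ToyStandardStream:
    def __init__(self, rows=None):
        self.table = dict(rows or {})    # current table contents: row_id -> value
        self._offset = dict(self.table)  # table contents as of the stream offset

    def changes(self):
        """Net changes since the offset, like SELECT * FROM the stream.

        Reading does not move the offset.
        """
        out = []
        for rid in sorted(set(self._offset) | set(self.table)):
            was, now = rid in self._offset, rid in self.table
            if was and now and self._offset[rid] != self.table[rid]:
                # An update appears as a DELETE/INSERT pair, both with ISUPDATE = TRUE
                out.append(_row("DELETE", True, rid, self._offset[rid]))
                out.append(_row("INSERT", True, rid, self.table[rid]))
            elif now and not was:
                out.append(_row("INSERT", False, rid, self.table[rid]))
            elif was and not now:
                out.append(_row("DELETE", False, rid, self._offset[rid]))
        return out

    def consume(self):
        """Model a committed DML statement that reads the stream: return the
        changes, then advance the offset to the current table state."""
        rows = self.changes()
        self._offset = dict(self.table)
        return rows


s = ToyStandardStream({1: "a", 2: "b"})
s.table[2] = "B"  # update row 2
s.table[3] = "c"  # insert row 3
for r in s.changes():
    print(r["METADATA$ACTION"], r["METADATA$ISUPDATE"], r["METADATA$ROW_ID"], r["VALUE"])
# DELETE True 2 b
# INSERT True 2 B
# INSERT False 3 c
s.consume()
print(s.changes())  # []
```

Calling `changes()` repeatedly returns the same rows, mirroring how a plain SELECT leaves the offset alone; only `consume()` empties the stream.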
Where must change tracking be enabled to create a stream on a view?
On the view and on its underlying tables
What are the three types of streams?
Standard
Append-only (records inserts only; updates and deletes are not recorded)
Insert-only (for external tables; records inserts only)
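The difference between a standard and an append-only stream shows up when rows are inserted and then deleted before the stream is consumed. The sketch below is a hypothetical toy model, not the Snowflake API: `standard_stream`, `append_only_stream`, and the `(op, row_id, value)` log format are invented here, and updates are left out to keep it small. The standard model returns the net change; the append-only model returns every inserted row, since deletes are not recorded.

```python
# Hypothetical toy model (not the Snowflake API) contrasting a standard stream
# with an append-only stream over the same changes made since the offset.
# `log` is a list of ("insert" | "delete", row_id, value) tuples; updates are
# not modeled.


def standard_stream(before, log):
    """Net change between the table at the offset (`before`) and now."""
    after = dict(before)
    for op, rid, value in log:
        if op == "insert":
            after[rid] = value
        elif op == "delete":
            after.pop(rid, None)
        else:
            raise ValueError(f"unsupported op: {op}")
    out = []
    for rid in sorted(set(before) | set(after)):
        if rid not in before:
            out.append(("INSERT", rid, after[rid]))
        elif rid not in after:
            out.append(("DELETE", rid, before[rid]))
    return out


def append_only_stream(log):
    """Every inserted row; deletes are not recorded, so nothing is removed."""
    return [("INSERT", rid, value) for op, rid, value in log if op == "insert"]


# Insert 10 rows, then delete 5 of them before the stream is consumed.
log = [("insert", i, f"row{i}") for i in range(10)]
log += [("delete", i, None) for i in range(5)]
print(len(standard_stream({}, log)))  # 5: only the surviving rows
print(len(append_only_stream(log)))   # 10: deleted rows still appear
```

Because it skips the join of deleted and inserted rows, an append-only stream is typically the cheaper choice for insert-heavy workloads such as ELT staging.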
What does it mean if a stream is stale?
The stream's offset is outside the source table's data retention period, so the historical change data it points to, including any unconsumed changes, is no longer accessible. You must recreate the stream to resume tracking changes.
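If a stream is not consumed, for up to how many days (by default) does Snowflake extend the source table's data retention period back to the stream offset?
14 days (the default value of MAX_DATA_EXTENSION_TIME_IN_DAYS)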
If both DATA_RETENTION_TIME_IN_DAYS and MAX_DATA_EXTENSION_TIME_IN_DAYS are defined, which one applies?
Whichever is larger; the longer of the two periods determines how long the stream can go unconsumed before it becomes stale
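The "larger value wins" rule can be turned into a small date calculation. This is a hypothetical helper, not a Snowflake function: `stale_after` and its parameters are invented for illustration, and it assumes the effective window is simply the larger of the two parameters counted from the stream offset. For a real stream, the STALE_AFTER column in the SHOW STREAMS output is the authoritative value.

```python
from datetime import datetime, timedelta


def stale_after(offset_time, data_retention_days, max_extension_days=14):
    """Hypothetical helper: estimate when an unconsumed stream goes stale.

    Assumes the effective window is the larger of DATA_RETENTION_TIME_IN_DAYS
    and MAX_DATA_EXTENSION_TIME_IN_DAYS (default 14), counted from the offset.
    """
    window_days = max(data_retention_days, max_extension_days)
    return offset_time + timedelta(days=window_days)


offset = datetime(2025, 1, 1)
print(stale_after(offset, 1))   # 2025-01-15 00:00:00 (extension wins)
print(stale_after(offset, 30))  # 2025-01-31 00:00:00 (retention wins)
```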
Streams on which Snowflake objects do not have a data retention period?
Directory tables
External tables
How long after a query runs can you retrieve its result with RESULT_SCAN?
24 hours
What is the default state when a task is created?
Suspended (run ALTER TASK <name> RESUME to start it)
What are the two types of tasks?
Serverless (compute managed by Snowflake)
User-managed (run on a virtual warehouse you specify)
How do you make a task serverless?
Omit the WAREHOUSE parameter when you create it; Snowflake then provides the compute
What function indicates whether a stream contains change data?
SYSTEM$STREAM_HAS_DATA (often used in a task's WHEN clause so the task skips runs when the stream is empty)