Snowflake Flashcards
What are the key features of Snowflake?
Pure SaaS
Relational
Semi-Structured
Elastic
Highly Available
Durable
Cost-efficient
What does Pure SaaS mean in Snowflake?
No machines to maintain, no database administration and no software to install or upgrade.
What does Relational mean in Snowflake?
It supports standard SQL and ACID transactions, so workloads can be migrated from a standard relational database such as PostgreSQL with few or no changes.
What does Semi-structured mean in Snowflake?
- There are built-in functions and SQL extensions to traverse, flatten and nest semi-structured data (e.g. JSON and Avro).
- Schemas are discovered automatically, which, together with columnar storage, makes operations on semi-structured data almost as fast as on relational data, without any user effort.
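Snowflake exposes flattening and traversal through SQL (for example its FLATTEN table function). The Python below is only a conceptual sketch of what "traverse and flatten" means for a nested JSON document, not Snowflake's implementation; the `flatten` helper and the `$`-rooted path format are illustrative assumptions.

```python
import json
from typing import Any, Iterator, Tuple


def flatten(value: Any, path: str = "$") -> Iterator[Tuple[str, Any]]:
    """Recursively traverse a parsed JSON value, yielding (path, leaf) pairs."""
    if isinstance(value, dict):
        for key, child in value.items():
            yield from flatten(child, f"{path}.{key}")
    elif isinstance(value, list):
        for index, child in enumerate(value):
            yield from flatten(child, f"{path}[{index}]")
    else:
        # Scalars (str, int, float, bool, None) are leaves.
        yield path, value


doc = json.loads('{"user": {"id": 7, "tags": ["a", "b"]}}')
rows = list(flatten(doc))
print(rows)
# [('$.user.id', 7), ('$.user.tags[0]', 'a'), ('$.user.tags[1]', 'b')]
```

Each nested field or array element becomes one flat row, which is the shape relational operators can then work on.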
What does Elastic mean in Snowflake?
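Storage and compute are decoupled (a multi-cluster, shared-data architecture rather than a pure shared-nothing one), so each can be scaled seamlessly and independently without affecting data availability or the performance of concurrent queries.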
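The decoupling can be pictured with a toy model: several independently sized compute clusters (Snowflake calls them virtual warehouses) read the same shared storage, and resizing one of them touches no stored data. `SharedStorage` and `ComputeCluster` are hypothetical names, and an in-memory dict stands in for the blob store; this is a sketch of the idea, not of Snowflake's internals.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SharedStorage:
    """Stands in for a blob store such as S3: holds table data, does no compute."""
    tables: Dict[str, List[int]] = field(default_factory=dict)


@dataclass
class ComputeCluster:
    """A hypothetical compute cluster whose size is independent of the data."""
    name: str
    nodes: int
    storage: SharedStorage

    def resize(self, nodes: int) -> None:
        # Only compute changes; no table data is copied or reshuffled.
        self.nodes = nodes

    def total(self, table: str) -> int:
        rows = self.storage.tables[table]
        # Split the scan round-robin across nodes, then combine partial sums.
        partials = [sum(rows[i::self.nodes]) for i in range(self.nodes)]
        return sum(partials)


storage = SharedStorage({"sales": list(range(1, 101))})
etl = ComputeCluster("etl", nodes=2, storage=storage)
bi = ComputeCluster("bi", nodes=8, storage=storage)

print(etl.total("sales"), bi.total("sales"))  # 5050 5050, same data for both
etl.resize(16)                                # scale compute only
print(etl.total("sales"))                     # 5050
```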
What does Highly Available mean in Snowflake?
It tolerates node, cluster and even full data center failure.
What does Durable mean in Snowflake?
Extra safeguards against accidental data loss, such as cloning, undrop and cross-region backups, are in place.
What does Cost-efficient mean in Snowflake?
You pay only for the storage and compute you actually use. In addition, all table data is compressed and compute is highly efficient, which lowers costs further.
What are the drawbacks of a pure shared-nothing design (the problems Snowflake's architecture is meant to avoid)?
Heterogeneous Workload
Membership Changes
Online Upgrade
Why is Heterogeneous Workload a drawback in a shared-nothing design?
The hardware is homogeneous, but the workload typically is not. A configuration that is ideal for bulk loading (high I/O bandwidth, light compute) is a poor fit for complex queries (low I/O bandwidth, heavy compute), and vice versa, so the hardware has to be a trade-off with low average utilization.
Why is Membership Change a drawback in a shared-nothing design?
When the set of nodes changes, whether through node failures or because the user resizes the system, large amounts of data must be reshuffled. Because the same nodes handle both the reshuffling and incoming queries, this can noticeably hurt performance and limits elasticity and availability.
Why is Online Upgrade a drawback in a shared-nothing design?
It is similar to Membership Change, but it cannot be mitigated by replication, because every node eventually has to be upgraded. Upgrading one node after another without downtime is possible in theory, but in practice it is very hard because everything is tightly coupled and expected to be homogeneous.
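To get a feel for why reshuffling on a membership change is expensive, the sketch below counts how many keys change owner when a shared-nothing cluster grows. It assumes naive hash partitioning (owner = key mod number of nodes), with integer keys standing in for already-hashed values; real systems may use smarter schemes such as consistent hashing, which move less data but still move some. `node_for` and `fraction_moved` are hypothetical helpers.

```python
def node_for(key: int, num_nodes: int) -> int:
    """Naive hash partitioning (assumption): key lives on node key mod num_nodes."""
    return key % num_nodes


def fraction_moved(keys: range, old_nodes: int, new_nodes: int) -> float:
    """Fraction of keys whose owning node changes after resizing the cluster."""
    moved = sum(1 for k in keys if node_for(k, old_nodes) != node_for(k, new_nodes))
    return moved / len(keys)


keys = range(10_000)
print(fraction_moved(keys, 4, 5))  # 0.8: growing from 4 to 5 nodes
```

Under this scheme, adding a single node relocates 80% of the data, and in a shared-nothing system all of that movement runs on the same nodes that are serving queries.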
What do compute and storage look like in Snowflake?
Compute is a proprietary shared-nothing engine.
Storage is typically provided by Amazon S3, but in principle any blob store would work (e.g. Azure Blob Storage or Google Cloud Storage).
What is a key feature of Snowflake's compute engine?
Local caching: each compute node uses its local disk (ideally a fast SSD) exclusively for temporary data and for caching the table data it works on. Once these caches are warm, performance approaches or even exceeds that of a pure shared-nothing architecture.
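As a sketch of how a warm local cache cuts remote reads, the code below puts a small cache in front of a simulated blob store. `WorkerCache` is a hypothetical class, the dict stands in for remote storage such as S3, and least-recently-used (LRU) eviction is an assumption chosen for simplicity.

```python
from collections import OrderedDict
from typing import Dict


class WorkerCache:
    """Hypothetical per-node local cache of table files with LRU eviction."""

    def __init__(self, blob_store: Dict[str, bytes], capacity: int) -> None:
        self.blob_store = blob_store  # stands in for remote storage (e.g. S3)
        self.capacity = capacity      # max number of files kept locally
        self.entries: "OrderedDict[str, bytes]" = OrderedDict()
        self.remote_reads = 0

    def read(self, file_name: str) -> bytes:
        if file_name in self.entries:
            # Cache hit: served locally, mark as most recently used.
            self.entries.move_to_end(file_name)
            return self.entries[file_name]
        # Cache miss: fetch from remote storage and keep a local copy.
        self.remote_reads += 1
        data = self.blob_store[file_name]
        self.entries[file_name] = data
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used file
        return data


store = {f"part-{i}": bytes([i]) for i in range(4)}
cache = WorkerCache(store, capacity=2)
for name in ["part-0", "part-1", "part-0", "part-2", "part-1"]:
    cache.read(name)
print(cache.remote_reads)  # 4 remote reads for 5 requests
```

The repeated read of `part-0` is served locally; the more a node's working set fits in its cache, the closer it gets to the local-disk performance of a shared-nothing node.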