DP-200 Study Notes Flashcards
What is a data schema?
It’s a skeleton of your data and how it relates data to itself.
What is the difference between a physical schema and a logical schema?
A physical schema is on disk.
The logical schema is tables, views.
What’s the difference between a snowflake schema and a star schema
Star Schema has one fact table and 1 level of dimension tables. It’s not normalised and very easy to query.
Snowflake has multiple levels of dimnension tables. It’s normalised and much slower to query (because you need to make joins)
What’s the general difference between a Azure Data Lake, SQL Database and a SQL Data Warehouse
A Data Warehouse is used for complex queries with some strong business incentive.
Database has very recent and topical data such as how many current orders are there in the business.
A data lake is where you store all your data before its refinement (unstructured)
Briefly describe NoSQL
- No Schema
- Best when there is a large amount of data when there relationship changes and the data changes over time
- Horizontal Scaling (shards out across machines)
- Typically JSON, key-value, graph and column-wide data
Briefly describe a Data Lake
Data in its original form, sitting and waiting until it is needed for processing.
Ingest from source into Data Lake using Data Factory. Databricks can be used to process it for reporting.
What are the 5 consistency levels of CosmosDB
Strong, Boundless Staleness, Session, Consistent Prefix, Eventual.
What does it mean that CosmosDB is multimodel?
It supports SQL API, Cassandra API, CosmosDB API, Gremlin API, Tables API.
What is the SLA and latency of CosmosDB?
SLA is 99.999%
Latency is <10ms
How many copies of your data in cosmosDB?
Copies of your data is = n*4 (where n is the number of regions you are in)
What is the idea behind Logical Partitions
Use a key that spreads across partitions well
10gb per partitions
Throughput costs depend on the container #, keys and workload distribution
What is Azure Table Storage?
It’s NoSQL, key-value store, commonly used for large amounts of structured data. Storing datasets that don’t require complex joins or queries.
Used when you don’t need global replication (when you might use CosmosDB)
What is Azure Data Warehouse?
A repo of non-redundant data
ETL takes data from the lake into the warehouse
Cloud is a good place for this for the scalability and the cost and performance
Fact table contains all the dimensions
Dimensions are the attributes
How does Polybase useful for data warheousing in Azure?
Polybase allows a data warehouse to be used as an ELT. You load the data into the warehouse before you transform it.
What are the three types of shading used in a data warehouse?
Hash, Round Robin, Replicate
What’s the difference between hash, round robin and replicate
Hash has the highest query performance for large tables
Round Robin is the easiest to create for staging tables
Replicate - fastest
Is Azure SQL Database vertical or horizontal scaling?
It is vertical scaling
What are the three options for Azure SQL Database?
Single Database, elastic pool or managed instance
When do you use a single database in Azure SQL Database?
You use Single database when you have one set of resources.
When do you use elastic pool in Azure SQL Database?
Elastic pool is a group of databases that share resources.
What is a managed instance of Azure SQL Database?
A managed instance is similar to the sharing of elastic pool resources but has functionality similar to on-prem
What are the two pricing models for Azure SQL Database?
You can buy vCore or DTU
What’s the difference between DTUs and vCores?
DTUs are pre-configured solutions. If you have more than 300 DTUs it is better to switch to vCores. vCores is better for high cost databases or you need to monitor and trim costs. DTUs are best when running inexpensive databases.
How does DTUs and vCores diffe in terms of pricing?
With DTUs you pay one fixed price for your compute as well as data storage and backup retention.
What is hyper scale in Azure SQL Database?
Hyperscale available with vCores, allows databases to grow to 100 times their size.
Can you use vCores or DTUs for elastic pools in Azure SQL Database?
Yes, you can use both
When should you use SQL Database Managed Instance?
It’s ideal for migrating from on-prem to Azure Cloud. It has native vent integration. It has automated backups but it is more expensive than elastic pool.
Explain Azure SQL Database Backups
You can automate backups via read-access geo-redundant storage.
Backups are done full weekly, differential every 12 hours and transactions every 5-10 minutes
Backups are good for restoring deleted tables. Restoring to a different region, and restoring to a point in time.
Log term retention is for longer term storage as blob.