Design & Implement Data Storage Flashcards
What is a Temporal table?
In Azure SQL DB, a system-versioned table that keeps the full history of data changes, so you can track and analyze them without custom coding.
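As a rough mental model only (this is not how Azure SQL DB implements it), a temporal table can be pictured as rows carrying a validity period, where an update closes the old version instead of overwriting it. The class and field names below (VersionedTable, valid_from, valid_to) are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional

MAX_TIME = datetime.max  # stands in for the open-ended period end of a current row


@dataclass
class RowVersion:
    key: int
    value: str
    valid_from: datetime
    valid_to: datetime


class VersionedTable:
    """Toy model of a system-versioned table: current rows plus closed history rows."""

    def __init__(self) -> None:
        self.versions: List[RowVersion] = []

    def upsert(self, key: int, value: str, now: datetime) -> None:
        # Close the current version (if any) instead of overwriting it.
        for v in self.versions:
            if v.key == key and v.valid_to == MAX_TIME:
                v.valid_to = now
        self.versions.append(RowVersion(key, value, now, MAX_TIME))

    def as_of(self, key: int, when: datetime) -> Optional[str]:
        # Return the value that was valid at the given point in time.
        for v in self.versions:
            if v.key == key and v.valid_from <= when < v.valid_to:
                return v.value
        return None


table = VersionedTable()
table.upsert(1, "bronze", datetime(2023, 1, 1))
table.upsert(1, "gold", datetime(2023, 6, 1))
```

Calling table.as_of(1, datetime(2023, 3, 1)) returns the older value, which is the kind of point-in-time question a temporal table answers for you.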
What is PolyBase?
A SQL Server feature that allows you to query external data and join it with relational data using T-SQL.
What is Azure Monitor?
A centralized monitoring service for all Azure resources.
What is Cosmos DB?
Fully managed NoSQL DB for app development. Easy to distribute globally.
What is Azure Stream Analytics?
A real-time analytics service (as opposed to batch processing) that processes fast-moving streams of data to feed reports and trigger alerts. Can look up against Reference Data.
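The idea of windowed aggregation with a reference-data lookup can be sketched in plain Python. This is an illustration under stated assumptions, not Stream Analytics itself (which uses its own SQL-like query language): the event tuples, the REFERENCE mapping, the 10-second window and the 25.0 alert threshold are all made up.

```python
from collections import defaultdict

# Hypothetical reference data: device id -> location.
REFERENCE = {"dev1": "Oslo", "dev2": "Lima"}

# Hypothetical event stream: (timestamp in seconds, device id, temperature).
EVENTS = [
    (1, "dev1", 20.0),
    (4, "dev2", 30.0),
    (7, "dev1", 22.0),
    (12, "dev1", 26.0),
    (14, "dev2", 34.0),
]


def tumbling_averages(events, window_seconds, reference):
    """Average temperature per (window start, location) over fixed, non-overlapping windows."""
    sums = defaultdict(lambda: [0.0, 0])
    for ts, device, temp in events:
        window_start = (ts // window_seconds) * window_seconds
        location = reference.get(device, "unknown")  # reference-data lookup
        bucket = sums[(window_start, location)]
        bucket[0] += temp
        bucket[1] += 1
    return {key: total / count for key, (total, count) in sums.items()}


averages = tumbling_averages(EVENTS, 10, REFERENCE)
# Windows whose average exceeds the (hypothetical) alert threshold.
alerts = [key for key, avg in averages.items() if avg > 25.0]
```

Each event falls into exactly one window, and the lookup enriches the stream with slowly changing data before the aggregate is computed.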
What are the 5 types of data collected in Azure Monitor?
Application, Guest OS, Azure Resource, Azure Subscription, Azure Tenant
What are the 5 modes for Cosmos DB?
Core (SQL) API, Cassandra API, Gremlin API, Table API, MongoDB API
What is Azure Synapse Analytics?
A single pane of glass for EDW & Big Data analytics.
What is Azure HDInsight?
Helps to ingest, process and analyze big data. Supports batch processing, data warehousing, IoT and data science.
Hadoop: Includes Hive, HBase, Spark, Kafka.
HBase
Storm
Kafka
What is Apache Spark & Spark Pool?
Spark: Parallel processing framework that supports in-memory processing for fast big data analytics apps.
Spark Instances: Created when you connect to a Pool.
Spark Pool: A set of metadata that defines the compute resource requirements and associated behavior characteristics used when a Spark instance is instantiated.
What is Azure Databricks?
Data & AI service for data engineering.
Databricks SQL
Databricks Data Science & Engineering
Databricks Machine Learning
What is a Lakehouse?
The merging of data warehouse & data lake architectures.
What should you use to provision throughput for a Cosmos DB Container?
A logical partition key.
What are the 5 consistency levels in Cosmos DB from strongest consistency to weakest?
Strong, Bounded Staleness, Session, Consistent Prefix, Eventual
What are the 5 consistency levels in Cosmos DB, FROM highest availability, lowest latency and highest throughput
TO lowest availability, highest latency and lowest throughput?
Eventual, Consistent Prefix, Session, Bounded Staleness, Strong
What are some attributes of a Cosmos DB logical partition key?
Be a property value that does not change.
Have high cardinality; should have a wide range of values.
Spread Request Unit (RU) consumption evenly across all logical partitions.
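Why cardinality matters can be shown with a small simulation. This is only a sketch: the SHA-256 hash and the 4 partitions are assumptions for illustration and do not reflect Cosmos DB's actual hashing or partition layout.

```python
import hashlib
from collections import Counter

PARTITIONS = 4  # hypothetical number of partitions for the illustration


def partition_for(key: str) -> int:
    # Stable hash of the partition key value, mapped to a partition index.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % PARTITIONS


def load_per_partition(keys):
    """Count how many requests land on each partition."""
    counts = Counter(partition_for(k) for k in keys)
    return [counts.get(p, 0) for p in range(PARTITIONS)]


# High cardinality: 10,000 distinct user ids.
high = load_per_partition(f"user-{i}" for i in range(10_000))
# Low cardinality: the same 10,000 requests, but only two distinct key values.
low = load_per_partition("active" if i % 2 else "inactive" for i in range(10_000))
```

With many distinct key values the load spreads roughly evenly, while two key values can never occupy more than two partitions, which is the "hot partition" problem a good key avoids.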
Describe the 5 modes of Cosmos DB.
Core (SQL) API: Document data model. Use for all new Cosmos implementations.
Cassandra API: Columnar data model.
Gremlin API: Graph DB data model.
Table API: Key-value data model.
MongoDB API: Use if migrating MongoDB implementation into Cosmos.
What is the best data distribution method for SQL Data Warehouse to improve load speed?
Round Robin (not Hash or Replicate).
Describe the 5 consistency levels of Cosmos DB from strongest consistency to weakest.
Strong:
Offers a linearizability guarantee (linearizability refers to serving requests concurrently). Users are always guaranteed to read the latest committed write.
Bounded Staleness:
Reads are guaranteed to honor the consistent-prefix guarantee. The reads might lag behind writes by at most “K” versions (that is, “updates”) of an item or by “T” time interval, whichever is reached first.
Session:
Session consistency is the most widely used consistency level for both single region as well as globally distributed applications. It provides write latencies, availability, and read throughput comparable to that of eventual consistency but also provides the consistency guarantees that suit the needs of applications written to operate in the context of a user.
Consistent Prefix:
With the consistent prefix option, the updates returned contain some prefix of all the updates, with no gaps. The consistent prefix consistency level guarantees that reads never see out-of-order writes.
Eventual:
In eventual consistency, there’s no ordering guarantee for reads. In the absence of any further writes, the replicas eventually converge.
Eventual consistency is the weakest form of consistency because a client may read the values that are older than the ones it had read before. Eventual consistency is ideal where the application does not require any ordering guarantees.
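The "prefix with no gaps" and "at most K versions behind" guarantees can be expressed as two small checks. This is a simplified sketch: the write log and the function names are hypothetical, and the time-based bound "T" of Bounded Staleness is not modeled.

```python
def is_consistent_prefix(committed, observed):
    """True if `observed` is a gap-free, in-order prefix of the committed writes."""
    return list(observed) == list(committed[: len(observed)])


def within_staleness_bound(committed, observed, max_lag_versions):
    """True if the read is a consistent prefix that lags by at most K versions."""
    lag = len(committed) - len(observed)
    return is_consistent_prefix(committed, observed) and 0 <= lag <= max_lag_versions


committed = ["w1", "w2", "w3", "w4"]  # hypothetical write log, in commit order

strong_read = ["w1", "w2", "w3", "w4"]  # sees the latest committed write
prefix_read = ["w1", "w2"]              # behind, but in order and without gaps
out_of_order_read = ["w1", "w3"]        # has a gap: not allowed under Consistent Prefix
```

Here prefix_read satisfies Consistent Prefix and also Bounded Staleness with K = 2, but not with K = 1; out_of_order_read is the kind of result only Eventual consistency permits.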
What is a NoSQL database and when do I use it?
A non-relational database ("NoSQL" is usually read as "Not only SQL"); provides
horizontal scaling and flexibility.
Use when:
There are large amounts of data.
Relationships between data items aren’t important.
The data changes over time (dynamic schema).
Queries are simple (not suited to complex queries).
Examples: MongoDB, Redis, HBase.
Describe the types of NoSQL databases.
Document databases:
Typically JSON; a family of information with varying structure. Cosmos DB.
Key-value stores:
Highly optimized for simple lookups; scalable across multiple nodes. Cosmos DB Table API; Redis; Table Storage; HBase.
Graph databases:
Nodes, Edges. Cosmos DB Graph (Gremlin) API
Wide column stores: (modeled after Google BigTable)
Organized into columns & rows, with data grouped into column families rather than the fixed tables of a traditional RDBMS; schemaless; columns and data types do not have to be defined before using them. HBase in HDInsight.
What is the difference between Cosmos DB & SQL Data Warehouse (now SQL Pools)?
Global Replication and Speed (Cosmos DB)
vs
Consistency and Integrity (SQL Pools)
Which SQL Data Warehouse distribution method is the most appropriate for queries on large tables?
Hash is best for improving query performance on large tables. Round Robin would be best for improving load speed.
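The trade-off between the two methods can be sketched as follows. A dedicated SQL pool spreads each table over 60 distributions; everything else here (the SHA-256 hash, the row shape, the customer_id column) is an assumption for illustration and not the service's actual algorithm.

```python
import hashlib
from itertools import cycle

DISTRIBUTIONS = 60  # a dedicated SQL pool spreads each table over 60 distributions


def round_robin(rows):
    """Assign rows to distributions in turn, ignoring their content."""
    slots = cycle(range(DISTRIBUTIONS))
    return [(next(slots), row) for row in rows]


def hash_distribute(rows, key):
    """Assign each row by a stable hash of its distribution column."""
    def slot(value):
        digest = hashlib.sha256(str(value).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % DISTRIBUTIONS
    return [(slot(row[key]), row) for row in rows]


# Hypothetical fact rows: 600 orders for 100 customers.
rows = [{"order_id": i, "customer_id": i % 100} for i in range(600)]

rr = round_robin(rows)
hashed = hash_distribute(rows, "customer_id")

# Distributions that hold rows for customer 7 under each method.
rr_slots = {d for d, r in rr if r["customer_id"] == 7}
hash_slots = {d for d, r in hashed if r["customer_id"] == 7}
```

Round Robin needs no per-row computation and fills every distribution evenly, which suits fast loads. Hash keeps all rows for one key value in a single distribution, so joins and aggregations on that column avoid moving data between distributions.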
Name 3 benefits of Cosmos DB.
Global Replication
Multi-Model
Elastic