Design & Implement Data Storage Flashcards

1
Q

What is a Temporal table?

A

A system-versioned table in Azure SQL Database that lets you track and analyze the full history of changes to the data, without custom coding.
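
As a rough illustration of the idea only (not how Azure SQL implements it), the sketch below keeps a current value plus a list of closed history periods and answers an "as of" lookup. The class `TemporalRow` and its methods are hypothetical names made up for this example.

```python
from datetime import datetime

class TemporalRow:
    """Toy model of one versioned row: a current value plus closed history periods."""

    def __init__(self, value, now):
        self.value = value
        self.valid_from = now
        self.history = []  # list of (value, valid_from, valid_to)

    def update(self, new_value, now):
        # Close the current period and move the old version into history.
        self.history.append((self.value, self.valid_from, now))
        self.value = new_value
        self.valid_from = now

    def as_of(self, when):
        # Return the value that was current at `when`, or None if the row did not exist yet.
        if when >= self.valid_from:
            return self.value
        for value, start, end in self.history:
            if start <= when < end:
                return value
        return None

row = TemporalRow("Bronze", datetime(2024, 1, 1))
row.update("Silver", datetime(2024, 3, 1))
row.update("Gold", datetime(2024, 6, 1))
```

In Azure SQL Database the comparable lookup is a `FOR SYSTEM_TIME AS OF` query, and the history is maintained by the engine rather than by application code.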

2
Q

What is PolyBase?

A

A SQL Server feature that lets you query external data with T-SQL and join it with data in the database.

3
Q

What is Azure Monitor?

A

A centralized monitoring service for all Azure resources.

4
Q

What is Cosmos DB?

A

A fully managed NoSQL database for app development. Easy to distribute globally.

5
Q

What is Azure Stream Analytics?

A

A real-time analytics service (as opposed to batch processing) that processes fast-moving streams of data for reports and for triggering alerts. It can look up values against reference data.
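
A minimal sketch of the pattern in plain Python (not the Stream Analytics query language): each incoming event is enriched from a reference data set and an alert is emitted when a limit is exceeded. The device IDs, field names, and thresholds are hypothetical.

```python
# Reference data: a slowly changing lookup set (hypothetical device metadata).
REFERENCE = {
    "dev-1": {"site": "Plant A", "max_temp": 80.0},
    "dev-2": {"site": "Plant B", "max_temp": 60.0},
}

def process(events, reference):
    """Enrich each event from the reference data and yield an alert when a limit is exceeded."""
    for event in events:
        meta = reference.get(event["device"])
        if meta is None:
            continue  # no matching reference row, so the event is dropped (inner-join behavior)
        if event["temp"] > meta["max_temp"]:
            yield {"site": meta["site"], "device": event["device"], "temp": event["temp"]}

stream = [
    {"device": "dev-1", "temp": 75.0},
    {"device": "dev-2", "temp": 65.5},
    {"device": "dev-9", "temp": 99.0},
]
alerts = list(process(stream, REFERENCE))
```

Because `process` is a generator, events are handled one at a time as they arrive instead of being collected into a batch first.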

6
Q

What are the 5 types of data collected in Azure Monitor?

A
Application
Guest OS
Azure Resource
Azure Subscription
Azure Tenant
7
Q

What are the 5 modes for Cosmos DB?

A
Core (SQL) API
Cassandra API
Gremlin API
Table API
MongoDB API
8
Q

What is Azure Synapse Analytics?

A

A single pane of glass for enterprise data warehousing (EDW) and big data analytics.

9
Q

What is Azure HDInsight?

A

Helps to ingest, process, and analyze big data. Supports batch processing, data warehousing, IoT, and data science.

Open-source frameworks available include:
Hadoop (with Hive)
HBase
Spark
Storm
Kafka

10
Q

What are Apache Spark and a Spark Pool?

A

Spark: Parallel processing framework that supports in-memory processing to speed up big data analytics applications.

Spark Instances: Created when you connect to a Pool.

Spark Pool: A set of metadata that defines the compute resource requirements and associated behavior characteristics used when a Spark instance is instantiated.

11
Q

What is Azure Databricks?

A

Data & AI service for data engineering. Offers three environments:
Databricks SQL
Databricks Data Science & Engineering
Databricks Machine Learning

12
Q

What is a Lakehouse?

A

The merging of data warehouse & data lake architectures.

13
Q

What should you use to provision throughput for a Cosmos DB Container?

A

A logical partition key.

14
Q

What are the 5 consistency levels in Cosmos DB from strongest consistency to weakest?

A
Strong
Bounded Staleness
Session
Consistent Prefix
Eventual
15
Q

What are the 5 consistency levels in Cosmos DB, FROM highest availability, lowest latency and highest throughput
TO lowest availability, highest latency and lowest throughput?

A
Eventual
Consistent Prefix
Session
Bounded Staleness
Strong
16
Q

What are some attributes of a Cosmos DB logical partition key?

A

Be a property value that does not change.

Have high cardinality; should have a wide range of values.

Spread Request Unit (RU) consumption evenly across all logical partitions.
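
To see why cardinality matters, the sketch below hashes key values into a fixed number of buckets (a stand-in for partitions) and counts how many items land in each. The MD5 hash, the bucket count of 10, and the key names are assumptions chosen for illustration; Cosmos DB's own hashing is internal and differs.

```python
import hashlib
from collections import Counter

def partition_for(key, partitions=10):
    # Stand-in hash; the real Cosmos DB hashing scheme is internal and differs.
    digest = hashlib.md5(str(key).encode("utf-8")).hexdigest()
    return int(digest, 16) % partitions

def spread(keys, partitions=10):
    """Count how many items land in each bucket."""
    return Counter(partition_for(k, partitions) for k in keys)

# High cardinality: one distinct key per item (for example, a user ID).
high = spread(f"user-{i}" for i in range(10_000))
# Low cardinality: only two distinct values (for example, a status flag).
low = spread(["active", "inactive"][i % 2] for i in range(10_000))
```

With the high-cardinality key the items, and so the request load, are spread over every bucket. With the two-valued key at most two buckets receive all the traffic, which is the "hot partition" problem the attributes above are meant to avoid.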

17
Q

Describe the 5 modes of Cosmos DB.

A

Core (SQL) API: Document data model. Use for all new Cosmos implementations.
Cassandra API: Columnar data model.
Gremlin API: Graph DB data model.
Table API: Key-value data model.
MongoDB API: Use when migrating an existing MongoDB implementation to Cosmos DB.

18
Q

What is the best data distribution method for SQL Data Warehouse to improve load speed?

A

Round-robin (not hash, nor replicated).

19
Q

Describe the 5 consistency levels of Cosmos DB from strongest consistency to weakest.

A

Strong:
Offers a linearizability guarantee (requests are served concurrently). Reads are always guaranteed to return the latest committed write.

Bounded Staleness:
Reads are guaranteed to honor the consistent-prefix guarantee. The reads might lag behind writes by at most “K” versions (that is, “updates”) of an item or by “T” time interval, whichever is reached first.

Session:
Session consistency is the most widely used consistency level for both single region as well as globally distributed applications. It provides write latencies, availability, and read throughput comparable to that of eventual consistency but also provides the consistency guarantees that suit the needs of applications written to operate in the context of a user.

Consistent Prefix:
In consistent prefix option, updates that are returned contain some prefix of all the updates, with no gaps. Consistent prefix consistency level guarantees that reads never see out-of-order writes.

Eventual:
In eventual consistency, there’s no ordering guarantee for reads. In the absence of any further writes, the replicas eventually converge.
Eventual consistency is the weakest form of consistency because a client may read the values that are older than the ones it had read before. Eventual consistency is ideal where the application does not require any ordering guarantees.
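
The ordering guarantees above can be sketched as a simplified checker. It assumes every write is a unique value in a single committed sequence and asks whether a given read view is legal at a given level. The function names are hypothetical, the time bound "T" of bounded staleness is ignored, and session consistency is left out because it depends on a per-client session.

```python
def is_prefix(observed, committed):
    """True if `observed` is the first len(observed) writes of `committed`, in order."""
    return observed == committed[:len(observed)]

def read_allowed(level, observed, committed, k=0):
    """Return True if `observed` is a legal read of `committed` under the given level."""
    if level == "strong":
        return observed == committed
    if level == "bounded_staleness":
        lag = len(committed) - len(observed)
        return is_prefix(observed, committed) and 0 <= lag <= k
    if level == "consistent_prefix":
        return is_prefix(observed, committed)
    if level == "eventual":
        # Any subset in any order; replicas converge only once writes stop.
        return set(observed) <= set(committed)
    raise ValueError(f"unsupported level: {level}")

writes = ["w1", "w2", "w3", "w4"]
```

For example, the view `["w1", "w3"]` skips `w2`, so it is rejected by consistent prefix but accepted by eventual consistency.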

20
Q

What is a NoSQL database and when do I use it?

A

"Not only SQL": a non-relational database that provides horizontal scaling and flexibility.

Use when:
There are large amounts of data.
Relationships between the data aren't important.
The data changes over time (dynamic schema).

Not for complex queries.

Examples: MongoDB, Redis, HBase.

21
Q

Describe the types of NoSQL databases.

A

Document databases:
Typically JSON; family of information with varying structure. Cosmos DB.

Key-value stores:
Highly optimized for simple lookups; scalable across multiple nodes. Cosmos DB Table API; Redis; Table Storage; HBase.

Graph databases:
Nodes, Edges. Cosmos DB Graph (Gremlin) API.

Wide column stores: (modeled after Google BigTable)
Organized into columns & rows, with data grouped into column families rather than the fixed tables of a traditional RDBMS; schemaless; columns and data types do not need to be defined before you use them. HBase in HDInsight.
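
The four shapes can be compared side by side with plain Python structures. This is only a sketch of the data models, not of any product's storage format, and all names and values are made up.

```python
# Document: one self-contained JSON-like record with nested, variable structure.
document = {"id": "u1", "name": "Ada", "orders": [{"sku": "A-100", "qty": 2}]}

# Key-value: an opaque value fetched by a single key.
key_value = {"user:u1": '{"name": "Ada"}'}

# Graph: nodes plus edges that describe relationships.
nodes = {"u1": {"label": "user"}, "A-100": {"label": "product"}}
edges = [("u1", "bought", "A-100")]

# Wide column: row key -> column family -> columns, which may differ per row.
wide_column = {
    "u1": {"profile": {"name": "Ada"}, "activity": {"last_login": "2024-01-01"}},
    "u2": {"profile": {"name": "Lin", "email": "lin@example.com"}},
}

def neighbors(node, relation):
    """Follow edges of one relation type out of a node."""
    return [dst for src, rel, dst in edges if src == node and rel == relation]
```

Note that row `u2` has no `activity` family and carries an extra `email` column, which is the schemaless behavior described for wide column stores.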

22
Q

What is the difference between Cosmos DB & SQL Data Warehouse (now SQL Pools)?

A

Global Replication and Speed (Cosmos DB)
vs
Consistency and Integrity (SQL Pools)

23
Q

Which SQL Data Warehouse distribution method is the most appropriate for queries on large tables?

A

Hash is best for improving query performance on large tables. Round-robin would be best for improving load speed.
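
The trade-off can be sketched by assigning the same rows to distributions both ways. A dedicated SQL pool spreads a table over 60 distributions; the MD5 hash used here is only a stand-in for the engine's internal hash function, and the table and column names are hypothetical.

```python
import hashlib

DISTRIBUTIONS = 60  # a dedicated SQL pool spreads each table over 60 distributions

def round_robin(rows):
    """Assign rows in turn: nothing to compute per row, but matching keys end up scattered."""
    return [i % DISTRIBUTIONS for i in range(len(rows))]

def hash_distributed(rows, key):
    """Assign each row by hashing one column, so equal key values always land together."""
    # Stand-in hash function; the engine's real one is internal.
    return [
        int(hashlib.md5(str(r[key]).encode("utf-8")).hexdigest(), 16) % DISTRIBUTIONS
        for r in rows
    ]

rows = [{"customer_id": i % 5, "amount": i} for i in range(600)]
rr = round_robin(rows)
hd = hash_distributed(rows, "customer_id")

def distributions_for(customer_id, assignment):
    """Set of distributions that hold at least one row for this customer."""
    return {d for r, d in zip(rows, assignment) if r["customer_id"] == customer_id}
```

Round-robin fills every distribution evenly, which suits fast loads into staging tables. Hash distribution keeps all rows for one `customer_id` in a single distribution, so joins and aggregations on that column need less data movement.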

24
Q

Name 3 benefits of Cosmos DB.

A

Global Replication
Multi-Model
Elastic

25
Q

What does PolyBase allow Synapse to do?

A

Extract data from the source system (using T-SQL), load it into the data warehouse, and then transform it as needed (the ELT pattern).

26
Q

What is the main difference between a Blob Store & a Data Lake?

A

Hierarchical Namespace.
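
A toy comparison of what the difference means for a directory rename, assuming a flat store where "folders" are only name prefixes and a hierarchical store where directories are real objects. The dictionaries and function names are hypothetical and do not reflect the Azure Storage API.

```python
# Flat namespace: "folders" are only prefixes inside blob names.
flat = {"raw/2024/a.csv": b"1", "raw/2024/b.csv": b"2", "curated/c.csv": b"3"}

def rename_flat(store, old_prefix, new_prefix):
    """Rename a virtual folder by rewriting every matching blob; returns operations performed."""
    ops = 0
    for name in [n for n in store if n.startswith(old_prefix)]:
        store[new_prefix + name[len(old_prefix):]] = store.pop(name)
        ops += 1
    return ops

# Hierarchical namespace: directories are real objects that hold their children.
tree = {"raw": {"2024": {"a.csv": b"1", "b.csv": b"2"}}, "curated": {"c.csv": b"3"}}

def rename_dir(parent, old_name, new_name):
    """Rename a directory by moving one entry in its parent; children are untouched."""
    parent[new_name] = parent.pop(old_name)
    return 1
```

In the flat store the cost of a rename grows with the number of blobs under the prefix, while in the hierarchical store it is a single operation on the directory entry.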

27
Q

Which NoSQL database APIs would you use for the key-value and wide-column database types?

A

Table (key-value)

Cassandra (wide-column)

28
Q

You are configuring a dynamic data mask and need to completely mask the data. What is the most appropriate logic to utilize?

A

Default. A default data mask completely masks the data field it is applied to.
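
A rough Python approximation of what the default mask returns for a few common types: a fixed `XXXX` for strings, zero for numbers, and 1900-01-01 for dates. The function name `default_mask` is hypothetical, and the real feature works on SQL column types (for instance, a string column shorter than four characters gets fewer X characters), which this sketch ignores.

```python
from datetime import date, datetime

def default_mask(value):
    """Approximate the default masking function for a few common Python types."""
    if isinstance(value, str):
        return "XXXX"
    if isinstance(value, datetime):  # checked before date, since datetime is a subclass of date
        return datetime(1900, 1, 1)
    if isinstance(value, date):
        return date(1900, 1, 1)
    if isinstance(value, (int, float)):
        return 0
    raise TypeError(f"no default mask defined for {type(value).__name__}")

masked_row = {
    "card_number": default_mask("4111-1111-1111-1111"),
    "salary": default_mask(52000.50),
    "birth_date": default_mask(date(1985, 7, 4)),
}
```

None of the original value survives in the masked output, which is what distinguishes the default mask from partial masks that expose, say, the last four characters.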

29
Q

What is RPO & RTO?

A

RPO: Recovery Point Objective
What is the maximum acceptable window of data loss when restoring from backups?

RTO: Recovery Time Objective
What is the maximum amount of time that can elapse before the system is brought back online?
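
A worked example with made-up timestamps: given when the last backup was taken, when the failure occurred, and when service was restored, the sketch below computes the data-loss window and the downtime and compares them with the objectives. The function name and all values are hypothetical.

```python
from datetime import datetime, timedelta

def recovery_report(last_backup, failure, restored, rpo, rto):
    """Compare one incident's data-loss window and downtime with the stated objectives."""
    data_loss = failure - last_backup  # writes made after the last backup are lost
    downtime = restored - failure      # time until the system is back online
    return {
        "data_loss": data_loss,
        "downtime": downtime,
        "rpo_met": data_loss <= rpo,
        "rto_met": downtime <= rto,
    }

report = recovery_report(
    last_backup=datetime(2024, 5, 1, 2, 0),
    failure=datetime(2024, 5, 1, 2, 45),
    restored=datetime(2024, 5, 1, 5, 15),
    rpo=timedelta(hours=1),
    rto=timedelta(hours=2),
)
```

Here 45 minutes of data is lost, which is within the 1-hour RPO, but the system is down for 2 hours 30 minutes, which misses the 2-hour RTO.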