Definitions Flashcards

1
Q

Describe Azure Storage Accounts

A
  • When you need a low cost, high throughput data store.
  • When you need to store NoSQL data.
  • When you do not need to query the data directly. No ad hoc query support.
  • Suits the storage of archive or relatively static data.
  • Suits acting as an HDInsight Hadoop data store.
2
Q

Describe Data Lake Store

A
  • When you need a low cost, high throughput data store.
  • Unlimited storage for NoSQL data.
  • When you do not need to query data directly. No ad hoc query support.
  • Suits the storage of archive or relatively static data.
  • Suits acting as a Databricks, HDInsight, and IoT data store.
3
Q

Describe Azure Databricks.

A
  • Eases the deployment of a Spark based cluster.
  • Enables the fastest processing of ML solutions.
  • Enables collaboration between data engineers and data scientists.
  • Provides tight enterprise security integration with Azure Active Directory.
  • Integration with other Azure Services and Power BI.
4
Q

Describe Azure Cosmos DB (Premium)

A
  • Provides global distribution for both structured and unstructured data stores.
  • Millisecond query response time.
  • 99.999% availability of data.
  • Worldwide elastic scale of both the storage and throughput.
  • Multiple consistency levels to control data integrity with concurrency.
5
Q

Describe Azure SQL Database.

A
  • When you require a relational data store.
  • When you need to manage transactional workloads.
  • When you need to manage a high volume of inserts and reads.
  • When you need a service that requires high concurrency.
  • When you require a solution that can scale elastically.
6
Q

Describe Azure Data Warehouse.

A
  • When you require a relational data store.
  • When you need to manage analytical workloads.
  • When you need low cost storage.
  • When you require the ability to pause and restart the compute.
  • When you require a solution that can scale elastically.
7
Q

Describe Azure Stream Analytics.

A
  • When you require a fully managed event processing engine (utilizes Azure Event Hubs).
  • When you require temporal analysis of streaming data.
  • Support for analyzing IoT streaming data.
  • Support for analyzing application data through Event Hubs.
  • Ease of use with Stream Analytics Query Language.
8
Q

Describe Azure Data Factory.

A
  • When you want to orchestrate the batch movement of data (pipelines).
  • When you want to connect to a wide range of data platforms.
  • When you want to transform or enrich the data in movement.
  • When you want to integrate with SSIS packages.
  • Enables verbose logging of data processing activities.
9
Q

Describe Azure HDInsight (Hadoop, Open)

A
  • When you need a low cost, high throughput data store.
  • When you need to store NoSQL data.
  • Suits acting as a Hadoop, HBase, LLAP, or Kafka data store.
  • Eases the deployment and management of clusters.
10
Q

Describe Azure Data Catalog.

A
  • When you require documentation of your data stores.
  • When you require a multi user approach to documentation.
  • When you need to annotate data sources with descriptive metadata.
  • A fully managed cloud service whose users can discover the data sources.
  • When you require a solution that can help business users understand their data.
11
Q

Describe Azure Queue Storage.

A
  • Azure Queue Storage is a service for storing large numbers of messages.
  • You access messages from anywhere in the world via authenticated calls using HTTP or HTTPS.
  • A queue message can be up to 64 KB in size.
  • A queue may contain millions of messages, up to the total capacity limit of a storage account.
  • Queues are commonly used to create a backlog of work to process asynchronously.
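As a minimal sketch of the 64 KB limit above, the helper below checks whether a message body would fit before it is enqueued. The function name is hypothetical and not part of any Azure SDK, and the check assumes the limit applies to the UTF-8 encoded bytes, ignoring any encoding overhead such as Base64.

```python
MAX_QUEUE_MESSAGE_BYTES = 64 * 1024  # 64 KB limit stated on the card


def fits_in_queue_message(body: str) -> bool:
    """Return True if the UTF-8 encoded body is within the 64 KB limit.

    Assumption: the limit is measured on the raw encoded bytes.
    """
    return len(body.encode("utf-8")) <= MAX_QUEUE_MESSAGE_BYTES
```

A caller could use this to decide whether to send the payload directly or to store it elsewhere and enqueue only a reference.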
12
Q

Describe Azure Tables.

A
  • NoSQL key-value storage
  • Items are referred to as rows; fields are known as columns
  • All rows in a table must have a key
  • No concept of relationships
  • Data will usually be denormalized
  • Used for logging and performance monitoring
  • Storing TBs of structured data, capable of serving web scale apps
  • Datasets that do not require complex joins, foreign keys, or stored procedures
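The key-value model above can be sketched with a plain dictionary. This is an in-memory stand-in, not the Azure Tables SDK: the helper names are hypothetical, and the sketch assumes the row key is the combination of `PartitionKey` and `RowKey`, as Azure Table Storage entities use. Note that the row carries the user name directly (denormalized) instead of pointing at another table.

```python
from typing import Dict, Optional, Tuple

# Hypothetical in-memory table: (PartitionKey, RowKey) -> row of columns
Table = Dict[Tuple[str, str], dict]


def upsert(table: Table, partition_key: str, row_key: str, **columns) -> None:
    """Insert or overwrite the row identified by its key."""
    table[(partition_key, row_key)] = {
        "PartitionKey": partition_key,
        "RowKey": row_key,
        **columns,
    }


def get(table: Table, partition_key: str, row_key: str) -> Optional[dict]:
    """Point lookup by key; there are no joins or foreign keys to follow."""
    return table.get((partition_key, row_key))


logs: Table = {}
upsert(logs, "web-01", "2024-01-01T00:00:00Z",
       level="INFO", message="started", user_name="alice")
row = get(logs, "web-01", "2024-01-01T00:00:00Z")
```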
13
Q

Describe Azure File Storage.

A
  • Lets you create file shares in the cloud (policy documents, etc.)
  • Accessible from Windows, Linux, and macOS
  • Accessible via the Server Message Block (SMB) protocol or the Network File System (NFS) protocol
  • Data is encrypted at rest, and the SMB protocol ensures data is encrypted in transit.
14
Q

Describe Azure Disk Storage.

A
  • A VM uses disks to store its OS, applications, and data
  • One VM can have one OS disk and multiple data disks, but a data disk can be linked to only one VM
  • Both OS disk and data disk are virtual hard disks (VHDs)
  • Unmanaged disk: create storage account, specify it when we create the disk. Not recommended.
  • Managed disk: Azure creates and manages the storage accounts (scalability, resiliency)
  • Standard HDD/SSD, Premium SSD, Ultra SSD
15
Q

Describe CosmosDB

A
  • Serverless architecture, DaaS, no OPEX, no schema or index management, five-nines (99.999%) availability.
  • Multi-model (JSON, table, graph, columnar), multi-language (Java, .NET, Python, Node.js, JavaScript).
  • Globally distributed, multi-model database for mission-critical applications.
16
Q

Describe Azure Data Lake Gen2.

A
  • Very big container to store data.
  • No limit to Data Lake storage.
  • Stores structured, unstructured, batches, sensor data.
  • Takes advantage of Blob storage and Hadoop together.
  • Optimized for big data analytics.
  • Supports multiple Azure integrations.
17
Q

Describe Azure Blob Storage.

A
  • Large object storage in the cloud.
  • Optimized for storing mass amounts of unstructured data
  • General purpose, cost efficient
18
Q

Cosmos DB SQL (Core) API

A
  • New projects being created from scratch.
  • JSON Documents.
  • Supports server side programming model.
19
Q

Cosmos DB MongoDB API

A
  • BSON Documents
  • Fully compatible with MongoDB application code
  • Migrate existing MongoDB applications to Cosmos DB without much change of logic.
20
Q

Cosmos DB Table API

A
  • NoSQL database
  • Premium offering for Azure Table Storage
  • A row cannot store an object
21
Q

Cosmos DB Cassandra API

A
  • Wide-column NoSQL database
  • Name and format of a column can vary from row to row.
  • Migrate a Cassandra application to the Cosmos DB Cassandra API by changing the connection string.
22
Q

Cosmos DB Gremlin API

A
  • Graph data model!
  • Real world data connected with each other
  • Graph database can persist relationships in the storage layer
  • No schema, no dependencies, relationships exist naturally, demonstrate how real-world objects are related
  • Geospatial, Social networks, Recommendation engines, IoTs
23
Q

Define Partition Key.

A

It is the value by which Azure organizes your data into logical divisions.

24
Q

Define Logical Partitions.

A

They are formed based on the value of a partition key that is associated with each item in a container. (digital separation)

25
Q

Define Physical Partition.

A

Internally, one or MORE logical partitions are mapped to a single physical partition.
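The three partition definitions above can be tied together in a small sketch: items are grouped into logical partitions by their partition key value, and each logical partition is then assigned to exactly one physical partition. The function names and the SHA-256 placement are assumptions for illustration only; this is not how Cosmos DB is implemented internally.

```python
import hashlib
from collections import defaultdict


def logical_partitions(items, key):
    """Group items by partition key value: one logical partition per value."""
    parts = defaultdict(list)
    for item in items:
        parts[item[key]].append(item)
    return dict(parts)


def physical_partition(partition_key_value, physical_count):
    """Map a logical partition to one physical partition (illustrative hash)."""
    digest = hashlib.sha256(str(partition_key_value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % physical_count


items = [
    {"id": "1", "city": "Oslo"},
    {"id": "2", "city": "Lima"},
    {"id": "3", "city": "Oslo"},
    {"id": "4", "city": "Pune"},
]
logical = logical_partitions(items, "city")  # three logical partitions
# Several logical partitions may land on the same physical partition.
placement = {value: physical_partition(value, 2) for value in logical}
```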

26
Q

Define Hot Partitions.

A

Hot partition (storage): a logical partition that is overloaded with data because the partition key skews a disproportionate share of items into it.

Hot partition (throughput): a logical partition that holds the most-accessed data and therefore receives a disproportionate share of the queries.
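One way to spot such skew is to measure what share of items (or requests) each partition key value receives. The sketch below is illustrative: the function names and the 50% threshold are assumptions, not Azure guidance.

```python
from collections import Counter


def partition_shares(partition_key_values):
    """Return the fraction of all entries that falls on each key value."""
    counts = Counter(partition_key_values)
    total = sum(counts.values())
    return {key: count / total for key, count in counts.items()}


def hot_partitions(partition_key_values, threshold=0.5):
    """List key values whose share exceeds the (assumed) threshold."""
    shares = partition_shares(partition_key_values)
    return [key for key, share in shares.items() if share > threshold]


# Ten requests, eight of which hit the same logical partition.
requests = ["tenant-a"] * 8 + ["tenant-b", "tenant-c"]
hot = hot_partitions(requests)
```

The same function works for storage skew if it is fed the partition key value of every stored item instead of every request.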

27
Q

Single Partition vs Cross-Partition Query

A
  • Single Partition Query: the query goes directly to the one logical partition that contains the data about that subject.
  • Cross Partition Query: the query searches across multiple logical partitions for the referenced data (e.g. favorite color == blue). It must visit each logical partition to gather matches, then compile them into one returned result (less desirable but sometimes unavoidable).
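The difference can be sketched with an in-memory dictionary in which each key is a partition key value and each value is that logical partition's items. The data and function names are hypothetical, and the visit counter exists only to make the extra work of the cross-partition case visible.

```python
partitions = {
    "alice": [{"user": "alice", "favorite_color": "blue"}],
    "bob": [{"user": "bob", "favorite_color": "red"}],
    "carol": [{"user": "carol", "favorite_color": "blue"}],
}


def single_partition_query(partitions, partition_key_value):
    """Read exactly one logical partition, chosen by the partition key."""
    return list(partitions.get(partition_key_value, []))


def cross_partition_query(partitions, predicate):
    """Visit every logical partition and merge the matching items."""
    results = []
    visited = 0
    for items in partitions.values():
        visited += 1
        results.extend(item for item in items if predicate(item))
    return results, visited


bob_items = single_partition_query(partitions, "bob")
blue_items, visited = cross_partition_query(
    partitions, lambda item: item["favorite_color"] == "blue"
)
```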
28
Q

Manual Failover v. Automatic Failover

A
  • Manual Failover: you promote a read-only region to become the write region, so that customer data writes continue there.
  • Automatic Failover: can be configured ahead of any issues. In the event of a failure, reads are automatically handled by the nearest available copy of the data.
29
Q

Data Consistency Definition.

A

The state of data in which all copies or instances are the same across all systems and databases. Ensures data is accurate, up to date, and coherent across all database systems, applications, and platforms (for example, banking applications).

30
Q

Default Consistency Options. (CosmosDB)

A
  1. Strong - All data centers update at the exact same time, very strong consistency. Highest cost.
  2. Bounded Staleness - Define a lag for accessing updated data in a different region.
  3. Session - Within a session, reads behave as if consistency were Strong (you see your own writes); outside that session, you may not read the most updated data.
  4. Consistent Prefix - Order and sequence maintained, however, no guarantee that the data is the most updated.
  5. Eventual - No guarantee that you are reading the most updated data or in sequence (music notes). Lowest cost.
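A toy model can make the ordering guarantees above concrete. Given the full ordered list of writes, the sketch checks whether what a reader observed would be permitted under a given level. It is a simplification under stated assumptions: the level names, the `max_lag` parameter (staleness counted in versions), and the function itself are illustrative, and Session is omitted because it depends on which client issued the writes.

```python
def allowed(level, observed, writes, max_lag=2):
    """Return True if `observed` is a permitted view of `writes` (toy model)."""
    is_prefix = observed == writes[:len(observed)]
    if level == "strong":
        # The reader sees every write, in order.
        return observed == writes
    if level == "bounded_staleness":
        # In order, and at most max_lag writes behind.
        return is_prefix and len(writes) - len(observed) <= max_lag
    if level == "consistent_prefix":
        # In order, but possibly arbitrarily far behind.
        return is_prefix
    if level == "eventual":
        # Any subset of the writes, in any order.
        return set(observed) <= set(writes)
    raise ValueError(f"unknown level: {level}")


writes = ["w1", "w2", "w3", "w4"]
```

For example, a reader that saw `["w3", "w1"]` is acceptable only under Eventual, while `["w1"]` satisfies Consistent Prefix but is too far behind for Bounded Staleness with a lag of two.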