DP-203 Vocab Flashcards
What is Structured Data?
Structured data is data that adheres to a schema, so all of the data has the same fields or properties. Structured data can be stored in a database table with rows and columns.
What is Semi-Structured Data?
Semi-structured data doesn't fit neatly into tables, rows, and columns. Instead, it uses tags or keys that organize the data and give it a hierarchy.
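A minimal sketch of the difference, using only the Python standard library. The records, field names, and the `has_uniform_schema` helper are made up for illustration; they are not taken from any Azure service.

```python
import json

# Structured: every row has the same fields, so it fits a table.
structured_rows = [
    {"id": 1, "name": "Ana", "city": "Lisbon"},
    {"id": 2, "name": "Ben", "city": "Oslo"},
]

# Semi-structured: keys give each record a hierarchy, and records may differ.
semi_structured_docs = json.loads("""
[
  {"id": 1, "name": "Ana", "contact": {"email": "ana@example.com"}},
  {"id": 2, "name": "Ben", "tags": ["vip", "newsletter"]}
]
""")

def has_uniform_schema(records):
    """Return True when every record exposes exactly the same set of keys."""
    key_sets = {frozenset(r.keys()) for r in records}
    return len(key_sets) <= 1

print(has_uniform_schema(structured_rows))       # True
print(has_uniform_schema(semi_structured_docs))  # False
```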
What is Unstructured Data?
Unstructured data encompasses data that has no designated structure to it. It is stored in nonrelational systems, commonly known as NoSQL. There are four types of NoSQL databases:
* Key-value store
* Document database
* Graph database
* Column database
What service should you use for data?
- When you need a low-cost, high-throughput data store.
- When you need to store NoSQL data.
- When you do not need to query the data directly. No ad hoc query support.
- Suits the storage of archive or relatively static data.
- Suits acting as an HDInsight Hadoop data store.
Blob
What service should you use for data?
- When you need a low-cost, high-throughput data store.
- When you need unlimited storage for NoSQL data.
- When you do not need to query the data directly. No ad hoc query support.
- Suits the storage of archive or relatively static data.
- Suits acting as a Databricks, HDInsight, and IoT data store.
Data Lake
What service should you use for data?
- Eases the deployment of a Spark-based cluster.
- Enables the fastest processing of machine learning solutions.
- Enables collaboration between data engineers and data scientists.
- Provides tight enterprise security integration with Azure Active Directory.
- Integrates with other Azure services and Power BI.
Databricks
What service should you use for data?
- Provides global distribution for both structured and unstructured data stores
- Millisecond query response time
- 99.999% availability of data
- Worldwide elastic scale of both storage and throughput
- Multiple consistency levels to control data integrity with concurrency
Cosmos DB
What service should you use for data?
- When you require a relational data store
- When you need to manage transactional workloads
- When you need to manage a high volume of inserts and reads
- When you need a service that supports high concurrency
- When you require a solution that can scale elastically
SQL DB (Azure SQL)
What service should you use for data?
- When you require a relational data store
- When you need to manage analytical workloads
- When you need low-cost storage
- When you require the ability to pause and restart the compute
- When you require a solution that can scale elastically
Azure SQL-DW
What is the difference between hot and cold data?
Hot data is data that is accessed (retrieved) frequently; cold data is data that is not accessed often.
Reading data in Azure Databricks: Translate from SQL
SELECT col_1 FROM myTable
df.select(col("col_1"))
Reading data in Azure Databricks: Translate from SQL
SELECT * FROM myTable WHERE col_1 > 0
df.filter(col("col_1") > 0)
Request Units in Cosmos-DB
What is Database Throughput?
Database throughput is the number of reads and writes that your database can perform in a single second.
Request Units in Cosmos-DB
What is a request unit?
Azure Cosmos DB measures throughput in request units (RUs). Request unit usage is measured per second, so the unit of measure is request units per second (RU/s). You must reserve in advance the number of RU/s you want Azure Cosmos DB to provision.
Request Units in Cosmos-DB
What occurs when you exceed throughput limits?
If you don't reserve enough request units and you attempt to read or write more data than your provisioned throughput allows, your request will be rate-limited.
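The idea can be sketched as a toy budget check. This is a simplified model for study purposes, not how the service is implemented: the `simulate_second` function and the request costs are hypothetical. The one detail taken from the real service is that rate-limited requests come back with HTTP status 429.

```python
def simulate_second(provisioned_rus, request_costs):
    """Process requests arriving within one second against an RU/s budget.

    Returns (accepted, rate_limited) lists of request costs.
    """
    remaining = provisioned_rus
    accepted, rate_limited = [], []
    for cost in request_costs:
        if cost <= remaining:
            # Enough budget left in this second: serve the request.
            remaining -= cost
            accepted.append(cost)
        else:
            # Budget exhausted: the real service would answer with HTTP 429.
            rate_limited.append(cost)
    return accepted, rate_limited

accepted, rate_limited = simulate_second(400, [100, 150, 200, 50, 120])
print(accepted)      # [100, 150, 50]
print(rate_limited)  # [200, 120]
```

Raising the provisioned value in the call (for example to 700) lets every request in this example through, which mirrors the remedy of reserving more RU/s.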
Cosmos DB Failover management
What is automated fail-over?
Automated fail-over is a feature that comes into play when a disaster or other event takes one of your read or write regions offline. It redirects requests from the offline region to the next most prioritized region.
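The "next most prioritized region" rule can be pictured with a short sketch. The `pick_region` helper and the region order are hypothetical and only illustrate the selection logic; the service performs this redirection for you.

```python
def pick_region(priority_list, offline):
    """Return the highest-priority region that is still online, or None."""
    for region in priority_list:
        if region not in offline:
            return region
    return None

# Hypothetical failover priority order, highest priority first.
failover_priorities = ["West US", "East US", "North Europe"]

print(pick_region(failover_priorities, offline=set()))                   # West US
print(pick_region(failover_priorities, offline={"West US"}))             # East US
print(pick_region(failover_priorities, offline={"West US", "East US"}))  # North Europe
```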
Cosmos-DB
Which consistency level should I use to get the lowest latency for reads and writes?
Eventual Consistency
True or False?
Cosmos-DB takes care of consistency of data when replicated?
True!
Azure SQL DB Configuration
What are SQL elastic pools, and what are their benefits?
SQL elastic pools are a simple, cost-effective solution for managing and scaling multiple databases that have varying and unpredictable usage demands. The databases in an elastic pool are on a single server and share a set number of resources at a set price.
Benefits:
* Resources are measured in eDTUs (elastic Database Transaction Units).
* Enables you to buy a set of compute and storage resources that are shared among all the databases in the pool.
Azure SQL-DW
What are the three types of Azure SQL-DW?
- Enterprise DW - Centralized data store that provides analytics and decision support.
- Data Marts - Designed for the needs of a single team or business unit, such as sales.
- Operational Data Stores - Used as an interim store to integrate real-time data from multiple sources for additional operations on the data.
Azure SQL-DW
What are the two architectural ways of building a DW?
- Bottom-Up Architecture
Approach based on the notion of connected data marts
Depends on a star schema
Benefit - Start with a departmental data mart
- Top-Down Architecture
Creating one single integrated, normalized warehouse
Internal relational constructs follow the rules of normalization
Stream Analytics
This window function hops forward in time by a fixed period. It may be easiest to think of these windows as Tumbling windows that can overlap, so an event can belong to more than one Hopping window result set.
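The overlap described above can be seen in a small sketch. It is a plain-Python illustration, not Stream Analytics code: the `hopping_windows` helper is hypothetical, timestamps are plain integers, and windows are treated as half-open [start, end) intervals purely for simplicity.

```python
def hopping_windows(ts, size, hop):
    """Return every (start, end) window of length `size`, starting at a
    multiple of `hop`, that contains the timestamp `ts`.

    Windows are treated as half-open intervals [start, end).
    """
    windows = []
    # Latest window start at or before ts, aligned to the hop.
    start = (ts // hop) * hop
    # Walk backwards while the window still covers ts.
    while start > ts - size:
        windows.append((start, start + size))
        start -= hop
    return sorted(windows)

print(hopping_windows(12, size=10, hop=5))   # [(5, 15), (10, 20)]
print(hopping_windows(12, size=10, hop=10))  # [(10, 20)]
```

When the hop equals the window size, each event lands in exactly one window, which is the Tumbling case.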
Stream Analytics
This window function groups events that arrive at similar times, filtering out periods of time where there is no data. It has three main parameters: timeout, maximum duration, and partitioning key (optional).
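A simplified sketch of how the timeout and maximum duration parameters shape the groups. The `session_windows` helper is hypothetical, it ignores the optional partitioning key, and it does not reproduce the exact boundary rules of Stream Analytics; it only shows gaps closing a group and the maximum duration capping it.

```python
def session_windows(timestamps, timeout, max_duration):
    """Group event timestamps into sessions.

    A session closes when the gap to the next event exceeds `timeout`,
    or when adding the event would stretch the session past `max_duration`.
    """
    sessions = []
    current = []
    for ts in sorted(timestamps):
        gap_too_long = bool(current) and ts - current[-1] > timeout
        too_long = bool(current) and ts - current[0] > max_duration
        if gap_too_long or too_long:
            sessions.append(current)
            current = []
        current.append(ts)
    if current:
        sessions.append(current)
    return sessions

events = [1, 2, 4, 20, 22, 23, 25, 27, 29, 31]
print(session_windows(events, timeout=5, max_duration=8))
# [[1, 2, 4], [20, 22, 23, 25, 27], [29, 31]]
```

The quiet stretch between 4 and 20 produces no group at all, and the long burst starting at 20 is split once it would exceed the maximum duration.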