DP-203 Vocab Flashcards
What is Structured Data?
Structured data is data that adheres to a schema, so all of the data has the same fields or properties. Structured data can be stored in a database table with rows and columns.
What is Semi-Structured Data?
Semi-structured data doesn’t fit into tables, rows, and columns. Instead, it uses tags or keys that organize the data and provide a hierarchy for it.
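As a rough illustration of the difference, the sketch below compares table-like rows with nested, key-based records. The records and the helper `same_fields` are made up for this card and are not taken from any Azure sample.

```python
import json

# Hypothetical records for illustration only.
# Structured: every row has exactly the same fields, like a table.
structured_rows = [
    {"id": 1, "name": "Ana", "city": "Lisbon"},
    {"id": 2, "name": "Ben", "city": "Oslo"},
]

# Semi-structured: keys give the data a hierarchy, and records may differ.
semi_structured_docs = [
    {"id": 1, "name": "Ana", "contact": {"email": "ana@example.com"}},
    {"id": 2, "name": "Ben", "contact": {"phone": "555-0100"}, "tags": ["vip"]},
]

def same_fields(records):
    """Return True when every record has an identical set of top-level keys."""
    key_sets = {frozenset(r.keys()) for r in records}
    return len(key_sets) == 1

print(same_fields(structured_rows))       # True
print(same_fields(semi_structured_docs))  # False
print(json.dumps(semi_structured_docs[1], indent=2))
```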
What is Unstructured Data?
Unstructured data has no designated structure. It is typically stored in nonrelational systems, commonly called NoSQL. There are four types of NoSQL databases:
*Key-Value Store
*Document Database
*Graph Database
*Column Database
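To make the four types easier to remember, here is a simplified sketch that shapes the same kind of data four ways using plain Python structures. All names and values are hypothetical, and real databases of each type add indexing, partitioning, and query features that this sketch leaves out.

```python
# Hypothetical data: customers and one "follows" relationship,
# shaped four ways to mirror the four NoSQL database types.

# Key-value store: an opaque value looked up by a single key.
key_value = {"customer:1": '{"name": "Ana", "city": "Lisbon"}'}

# Document database: a self-describing, nested document per item.
document = {"id": 1, "name": "Ana", "address": {"city": "Lisbon"}}

# Graph database: nodes plus edges that describe relationships.
graph = {
    "nodes": {1: {"name": "Ana"}, 2: {"name": "Ben"}},
    "edges": [(1, "follows", 2)],
}

# Column database: values grouped by column rather than by row.
column = {"name": ["Ana", "Ben"], "city": ["Lisbon", "Oslo"]}

def followers_of(graph_data, node_id):
    """Return ids of nodes with a 'follows' edge pointing at node_id."""
    return [src for src, rel, dst in graph_data["edges"]
            if rel == "follows" and dst == node_id]

print(key_value["customer:1"])
print(document["address"]["city"])   # Lisbon
print(followers_of(graph, 2))        # [1]
print(column["city"])                # ['Lisbon', 'Oslo']
```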
What service should you use for data?
- When you need a low-cost, high-throughput data store.
- When you need to store NoSQL data.
- When you do not need to query the data directly. No ad hoc query support.
- Suits the storage of archive or relatively static data.
- Suits use as an HDInsight Hadoop data store.
Blob
What service should you use for data?
- When you need a low-cost, high-throughput data store.
- When you need unlimited storage for NoSQL data.
- When you do not need to query the data directly. No ad hoc query support.
- Suits the storage of archive or relatively static data.
- Suits use as a Databricks, HDInsight, and IoT data store.
Data Lake
What service should you use for data?
- Eases the deployment of a Spark-based cluster.
- Enables the fastest processing of machine learning solutions.
- Enables collaboration between data engineers and data scientists.
- Provides tight enterprise security integration with Azure Active Directory.
- Integrates with other Azure services and Power BI.
Azure Databricks
What service should you use for data?
- Provides global distribution for both structured and unstructured data stores.
- Millisecond query response time.
- 99.999% availability of data.
- Worldwide elastic scale of both storage and throughput.
- Multiple consistency levels to control data integrity with concurrency.
Cosmos DB
What service should you use for data?
- When you require a relational data store.
- When you need to manage transactional workloads.
- When you need to manage a high volume of inserts and reads.
- When you need a service that supports high concurrency.
- When you require a solution that can scale elastically.
SQL DB (Azure SQL)
What service should you use for data?
- When you require a relational data store.
- When you need to manage analytical workloads.
- When you need low-cost storage.
- When you require the ability to pause and restart the compute.
- When you require a solution that can scale elastically.
Azure SQL DW
What is the difference between hot and cold data?
Hot data is data that is retrieved frequently. Cold data is data that is not accessed often.
Reading data in Azure Databricks: Translate from SQL
SELECT col_1 FROM myTable
df.select(col("col_1"))
Reading data in Azure Databricks: Translate from SQL
SELECT * FROM myTable WHERE col_1 > 0
df.filter(col("col_1") > 0)
Request Units in Cosmos DB
What is Database Throughput?
Database throughput is the number of reads and writes that your database can perform in a single second.
Request Units in Cosmos DB
What is a request unit?
Azure Cosmos DB measures throughput using something called a request unit (RU). Request unit usage is measured per second, so the unit of measure is request units per second (RU/s). You must reserve the number of RU/s you want Azure Cosmos DB to provision in advance.
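The arithmetic behind reserving RU/s can be sketched as below. The function `required_rus_per_second` and its default per-operation costs are assumptions for illustration. Actual RU charges depend on item size, indexing, and the kind of operation, so measure them for your own workload.

```python
import math

def required_rus_per_second(reads_per_sec, writes_per_sec,
                            ru_per_read=1.0, ru_per_write=5.0):
    """Estimate RU/s to provision for a steady workload.

    ru_per_read and ru_per_write are assumed per-operation costs;
    measure the real charge for your own items and queries.
    """
    total = reads_per_sec * ru_per_read + writes_per_sec * ru_per_write
    return math.ceil(total)

# Hypothetical workload: 500 reads and 100 writes every second.
print(required_rus_per_second(500, 100))  # 1000
```

With these assumed costs, 500 reads (500 RU) plus 100 writes (500 RU) add up to 1000 RU/s to reserve.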
Request Units in Cosmos DB
What occurs when you exceed throughput limits?
If you don’t reserve enough request units and you attempt to read or write more data than your provisioned throughput allows, your request will be rate-limited.
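A toy simulation can show what rate limiting looks like from the caller's side. Everything here (`FakeContainer`, `RateLimitedError`, `write_with_retry`, and the 5 RU write cost) is hypothetical and is not the Azure Cosmos DB SDK. The real service signals rate limiting with an HTTP 429 response, and client libraries usually wait before retrying.

```python
class RateLimitedError(Exception):
    """Hypothetical error standing in for a rate-limited (HTTP 429) response."""

class FakeContainer:
    """Toy store that allows a fixed RU budget per one-second window."""

    def __init__(self, provisioned_rus):
        self.provisioned_rus = provisioned_rus
        self.used_this_second = 0

    def next_second(self):
        # A new one-second window starts with the full budget again.
        self.used_this_second = 0

    def write(self, item, ru_cost=5):
        if self.used_this_second + ru_cost > self.provisioned_rus:
            raise RateLimitedError("request rate too large")
        self.used_this_second += ru_cost
        return item

def write_with_retry(container, item, max_attempts=3):
    """Retry a rate-limited write, moving to the next window between attempts."""
    for attempt in range(1, max_attempts + 1):
        try:
            return container.write(item), attempt
        except RateLimitedError:
            if attempt == max_attempts:
                raise
            container.next_second()  # stands in for waiting before retrying

container = FakeContainer(provisioned_rus=10)
results = [write_with_retry(container, {"id": i}) for i in range(3)]
print([attempt for _, attempt in results])  # [1, 1, 2]
```

The third write exceeds the 10 RU budget for the current second, so it only succeeds on its second attempt, after the window resets.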