DP-203 Vocab Flashcards

1
Q

What is Structured Data?

A

Structured data is data that adheres to a schema, so all of the data has the same fields or properties. Structured data can be stored in a database table with rows and columns.

2
Q

What is Semi-Structured Data?

A

Semi-structured data doesn’t fit neatly into tables, rows, and columns. Instead, it uses tags or keys that organize the data and provide a hierarchy for it.
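
As a rough illustration (the records below are made up for this card, not taken from any Azure service), a JSON document shows how keys give semi-structured data a hierarchy without forcing every record into the same columns:

```python
import json

# Two hypothetical records: they share some keys but not all,
# which is what keeps them from fitting one fixed table schema.
raw = """
[
  {"id": 1, "name": "Ana", "contact": {"email": "ana@example.com"}},
  {"id": 2, "name": "Ben", "contact": {"phone": "555-0100"}, "tags": ["vip"]}
]
"""

records = json.loads(raw)

# Keys act as the tags that organize the data; nesting gives the hierarchy.
for record in records:
    print(record["name"], sorted(record["contact"].keys()))

# The set of top-level keys differs from record to record.
key_sets = [sorted(record.keys()) for record in records]
print(key_sets)
```

The second record carries a `tags` key that the first one lacks, which a fixed row-and-column schema would not allow without adding a mostly empty column.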

3
Q

What is Unstructured Data?

A

Unstructured data is data that has no designated structure. It is often referred to as NoSQL data. There are four types of NoSQL databases:

* Key-value store
* Document database
* Graph database
* Column-based store

4
Q

What service should you use for data?

  • When you need a low-cost, high-throughput data store.
  • When you need to store NoSQL data.
  • When you do not need to query the data directly. No ad hoc query support.
  • Suits the storage of archive or relatively static data.
  • Suits use as an HDInsight Hadoop data store.
A

Azure Blob Storage

5
Q

What service should you use for data:

  • When you need a low-cost, high-throughput data store.
  • When you need unlimited storage for NoSQL data.
  • When you do not need to query the data directly. No ad hoc query support.
  • Suits the storage of archive or relatively static data.
  • Suits use as a Databricks, HDInsight, and IoT data store.
A

Azure Data Lake Storage

6
Q

What service should you use for data?

  • Eases the deployment of a Spark-based cluster.
  • Enables the fastest processing of machine learning solutions.
  • Enables collaboration between data engineers and data scientists.
  • Provides tight enterprise security integration with Azure Active Directory.
  • Integrates with other Azure services and Power BI.
A

Azure Databricks

7
Q

What service should you use for data?

  • Provides global distribution for both structured and unstructured data stores.
  • Millisecond query response time.
  • 99.999% availability of data.
  • Worldwide elastic scale of both storage and throughput.
  • Multiple consistency levels to control data integrity with concurrency.
A

Azure Cosmos DB

8
Q

What service should you use for data:

  • When you require a relational data store
  • When you need to manage transactional workloads
  • When you need to manage a high volume of inserts and reads
  • When you need a service that supports high concurrency
  • When you require a solution that can scale elastically
A

SQL DB (Azure SQL)

9
Q

What service should you use for data:

  • When you require a relational data store
  • When you need to manage analytical workloads
  • When you need low cost storage
  • When you require the ability to pause and restart the compute
  • When you require a solution that can scale elastically
A

Azure SQL Data Warehouse (Azure SQL DW)

10
Q

What is the difference between hot and cold data?

A

Hot data is data that is retrieved frequently. Cold data is data that is not accessed often.

11
Q

Reading data in Azure Databricks: Translate from SQL

Select col_1 from myTable

A

from pyspark.sql.functions import col
df.select(col("col_1"))

12
Q

Reading data in Azure Databricks: Translate from SQL

Select * from mytable where col_1 > 0

A

from pyspark.sql.functions import col
df.filter(col("col_1") > 0)

13
Q

Request Units in Cosmos DB

What is Database Throughput?

A

Database throughput is the number of reads and writes that your database can perform in a single second.

14
Q

Request Units in Cosmos DB

What is a request unit?

A

Azure Cosmos DB measures throughput using something called a request unit (RU). Request unit usage is measured per second, so the unit of measure is request units per second (RU/s). You must reserve the number of RU/s you want Azure Cosmos DB to provision in advance.

15
Q

Request Units in Cosmos DB

What occurs when you exceed throughput limits?

A

If you don’t reserve enough request units and you attempt to read or write more data than your provisioned throughput allows, your request will be rate-limited.
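
A toy model of this behavior, as a sketch only: the class name, the RU costs, and the one-second budget reset below are assumptions for illustration, not the Cosmos DB SDK. Cosmos DB itself signals a rate-limited request with HTTP status 429.

```python
class ProvisionedContainer:
    """Toy model of a container with a fixed RU/s budget (hypothetical)."""

    def __init__(self, provisioned_rus):
        self.provisioned_rus = provisioned_rus
        self.used_this_second = 0

    def next_second(self):
        # The budget is per second, so usage resets when a new second starts.
        self.used_this_second = 0

    def request(self, cost_rus):
        """Return 200 if the request fits in the budget, else 429."""
        if self.used_this_second + cost_rus > self.provisioned_rus:
            return 429  # rate-limited: the caller should back off and retry
        self.used_this_second += cost_rus
        return 200


container = ProvisionedContainer(provisioned_rus=400)

# Five writes at an assumed cost of 100 RU each, all in the same second.
statuses = [container.request(100) for _ in range(5)]
print(statuses)  # [200, 200, 200, 200, 429]

# Once the next second begins, the budget is available again.
container.next_second()
status_after_reset = container.request(100)
print(status_after_reset)  # 200
```

The fifth request is rejected because it would push usage for that second to 500 RU against a 400 RU/s reservation.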

16
Q

Cosmos DB Failover management

What is automated fail-over?

A

Automated fail-over is a feature that comes into play when a disaster or other event takes one of your read or write regions offline. It redirects requests from the offline region to the next most prioritized region.
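
The redirection rule can be sketched as below. The region names and the priority list are made-up examples, and `pick_region` is a hypothetical helper, not part of any Azure SDK.

```python
def pick_region(priorities, offline):
    """Return the highest-priority region that is still online."""
    for region in priorities:  # ordered from most to least preferred
        if region not in offline:
            return region
    raise RuntimeError("no region available")


# Hypothetical fail-over priority list, most preferred first.
failover_priorities = ["West US", "East US", "North Europe"]

normal = pick_region(failover_priorities, offline=set())
after_outage = pick_region(failover_priorities, offline={"West US"})

print(normal)        # West US
print(after_outage)  # East US
```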

17
Q

Cosmos DB

What consistency level should I use to get the lowest latency for reads and writes?

A

Eventual Consistency

18
Q

True or False?

Cosmos DB takes care of data consistency when data is replicated.

A

True!

19
Q

Azure SQL DB Configuration

What are SQL elastic pools, and what are their benefits?

A

SQL Elastic pools are a simple, cost-effective solution for managing and scaling multiple databases that have varying and unpredictable usage demands. The databases in an elastic pool are on a single server and share a set number of resources at a set price.

Benefits:
* Resources are measured in eDTUs (elastic Database Transaction Units).
* Enables you to buy a set of compute and storage resources that are shared among all the databases in the pool.
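
A small arithmetic sketch of why sharing helps. The hourly eDTU figures are invented for illustration. When databases peak at different times, the pool only has to cover the peak of the combined load rather than the sum of each database's own peak.

```python
# Hypothetical eDTU demand per hour for three databases that peak at different times.
usage = {
    "db_a": [80, 10, 10, 10],
    "db_b": [10, 90, 10, 10],
    "db_c": [10, 10, 70, 10],
}

# Sizing each database separately means paying for every individual peak.
sum_of_peaks = sum(max(hours) for hours in usage.values())

# A pool only needs to cover the busiest hour of the combined load.
combined_per_hour = [sum(hours) for hours in zip(*usage.values())]
pool_peak = max(combined_per_hour)

print(sum_of_peaks)       # 240
print(combined_per_hour)  # [100, 110, 90, 30]
print(pool_peak)          # 110
```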

20
Q

Azure SQL-DW

What are the three types of data warehouse?

A
  • Enterprise DW - Centralized data store that provides analytics and decision support.
  • Data Marts - Designed for the needs of a single team or business unit, such as sales.
  • Operational Data Stores - Used as an interim store to integrate real-time data from multiple sources for additional operations on the data.
21
Q

Azure SQL-DW

What are the two architectural ways of building a DW?

A
  • Bottom-up Architecture
    Approach based on the notion of connected data marts
    Depends on a star schema
    Benefit - You can start with a departmental data mart
  • Top-Down Architecture
    Creates one single, integrated, normalized warehouse
    Internal relational constructs follow the rules of normalization
22
Q

Stream Analytics

This function hops forward in time by a fixed period. It may be easiest to think of these windows as Tumbling windows that can overlap, so an event can belong to more than one Hopping window result set.
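
To make the overlap concrete, here is a minimal sketch in plain Python rather than the Stream Analytics query language. The function name, the integer timestamps, and the choice to treat each window as (start, end] are assumptions of this sketch.

```python
import math
from collections import defaultdict


def hopping_windows(timestamps, size, hop):
    """Map each (start, end) window to the event timestamps that fall in it.

    Window ends are multiples of `hop`, and each window spans `size` time units.
    An event at time t belongs to a window when start < t <= end.
    """
    windows = defaultdict(list)
    for t in timestamps:
        end = math.ceil(t / hop) * hop  # first window end at or after t
        while end < t + size:           # every later end whose window still covers t
            windows[(end - size, end)].append(t)
            end += hop
    return dict(sorted(windows.items()))


# Windows of 10 time units that hop forward every 5 units.
result = hopping_windows([6, 8, 12], size=10, hop=5)
for window, events in result.items():
    print(window, events)
# (0, 10) [6, 8]
# (5, 15) [6, 8, 12]
# (10, 20) [12]
```

Because the hop (5) is half the window size (10), every event lands in two windows. Setting the hop equal to the size would remove the overlap and give tumbling behavior.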

23
Q

Stream Analytics

This function groups events that arrive at similar times, filtering out periods of time where there is no data. It has three main parameters: timeout, maximum duration, and partitioning key (optional).
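
The grouping described above can be approximated in plain Python. This is a simplified sketch, not the Stream Analytics implementation: the function name is hypothetical, the optional partitioning key is left out, and the service's exact boundary rules for timeout and maximum duration are not reproduced.

```python
def group_by_gap(timestamps, timeout, max_duration):
    """Split timestamps into groups of events that arrive close together.

    A new group starts when the gap since the previous event exceeds
    `timeout`, or when adding the event would stretch the group past
    `max_duration`. Quiet periods simply produce no group.
    """
    groups = []
    for t in sorted(timestamps):
        if (
            groups
            and t - groups[-1][-1] <= timeout      # close enough to the last event
            and t - groups[-1][0] <= max_duration  # group has not grown too long
        ):
            groups[-1].append(t)
        else:
            groups.append([t])
    return groups


bursty = group_by_gap([1, 2, 4, 20, 22, 40], timeout=5, max_duration=10)
print(bursty)  # [[1, 2, 4], [20, 22], [40]]

steady = group_by_gap([0, 3, 6, 9, 12], timeout=5, max_duration=10)
print(steady)  # [[0, 3, 6, 9], [12]]
```

In the first call the gaps between bursts exceed the timeout, so each burst forms its own group. In the second call events never stop arriving, so the maximum duration is what forces a new group.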