DP-900 Core Data Concepts Flashcards

DP-900 Azure Data Fundamentals

1
Q

What are the three ways you can categorize data?

A

Structured
Semi-structured
Unstructured

2
Q

What is tabular data?

A

Data that is stored as rows and columns in one or more tables.
Each row represents an entity, and each column represents an attribute of that entity.

3
Q

What makes data ‘structured’?

A

It is tabular and adheres to a fixed schema, so every row has the same columns.
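A minimal sketch of structured data, using Python's built-in sqlite3 module as a stand-in relational store. The table, column names, and values are hypothetical. It shows rows as entities, columns as attributes, and the fixed schema rejecting a row that leaves out a required attribute.

```python
import sqlite3

# In-memory SQLite database; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")

# The fixed schema: every row has the same three columns,
# and NOT NULL makes name and email mandatory.
conn.execute(
    "CREATE TABLE customer ("
    "id INTEGER PRIMARY KEY, "
    "name TEXT NOT NULL, "
    "email TEXT NOT NULL)"
)
conn.execute("INSERT INTO customer VALUES (1, 'Joe Jones', 'joe@litware.com')")
conn.execute("INSERT INTO customer VALUES (2, 'Samir Nadoy', 'samir@northwind.com')")

# Each row is one entity; each column is one attribute of that entity.
rows = conn.execute("SELECT id, name, email FROM customer ORDER BY id").fetchall()
print(rows)

# A row that leaves out a required attribute does not fit the schema and is rejected.
rejected = False
try:
    conn.execute("INSERT INTO customer (id, name) VALUES (3, 'Ana Silva')")
except sqlite3.IntegrityError:
    rejected = True
print("rejected:", rejected)
```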

4
Q

What makes data ‘semi-structured’?

A

It contains entities that have some regularly occurring attributes, but with variation: some entities may be missing an attribute, or may have multiple values for a given attribute.

5
Q

What is an example of a format that is useful for ‘semi-structured’ data?

A

JSON, because it lets you define fields for an entity without adhering to a predefined schema.
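A short illustration of semi-structured JSON, parsed with Python's standard json module. The two customer records are made up: they share some fields, but one has no email and the other stores two phone numbers, and the reading code tolerates that variation rather than assuming a fixed schema.

```python
import json

# Two hypothetical customer entities. They share firstName, lastName and
# contact, but only the first has an email, and the second has two phone numbers.
raw = """
[
  {"firstName": "Joe", "lastName": "Jones", "email": "joe@litware.com",
   "contact": [{"type": "home", "number": "555 123-1234"}]},
  {"firstName": "Samir", "lastName": "Nadoy",
   "contact": [{"type": "home", "number": "555 123-1234"},
               {"type": "mobile", "number": "555 123-5678"}]}
]
"""

customers = json.loads(raw)

summary = []
for c in customers:
    # .get() tolerates a missing attribute instead of raising KeyError.
    email = c.get("email", "<none>")
    phones = [p["number"] for p in c.get("contact", [])]
    summary.append((c["firstName"], email, phones))

for entry in summary:
    print(entry)
```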

6
Q

What are some examples of ‘unstructured’ data?

A

audio, video, and images

7
Q

What are the two broad categories of data stores?

A

File stores and databases

8
Q

What are some common ways to store files?

A

Delimited text files (such as CSV), XML, JSON, BLOBs, and optimized file formats such as Avro, ORC, and Parquet

9
Q

What is XML?

A

A human-readable, semi-structured format that uses tags (enclosed in angle brackets) to define elements and attributes.
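A small sketch of reading XML with Python's standard xml.etree.ElementTree module. The document and its element and attribute names are hypothetical; the point is that data sits between opening and closing tags, and elements can also carry attributes.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML: each element is delimited by an opening and closing tag,
# and elements can carry attributes (name, type).
doc = """
<customers>
  <customer name="Joe">
    <phone type="home">555-123-1234</phone>
  </customer>
  <customer name="Samir">
    <phone type="home">555-123-1234</phone>
    <phone type="mobile">555-123-5678</phone>
  </customer>
</customers>
"""

root = ET.fromstring(doc.strip())
result = {}
for customer in root.findall("customer"):
    # Map each phone's "type" attribute to the text inside its tags.
    result[customer.get("name")] = {p.get("type"): p.text for p in customer.findall("phone")}
print(result)
```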

10
Q

What format has largely replaced XML?

A

JSON, which is less verbose

11
Q

What is the best format for storing large objects like videos, audio, and images?

A

BLOB (Binary Large Object)

12
Q

How is file storage different from a database?

A

A database is a dedicated system for storing and querying individual data records, whereas a file store manages whole files.

13
Q

What is NoSQL?

A

databases that are not relational

14
Q

What are the 4 common types of non-relational databases?

A

Key-value
Document
Column Family
Graph

15
Q

In a key-value database, what format does the value have to be in?

A

It can be in any format: numeric, text, binary, and so on. The database does not interpret the value.
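A toy sketch of the key-value idea, not a real database: a hypothetical KeyValueStore class wrapping a Python dict. It accepts values of any type because it never looks inside them, and it can only look records up by key.

```python
from typing import Any


class KeyValueStore:
    """Hypothetical in-memory key-value store, for illustration only.

    Values are opaque to the store: it never inspects them, so numbers,
    text, bytes, or nested objects are all accepted.
    """

    def __init__(self) -> None:
        self._data: dict[str, Any] = {}

    def put(self, key: str, value: Any) -> None:
        self._data[key] = value

    def get(self, key: str, default: Any = None) -> Any:
        # Lookups are by key only; there is no way to query by value.
        return self._data.get(key, default)

    def delete(self, key: str) -> None:
        self._data.pop(key, None)


store = KeyValueStore()
store.put("user:1:name", "Joe")          # text
store.put("user:1:visits", 42)           # number
store.put("user:1:avatar", b"\x89PNG")   # raw bytes
store.put("temp", "x")
store.delete("temp")
print(store.get("user:1:visits"))
```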

16
Q

In a document database, what format does the value have to be in?

A

A JSON document, which the database is optimized to parse and query
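A toy contrast with the key-value sketch: here each value is a JSON document, and a hypothetical find function parses the documents so it can filter on fields inside them. The collection and its fields are made up; a real document database indexes and queries documents far more efficiently than this linear scan.

```python
import json

# Hypothetical "collection": each value is a JSON document keyed by an id.
collection = {
    "1": json.dumps({"name": "Joe", "city": "Seattle", "orders": 3}),
    "2": json.dumps({"name": "Samir", "city": "Redmond"}),
    "3": json.dumps({"name": "Ana", "city": "Seattle", "orders": 7}),
}


def find(coll, field, value):
    """Return ids of documents whose top-level `field` equals `value`."""
    matches = []
    for doc_id, text in coll.items():
        doc = json.loads(text)  # the store understands the document's structure
        if doc.get(field) == value:
            matches.append(doc_id)
    return matches


print(find(collection, "city", "Seattle"))
```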

17
Q

What are the two types of data processing?

A

Transactional and Analytical

18
Q

What is OLTP?

A

Online Transaction Processing

19
Q

What does OLTP track?

A

Transactions, which are often CRUD operations

20
Q

What does a transaction ensure?

A

ACID semantics

21
Q

What does ACID stand for?

A

Atomicity
Consistency
Isolation
Durability

22
Q

What is atomicity?

A

All sub-components of a transaction must succeed for the transaction to take effect. It is all-or-nothing: either the whole transaction completes or none of it does.
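A minimal sketch of atomicity using Python's sqlite3 module, with a hypothetical account table and transfer function. Used as a context manager, an sqlite3 connection commits when the block succeeds and rolls back when it raises, so a failure between the debit and the credit leaves both balances untouched.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
conn.execute("INSERT INTO account VALUES ('A', 100), ('B', 50)")
conn.commit()


def transfer(conn, src, dst, amount, fail_midway=False):
    # `with conn` commits if the block succeeds and rolls back if it raises,
    # so the debit and the credit happen together or not at all.
    with conn:
        conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src))
        if fail_midway:
            raise RuntimeError("simulated failure after the debit")
        conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst))


def balances(conn):
    return dict(conn.execute("SELECT id, balance FROM account"))


try:
    transfer(conn, "A", "B", 30, fail_midway=True)
except RuntimeError:
    pass
after_failure = balances(conn)   # the half-finished debit was rolled back

transfer(conn, "A", "B", 30)
after_success = balances(conn)   # both updates applied
print(after_failure, after_success)
```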

23
Q

How do you know a transaction is consistent?

A

The transaction takes the database from one valid state to another valid state. For example, if you transfer funds from one account to another, the total amount of funds stays the same, because the amount is subtracted from one account and added to the other.
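A sketch of the funds-transfer example in sqlite3, under the assumption (for illustration only) that "valid state" means no negative balances, expressed as a CHECK constraint on a hypothetical account table. A valid transfer keeps the total unchanged; a transfer that would break the rule is rejected and rolled back, so the total is still unchanged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The CHECK constraint is one rule defining a valid state: no negative balances.
conn.execute(
    "CREATE TABLE account (id TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO account VALUES ('A', 100), ('B', 50)")
conn.commit()


def total(conn):
    return conn.execute("SELECT SUM(balance) FROM account").fetchone()[0]


def transfer(conn, src, dst, amount):
    with conn:  # commit on success, roll back on error
        conn.execute("UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst))
        conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src))


before = total(conn)
transfer(conn, "A", "B", 40)          # valid: money moves, total unchanged
try:
    transfer(conn, "A", "B", 500)     # would make A negative, so it is rejected
except sqlite3.IntegrityError:
    pass
after = total(conn)
balances = dict(conn.execute("SELECT id, balance FROM account"))
print(before, after, balances)
```

In the rejected transfer the credit to B runs first, and the rollback undoes it as well, which is why consistency in practice also depends on atomicity.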

24
Q

How do you know your transactions are ‘isolated’?

A

The transaction does not interfere with other concurrent transactions. For example, if one transaction transfers funds from one account to another while a second transaction totals the balances of all accounts, the second transaction must not see one account's balance from before the transfer and the other account's balance from after it.

25
What proves that a transaction was durable?
Once the transaction is committed, it persists. Even if the database is turned off and on again, the change remains.
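A small sketch of durability with a file-backed SQLite database in a temporary directory (the path and table are hypothetical). One insert is committed and one is not; after the connection is closed and a new one is opened, simulating a restart, only the committed change is still there.

```python
import os
import sqlite3
import tempfile

# A file-backed database at a temporary path chosen only for this demo.
path = os.path.join(tempfile.mkdtemp(), "demo.db")

conn = sqlite3.connect(path)
conn.execute("CREATE TABLE event (msg TEXT)")
conn.execute("INSERT INTO event VALUES ('committed')")
conn.commit()                      # the change is now persisted to the file
conn.execute("INSERT INTO event VALUES ('never committed')")
conn.close()                       # close() does not commit the pending insert

# Simulated restart: open a brand-new connection to the same file.
conn = sqlite3.connect(path)
rows = [r[0] for r in conn.execute("SELECT msg FROM event")]
conn.close()
print(rows)
```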
26
What is a data lake?
A file store, usually on a distributed file system, used to hold large volumes of file-based data for analytical processing
27
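What is an OLAP model?
An Online Analytical Processing (OLAP) model: an aggregated data store, optimized for analytical workloads, in which numeric values are pre-aggregated across dimensions at different levels (for example, total sales by region, by city, and by store).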
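A toy sketch of the pre-aggregation idea behind an OLAP model, in plain Python. The sales rows and the region, city, and store hierarchy are hypothetical. Totals are computed once at every level of the hierarchy, so an analytical question becomes a lookup rather than a scan of the detail rows.

```python
from collections import defaultdict

# Hypothetical fact rows: (region, city, store, sales amount).
sales = [
    ("West", "Seattle", "S1", 100),
    ("West", "Seattle", "S2", 150),
    ("West", "Portland", "S3", 80),
    ("East", "Boston", "S4", 120),
]

# Pre-compute totals at every level of the region > city > store hierarchy.
cube = defaultdict(int)
for region, city, store, amount in sales:
    cube[()] += amount                    # grand total
    cube[(region,)] += amount
    cube[(region, city)] += amount
    cube[(region, city, store)] += amount

print(cube[("West",)], cube[("West", "Seattle")], cube[()])
```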
28
ETL takes the data from where to where?
From operational data stores into a data lake, data warehouse, or lakehouse, where it can be analyzed. ETL stands for extract, transform, and load.
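A compact ETL sketch in Python, with two in-memory SQLite databases standing in for an operational store and an analytical store. The orders table, the fact_sales table, and the cleanup rules are hypothetical; real pipelines would use a service such as Azure Data Factory rather than a script like this.

```python
import sqlite3

# Extract: hypothetical operational (OLTP) order rows.
operational = sqlite3.connect(":memory:")
operational.execute(
    "CREATE TABLE orders (id INTEGER, customer TEXT, amount_cents INTEGER, status TEXT)"
)
operational.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [(1, " joe ", 1950, "complete"), (2, "Samir", 500, "cancelled"), (3, "JOE", 2500, "complete")],
)
extracted = operational.execute("SELECT customer, amount_cents, status FROM orders").fetchall()

# Transform: drop cancelled orders, normalize names, convert cents to currency units.
clean = [(c.strip().title(), cents / 100) for c, cents, status in extracted if status == "complete"]

# Load: write into the analytical store, then query it.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE fact_sales (customer TEXT, amount REAL)")
warehouse.executemany("INSERT INTO fact_sales VALUES (?, ?)", clean)
warehouse.commit()

result = warehouse.execute(
    "SELECT customer, SUM(amount) FROM fact_sales GROUP BY customer"
).fetchall()
print(result)
```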
29
What is a data warehouse?
A relational database with a schema optimized for read operations, primarily analytical queries that support reporting and visualization
30
What does CRUD stand for?
Create, Read (also called Retrieve), Update, Delete
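The four CRUD operations map directly onto SQL's INSERT, SELECT, UPDATE, and DELETE. A minimal sketch with Python's sqlite3 module and a hypothetical product table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT, price REAL)")

# Create
conn.execute("INSERT INTO product (id, name, price) VALUES (1, 'Widget', 2.5)")
# Read (Retrieve)
created = conn.execute("SELECT name, price FROM product WHERE id = 1").fetchone()
# Update
conn.execute("UPDATE product SET price = 3.0 WHERE id = 1")
updated = conn.execute("SELECT name, price FROM product WHERE id = 1").fetchone()
# Delete
conn.execute("DELETE FROM product WHERE id = 1")
conn.commit()

remaining = conn.execute("SELECT COUNT(*) FROM product").fetchone()[0]
print(created, updated, remaining)
```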
31
What is a lakehouse?
It combines the flexible, scalable file storage of a data lake with the relational querying semantics of a data warehouse.
32
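What kind of denormalization takes place when OLTP data is transferred to a lakehouse?
The normalized relational tables are flattened, so data (such as a customer's details) is duplicated across rows to make analytical queries simpler and faster.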
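A sketch of denormalization using sqlite3, with hypothetical customer and sale tables. In the normalized form each customer's details are stored once; joining them into one wide row per sale repeats the name and city on every sale that customer made.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Normalized (OLTP-style): customer details are stored once and referenced by id.
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.execute("CREATE TABLE sale (id INTEGER PRIMARY KEY, customer_id INTEGER, amount INTEGER)")
conn.execute("INSERT INTO customer VALUES (1, 'Joe', 'Seattle'), (2, 'Samir', 'Redmond')")
conn.execute("INSERT INTO sale VALUES (10, 1, 100), (11, 1, 250), (12, 2, 75)")

# Denormalized (analytics-style): one wide row per sale, so the customer's
# name and city are repeated on every sale they made.
flat = conn.execute(
    "SELECT s.id, c.name, c.city, s.amount "
    "FROM sale AS s JOIN customer AS c ON c.id = s.customer_id "
    "ORDER BY s.id"
).fetchall()
for row in flat:
    print(row)
```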
33
What are the 3 main roles in Data?
Database administrator, data engineer, and data analyst
34
What does a database administrator do?
They are responsible for the design, implementation, and maintenance of databases. They perform tasks such as applying database updates and managing user permissions, and they are responsible for the performance and reliability of the databases.
35
What does a data engineer do?
They are responsible for building data workloads for databases and file stores that take transactional data and make it available for analytics. They own the management and monitoring of data pipelines to ensure that data loads perform as expected.
36
What does a data analyst do?
They explore, analyze, and transform data into reports and visualizations that provide insights into business questions
37
What is Azure SQL?
A group of relational database solutions built on the SQL Server engine
38
What is Azure?
Microsoft's cloud platform: a collection of cloud-based IT services
39
What is Azure SQL Database?
A fully managed platform-as-a-service (PaaS) database. Of the Azure SQL options, it offers the least configuration flexibility but also the least administration.
40
What is Azure SQL Managed Instance?
A hosted instance of SQL Server with automated maintenance, which allows more configuration flexibility than Azure SQL Database
41
What is Azure SQL VM?
A virtual machine with SQL Server installed. It allows the maximum amount of configuration, but also places the most responsibility on the DBA.
42
What Azure products are offered for open-source relational databases?
Azure Database for MySQL, Azure Database for MariaDB, and Azure Database for PostgreSQL
43
What is Azure Cosmos DB?
A global-scale non-relational (NoSQL) database that supports storing JSON documents, key-value pairs, column-family tables, and graphs. DBAs sometimes manage it, but usually software engineers do. Data engineers often need to extract data from it into a data lakehouse.
44
What is Azure Storage?
A cloud service that allows you to store data in BLOB containers, file shares, and tables
45
What would a data engineer do with Azure Storage?
They would likely use it as a data lake
46
What is Azure Data Factory?
A service to define and schedule data pipelines to transfer and transform data. It can be integrated with other Azure products
47
What would a data engineer do with Azure Data Factory?
They would use it to build ETL pipelines that take operational data and populate data warehouses for analytics solutions
48
What is Azure Synapse Analytics?
A comprehensive, unified platform-as-a-service (PaaS) solution for data analytics
49
What does Synapse Analytics include?
Pipelines, SQL, Apache Spark, and Synapse Analytics Data Explorer
50
What are Synapse Analytics pipelines?
Pipelines built on the same technology as Azure Data Factory
51
What is SQL in Synapse Analytics?
A highly scalable SQL database engine, optimized for data warehouse workloads (read queries)
52
What is Apache Spark?
An open-source distributed data processing system that supports multiple programming languages and APIs, including Python, SQL, Java, and Scala
53
What is Synapse Analytics Data Explorer?
It uses the Kusto Query Language (KQL) to provide very fast analytics, optimized for real-time querying of telemetry and log data
54
What can data engineers use Azure Synapse Analytics for?
They use it to build comprehensive data analytics solutions that combine ingestion pipelines, data lake storage, and data warehouse storage
55
What can data analysts use Azure Synapse Analytics for?
They can use SQL and Spark through interactive notebooks to explore and analyze data, and use its integration with Azure Machine Learning and Power BI to create data models
56
What is Azure Databricks?
An Azure-integrated version of the popular Databricks platform, which combines Apache Spark with SQL database semantics for large-scale analytics
57
What can data analysts use Azure Databricks for?
They can use its native notebook support to query and visualize data in a browser-based interface
58
What can data engineers use Azure Databricks for?
They'll use it to create analytical data stores
59
What is Azure HDInsight?
It provides Azure-hosted clusters for popular Apache open-source big data technologies
60
What is Apache Hadoop?
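An open-source system that uses MapReduce jobs to process large volumes of data. Jobs can be written in Java or expressed through higher-level interfaces such as Apache Hive, a SQL-based API that runs on Hadoop.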
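To make the MapReduce model concrete, here is a single-process word-count imitation in plain Python. It is not Hadoop's Java API, and the function names are hypothetical: map emits (key, value) pairs, a shuffle groups them by key, and reduce combines each group. Hadoop runs the same three phases distributed across a cluster.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    # Emit (word, 1) for every word in one input line.
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    # Group all emitted values by key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Combine the values for one key.
    return key, sum(values)


lines = ["the cat sat", "the cat ran", "a dog ran"]
pairs = chain.from_iterable(map_phase(line) for line in lines)
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
print(counts)
```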
61
What is Apache HBase?
An open-source system for large-scale NoSQL data storage and querying
62
What is Apache Kafka?
a message broker for data stream processing
63
Data engineers can use Azure HDInsight for what?
They can use this to support big data processing jobs that use multiple Apache technologies
64
What is Azure Stream Analytics?
It captures a stream of data, applies a query to extract and transform it, and writes the results for analysis or further processing
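A common stream query is an aggregate over fixed, non-overlapping time windows (a "tumbling" window). Azure Stream Analytics expresses this in its own SQL-like query language; the Python below does not use that service and only mimics the idea. It assumes hypothetical sensor events that arrive in timestamp order, and it emits each window's average as soon as the window closes.

```python
# Hypothetical sensor events: (timestamp in seconds, device id, temperature).
events = [
    (1, "d1", 20.0), (4, "d2", 22.0), (9, "d1", 21.0),
    (11, "d1", 23.0), (14, "d2", 25.0),
    (21, "d1", 19.0),
]

WINDOW = 10  # seconds per tumbling window


def tumbling_average(stream, window):
    """Yield (window_start, average) for each window, assuming events are in time order."""
    current, total, count = None, 0.0, 0
    for ts, _device, temp in stream:
        start = ts - ts % window
        if current is not None and start != current:
            yield current, total / count   # the previous window is complete
            total, count = 0.0, 0
        current = start
        total += temp
        count += 1
    if count:
        yield current, total / count       # flush the last open window


windows = list(tumbling_average(events, WINDOW))
print(windows)
```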
65
What can data engineers do with Azure Stream Analytics?
They can incorporate it into architectures that capture streaming data and ingest it into analytical data stores or feed real-time visualizations
66
What is Azure Data Explorer?
A standalone service for fast querying of log and telemetry data, offering the same capability as the Data Explorer runtime in Azure Synapse Analytics
67
Data analysts can use Azure Data Explorer for what?
They can query and analyze data that has a timestamp attribute, such as log files and IoT telemetry
68
What is Microsoft Purview?
Enterprise solution for governance and discoverability, helping people find the data they need
69
What is Microsoft Fabric?
A SaaS lakehouse platform that includes ETL, lakehouse analytics, warehouse analytics, data science and machine learning, real-time analytics, data visualization, and data governance and management
70
Data engineers can use Microsoft Purview for what?
They can use it to enforce data governance across the enterprise and to ensure the integrity of data