DP-900 Flashcards
What are the types of Data?
Structured Data (Rows and Columns)
Semi-Structured Data (Tags and key-value pairs, such as JSON or other NoSQL data)
Unstructured Data (Pictures, Videos, etc.)
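A minimal sketch of the three categories, using only Python's standard library. The sample records, field names, and byte values are made up for illustration.

```python
import csv
import io
import json

# Structured: every row has the same columns in the same order.
structured_csv = "id,name,city\n1,Ana,Lisbon\n2,Ben,Oslo\n"
rows = list(csv.DictReader(io.StringIO(structured_csv)))

# Semi-structured: keys describe the data, and records may differ in shape.
semi_structured = json.loads(
    '[{"id": 1, "name": "Ana", "phones": ["555-0100"]},'
    ' {"id": 2, "name": "Ben", "address": {"city": "Oslo"}}]'
)

# Unstructured: raw bytes with no internal field layout (e.g. image data).
unstructured = bytes([0x89, 0x50, 0x4E, 0x47])  # hypothetical file content

print(rows[0]["city"])
print(sorted(semi_structured[1].keys()))
print(len(unstructured))
```

Note that the two JSON records carry different keys, which a fixed row-and-column layout would not allow.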
What is Relational Data?
Structured Data
Defined Schema
Clear relationship between fields and tables
(SQL Data)
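As a rough illustration of a defined schema and an enforced relationship between tables, the sketch below uses Python's built-in sqlite3 module as a stand-in for any relational engine. The table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# The schema fixes the columns and types of each table up front.
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE purchase ("
    " id INTEGER PRIMARY KEY,"
    " customer_id INTEGER NOT NULL REFERENCES customer(id),"
    " amount REAL NOT NULL)"
)

conn.execute("INSERT INTO customer (id, name) VALUES (1, 'Ana')")
conn.execute("INSERT INTO purchase (customer_id, amount) VALUES (1, 19.99)")

def try_insert_orphan(connection):
    """Return True if the database rejects a purchase for an unknown customer."""
    try:
        connection.execute("INSERT INTO purchase (customer_id, amount) VALUES (99, 5.0)")
    except sqlite3.IntegrityError:
        return True
    return False

rejected = try_insert_orphan(conn)
print(rejected)
```

The relationship between purchase and customer is part of the schema, so the database itself refuses a row that points at a customer that does not exist.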
What is Non-relational data?
Semi-Structured or Unstructured Data
No fixed, predefined schema
Hierarchy is defined by tags and keys
What is OLTP?
Online Transaction Processing
Systems optimized to handle many small, concurrent read and write transactions reliably.
What does ACID stand for?
Atomicity - means that a transaction is “all or nothing.”
Consistency - ensures data integrity and prevents violations of database constraints or rules.
Isolation - allows multiple transactions to occur concurrently without interference.
Durability - ensures that committed transactions persist, even in the face of errors or failures.
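The "all or nothing" behavior of atomicity can be sketched with Python's built-in sqlite3 module. This is an illustration under assumed names (the account table and the transfer function are hypothetical), not a description of how any Azure service implements transactions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

def transfer(connection, source, target, amount):
    """Move funds in one transaction; return True on commit, False on rollback."""
    try:
        with connection:  # commits on success, rolls back if an exception escapes
            connection.execute(
                "UPDATE account SET balance = balance + ? WHERE name = ?", (amount, target))
            connection.execute(
                "UPDATE account SET balance = balance - ? WHERE name = ?", (amount, source))
        return True
    except sqlite3.IntegrityError:
        return False

ok = transfer(conn, "alice", "bob", 30)       # succeeds
failed = transfer(conn, "alice", "bob", 500)  # would overdraw alice, so it is rolled back
balances = dict(conn.execute("SELECT name, balance FROM account"))
print(ok, failed, balances)
```

In the failed transfer, the credit to bob has already run when the debit violates the CHECK constraint, and the rollback undoes both steps. The constraint itself is an example of consistency.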
What is Azure SQL Database?
A fully managed, highly scalable PaaS database service that is designed for the cloud. This service includes the core database-level capabilities of on-premises SQL Server, and is a good option when you need to create a new application in the cloud.
Use this option for new cloud solutions, or to migrate applications that have minimal instance-level dependencies.
What is Azure SQL Managed Instance?
A platform-as-a-service (PaaS) option that provides near-100% compatibility with on-premises SQL Server instances while abstracting the underlying hardware and operating system. The service includes automated software update management, backups, and other maintenance tasks, reducing the administrative burden of supporting a database server instance.
Use this option for most cloud migration scenarios, particularly when you need minimal changes to existing applications.
What is Azure SQL VM?
A virtual machine running in Azure with an installation of SQL Server. The use of a VM makes this option an infrastructure-as-a-service (IaaS) solution that virtualizes hardware infrastructure for compute, storage, and networking in Azure, making it a great option for “lift and shift” migration of existing on-premises SQL Server installations to the cloud.
Use this option when you need to migrate or extend an on-premises SQL Server solution and retain full control over all aspects of server and database configuration.
What is Azure Database for MySQL?
A managed PaaS implementation of MySQL, a simple-to-use open-source database management system that is commonly used in Linux, Apache, MySQL, and PHP (LAMP) stack apps.
What is Azure Database for MariaDB?
A managed PaaS implementation of MariaDB, a newer database management system created by the original developers of MySQL. The database engine has since been rewritten and optimized to improve performance. MariaDB offers compatibility with Oracle Database (another popular commercial database management system).
What is Azure Database for PostgreSQL?
A managed PaaS implementation of PostgreSQL, an object-relational database. You can store data in relational tables, but a PostgreSQL database also enables you to store custom data types, with their own non-relational properties.
A widely used choice for modern applications.
What is Azure Cosmos DB?
Azure Cosmos DB is a global-scale non-relational (NoSQL) database system that supports multiple application programming interfaces (APIs), enabling you to store and manage data as JSON documents, key-value pairs, column-families, and graphs.
What are the 3 core storage types within Azure Storage?
Blob containers - scalable, cost-effective storage for binary files.
File shares - network file shares such as you typically find in corporate networks.
Tables - key-value storage for semi-structured data, for applications that need to read and write data values quickly.
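To make the key-value idea concrete, here is a small in-memory sketch. It is not the Azure SDK; it only mimics the way a table entity is looked up by a partition key and a row key, and the function names and sample entities are hypothetical.

```python
# In-memory stand-in for a key-value table; this is not the Azure SDK.
table = {}

def upsert(partition_key, row_key, **properties):
    """Store an entity under its (partition key, row key) pair."""
    table[(partition_key, row_key)] = properties

def get(partition_key, row_key):
    """Point lookup by full key; returns None when the entity is absent."""
    return table.get((partition_key, row_key))

# Entities in the same table may carry different properties (semi-structured).
upsert("sensors", "s-001", temperature=21.5)
upsert("sensors", "s-002", temperature=19.0, humidity=40)

print(get("sensors", "s-002"))
```

Reads and writes are fast because each one addresses a single entity by its full key rather than scanning or joining.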
What is Azure Data Factory?
Azure Data Factory is an Azure service that enables you to define and schedule data pipelines to transfer and transform data.
You can integrate your pipelines with other Azure services, enabling you to ingest data from cloud data stores, process the data using cloud-based compute, and persist the results in another data store.
What is Azure Synapse Analytics?
Azure Synapse Analytics is a comprehensive, unified data analytics solution that provides a single service interface for multiple analytical capabilities, including:
Pipelines - based on the same technology as Azure Data Factory.
SQL - a highly scalable SQL database engine, optimized for data warehouse workloads.
Apache Spark - an open-source distributed data processing system that supports multiple programming languages and APIs, including Java, Scala, Python, and SQL.
Azure Synapse Data Explorer - a high-performance data analytics solution that is optimized for real-time querying of log and telemetry data using Kusto Query Language (KQL).
What is Azure Databricks?
Azure Databricks is an Azure-integrated version of the popular Databricks platform, which combines the Apache Spark data processing platform with SQL database semantics and an integrated management interface to enable large-scale data analytics.
What is Azure HDInsight?
Azure HDInsight is an Azure service that provides Azure-hosted clusters for popular Apache open-source big data processing technologies, including:
Apache Spark - a distributed data processing system that supports multiple programming languages and APIs, including Java, Scala, Python, and SQL.
Apache Hadoop - a distributed system that uses MapReduce jobs to process large volumes of data efficiently across multiple cluster nodes. MapReduce jobs can be written in Java or abstracted by interfaces such as Apache Hive - a SQL-based API that runs on Hadoop.
Apache HBase - an open-source system for large-scale NoSQL data storage and querying.
Apache Kafka - a message broker for data stream processing.
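The MapReduce model mentioned under Apache Hadoop can be sketched in plain Python as a word count. This runs on one machine and only illustrates the map, shuffle, and reduce steps; a real Hadoop job distributes them across cluster nodes. All function names and the input lines are made up.

```python
from collections import defaultdict

def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Group emitted values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)

lines = ["big data big clusters", "data moves to clusters"]  # made-up input
mapped = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts)
```

Because each line is mapped independently and each key is reduced independently, both phases can be spread over many nodes.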
What is Azure Stream Analytics?
Azure Stream Analytics is a real-time stream processing engine that captures a stream of data from an input, applies a query to extract and manipulate data from the input stream, and writes the results to an output for analysis or further processing.
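The input, query, output flow can be sketched in Python as an average over fixed, non-overlapping (tumbling) time windows. A real Stream Analytics job expresses this as a query rather than Python code; the function name and the sample readings below are assumptions for illustration.

```python
from collections import defaultdict

def tumbling_window_average(events, window_seconds):
    """Average the readings in each fixed, non-overlapping time window.

    events: iterable of (timestamp_seconds, value) pairs.
    Returns {window_start: average_value}.
    """
    buckets = defaultdict(list)
    for timestamp, value in events:
        window_start = (timestamp // window_seconds) * window_seconds
        buckets[window_start].append(value)
    return {start: sum(vals) / len(vals) for start, vals in sorted(buckets.items())}

# Hypothetical temperature readings: (seconds since start, degrees)
stream = [(1, 20.0), (4, 22.0), (11, 30.0), (14, 34.0), (25, 18.0)]
averages = tumbling_window_average(stream, window_seconds=10)
print(averages)
```

Each reading falls into exactly one window, so the output is one aggregated value per 10-second interval.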
What is Azure Data Explorer?
Azure Data Explorer is a standalone service that offers the same high-performance querying of log and telemetry data as the Azure Synapse Data Explorer runtime in Azure Synapse Analytics.
What is Azure SQL Edge?
A SQL engine that is optimized for Internet-of-things (IoT) scenarios that need to work with streaming time-series data.
What is Normalization?
The process of organizing data in a database. It includes creating tables and establishing relationships between those tables, which protects the data and makes the database more flexible by eliminating redundancy.
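A small sketch of normalization using Python's built-in sqlite3 module: a flat list in which customer details repeat on every order is split into two related tables. Table names, column names, and sample rows are hypothetical.

```python
import sqlite3

# Denormalized rows: the customer's email is repeated on every order.
flat_orders = [
    (1, "Ana", "ana@example.com", "keyboard"),
    (2, "Ana", "ana@example.com", "mouse"),
    (3, "Ben", "ben@example.com", "monitor"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, email TEXT UNIQUE)")
conn.execute(
    "CREATE TABLE customer_order ("
    " id INTEGER PRIMARY KEY,"
    " customer_id INTEGER REFERENCES customer(id),"
    " item TEXT)"
)

for order_id, name, email, item in flat_orders:
    # Store each customer once; later rows with the same email are ignored.
    conn.execute("INSERT OR IGNORE INTO customer (name, email) VALUES (?, ?)", (name, email))
    customer_id = conn.execute(
        "SELECT id FROM customer WHERE email = ?", (email,)).fetchone()[0]
    conn.execute("INSERT INTO customer_order VALUES (?, ?, ?)", (order_id, customer_id, item))

customer_count = conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]
joined = conn.execute(
    "SELECT o.id, c.name, o.item FROM customer_order o"
    " JOIN customer c ON c.id = o.customer_id ORDER BY o.id"
).fetchall()
print(customer_count, joined)
```

After the split, a change to a customer's email touches one row instead of every order, and a join recovers the original flat view when needed.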
What is Extract, Transform, and Load (ETL)?
Extract, transform, and load (ETL) is a data integration methodology that extracts raw data from sources, transforms the data on a secondary processing server, and then loads the data into a target database.
ETL is used when data must be transformed to conform to the data regime of a target database.
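The three ETL stages can be sketched with the Python standard library. The CSV source, the cleaning rules, and the target table are all assumptions chosen for illustration; a production pipeline would typically use a service such as Azure Data Factory.

```python
import csv
import io
import sqlite3

# Made-up source data with inconsistent casing, padding, and a bad amount.
RAW_CSV = "name,signup_date,spend\n ana ,2024-01-05,10.50\nBEN,2024-02-11,n/a\n"

def extract(text):
    """Read raw rows from the source without changing them."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Clean the rows so they match the target table's expectations."""
    cleaned = []
    for row in rows:
        try:
            spend = float(row["spend"])
        except ValueError:
            spend = 0.0  # assumption: unparseable amounts are treated as zero
        cleaned.append((row["name"].strip().title(), row["signup_date"], spend))
    return cleaned

def load(connection, rows):
    """Write the transformed rows into the target database."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS customer (name TEXT, signup_date TEXT, spend REAL)")
    connection.executemany("INSERT INTO customer VALUES (?, ?, ?)", rows)
    connection.commit()

target = sqlite3.connect(":memory:")
load(target, transform(extract(RAW_CSV)))
loaded = target.execute(
    "SELECT name, signup_date, spend FROM customer ORDER BY name").fetchall()
print(loaded)
```

The transformation happens before the load, so only data that already conforms to the target table reaches the database.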
What is Cognitive analytics?
Uses AI and machine learning to analyze complex data sets and simulate human thought processes.
What is Descriptive analytics?
“What happened?”
Analyzes historical data to gain insights into past events and trends.
Used to understand what has happened in the past and identify patterns and trends that can be used to inform future decision-making.
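As a rough example of descriptive analytics, the sketch below summarizes a short sales history: totals, averages, the best month, and month-over-month changes. The figures and the describe function are hypothetical.

```python
from statistics import mean

# Hypothetical monthly sales history
monthly_sales = {"Jan": 120, "Feb": 135, "Mar": 150, "Apr": 110}

def describe(sales):
    """Summarize what already happened: total, average, best month, and changes."""
    months = list(sales)
    changes = {
        months[i]: sales[months[i]] - sales[months[i - 1]]
        for i in range(1, len(months))
    }
    return {
        "total": sum(sales.values()),
        "average": mean(sales.values()),
        "best_month": max(sales, key=sales.get),
        "month_over_month": changes,
    }

summary = describe(monthly_sales)
print(summary)
```

Nothing here predicts or recommends anything; it only reports on past data, which is what separates descriptive analytics from the other analytics types.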