Azure Data Fundamentals Flashcards
Which type of data structure allows you to store data in a two-column format without requiring a complex database management system?
key/value store
What is the best DBMS for create, read, update, and delete (CRUD) operations and uses the least amount of storage space
relational database
What DBMS uses unstructured data such as JSON, and optimized for retrieval
document database
What type of DBMS is used to store hierarchical data, such as organizational charts that have nodes and edges.
graph database
What type of databases are commonly used to store and query structured data.
Relational databases
An entity is a
table
A table should have what kind of keys
Primary and foreign
This reduces storage space a data duplication in relational databases
Normalization
These databases in which each record consists of a unique key and an associated value, which can be in any format.
Key-value databases
These databases, which are a specific form of key-value database in which the value is a JSON document (which the system is optimized to parse and query)
Document databases,
These databases, which store tabular data comprising rows and columns, but you can divide the columns into groups known as column-families. Each column family holds a set of columns that are logically related together.
Column family databases
These databases store entities as nodes with links to define relationships between them.
Graph databases
These databases organize data as a series of two-dimensional tables with rows and columns.
Relational databases
Records are frequently created and updated.
Multiple operations have to be completed in a single transaction.
Relationships are enforced using database constraints.
Indexes are used to optimize query performance.
RDMS
What Azure Services support relational dbs
Azure SQL Database
Azure Database for MySQL
Azure Database for PostgreSQL
Azure Database for MariaDB
Records are frequently created and updated.
Multiple operations have to be completed in a single transaction.
Relationships are enforced using database constraints.
Indexes are used to optimize query performance.
RDMS
What data store model requires Data to be highly normalized.
Database schemas are required and enforced.
Many-to-many relationships between data entities in the database.
Constraints are defined in the schema and imposed on any data in the database.
Data requires high integrity. Indexes and relationships need to be maintained accurately.
Data requires strong consistency. Transactions operate in a way that ensures all data are 100% consistent for all users and processes.
Size of individual data entries is small to medium-sized
RDMS
Where Data is highly normalized.
Database schemas are required and enforced.
Many-to-many relationships between data entities in the database.
RDMS
Where Constraints are defined in the schema and imposed on any data in the database.
Data requires high integrity. Indexes and relationships need to be maintained accurately.
Data requires strong consistency. Transactions operate in a way that ensures all data are 100% consistent for all users and processes.
Size of individual data entries is small to medium-sized
RDMS
Where Data requires strong consistency. Transactions operate in a way that ensures all data are 100% consistent for all users and processes.
Size of individual data entries is small to medium-sized
RDMS
Where each data value with a unique key. Most key/value stores only support simple query, insert, and delete operations. .
key/value store
What data store model Where modify a value (either partially or completely), an application must overwrite the existing data for the entire value. In most implementations, reading or writing a single value is an atomic operation
key/value store
Where An application can store arbitrary data as a set of values. Any schema information must be provided by the application. The key/value store simply retrieves or stores the value by key.
key/value store
What service support key/value pairs
Azure Cosmos DB for Table and Azure Cosmos DB for NoSQL
Azure Cache for Redis
Azure Table Storage
What data store model should you select where the workload Data is accessed using a single key, like a dictionary.
No joins, lock, or unions are required.
No aggregation mechanisms are used.
Secondary indexes are generally not used.
Key value pair
What data store model would be suited for Data caching, Session management, User preference and profile management, Product recommendation and ad serving
Key value pair
What data store model would be suited for workloads where insert and update operations are common.
No object-relational impedance mismatch. Documents can better match the object structures used in application code.
Individual documents are retrieved and written as a single block.
Data requires index on multiple fields.
Document databases
What data store model would be suited for workloads where datatypes can be managed in de-normalized way.
Size of individual document data is relatively small.
Each document type can use its own schema.
Documents can include optional fields.
Document data is semi-structured, meaning that data types of each field are not strictly defined.
Document databases
What data store model would be suited for Product catalog
Content management
Inventory management
Document databases
What Azure services support Document databases
Azure Cosmos DB for NoSQL
What data store model stores a collection of documents, where each document consists of named fields and data. Documents are retrieved by unique keys.
Document databases
This data store model stores two types of information, nodes and edges.
Graph databases
What services support Graph databases?
Azure Cosmos DB for Apache Gremlin
SQL Server
What data store model would be suited for workloads where Complex relationships between data items involving many hops between related data items.
The relationship between data items are dynamic and change over time.
Relationships between objects are first-class citizens, without requiring foreign-keys and joins to traverse.
Graph databases
What data store model stores Nodes and relationships.
Graph databases
These are similar to table rows or JSON documents.
in graph databases
Nodes
These are just as important as nodes, and are exposed directly in the query language.
Relationships
These specify relationships between nodes in graph databases
Edges
This data store model is ideal for Organization charts, Social graphs, Fraud detection, Recommendation engines
Graph databases
These provide massively parallel solutions for ingesting, storing, and analyzing data. The data is distributed across multiple servers to maximize scalability.
Data analytics stores
These data stores use large data file formats such as delimiter files (CSV), parquet, and ORC
Data Analytics
What azure services use data analytics data stores models
Azure Synapse Analytics
Azure Data Lake
Azure Data Explorer
Azure Analysis Services
HDInsight
Azure Databricks
What data store model would be suited for workloads where Data analytics
Enterprise BI
Data analytics
Which type of database can be used for semi-structured data that will be processed by an Apache Spark pool in Azure Synapse Analytics?
Column-family databases
This data base store model organizes data into rows and columns and uses a denormalized approach to structuring sparse data.
Column-family databases
What azure services provide Column-family databases
Azure Cosmos DB for Apache Cassandra
HBase in HDInsight
What data store model perform write operations extremely quickly,
Update and delete operations are rare and is
designed to provide high throughput and low-latency access.
Supports easy query access to a particular set of fields within a much larger record.
Massively scalable.
column-family
Is an example of a non relational data store
What data store model uses tables consisting of a key column and one or more column families.
column-family
What data store model is ideal for Recommendations
Personalization
Sensor data
Telemetry
Messaging
Social media analytics
Web analytics
Activity monitoring
Weather and other time-series data
column-family
What data store model is a set of values organized by time and typically collect large amounts of data in real time from a large number of sources.
Time series databases
What data store model where Updates are rare, and deletes are often done as bulk operations. The records written are generally small, there are often a large number of records, and total data size can grow rapidly.
Time series databases
What azure services provide a time series database
Azure Time Series Insights
What data store model is ideal for Monitoring and event telemetry.
Sensor or other IoT data.
Time series databases
What azure services are commonly used for data object storage
Blob Storage
Azure Data Lake Storage Gen2
This database does not use the tabular schema of rows and columns found in most traditional database systems.
non-relational database
This type of database is optimized for the specific requirements of the type of data being stored. For example, data may be stored as simple key/value pairs, as JSON documents, or as a graph consisting of edges and vertices.
non-relational database
does not use the tabular schema of rows and columns
Document data stores, Columnar data stores, Key/value data stores,
Graph data stores, Time series data stores, Object data stores are all examples of what?
Non-relational data and NoSQL
Which Azure data service allows you to store document, graph, and column-family databases?
Azure Cosmos DB
True or false: Azure SQL Database cannot handle column-family databases
True
This is a fully managed platform-as-a-service (PaaS) database hosted in Azure
Azure SQL Database
This is a hosted instance of SQL Server with automated maintenance, which allows more flexible configuration than Azure SQL DB but with more administrative responsibility for the owner.
Azure SQL Managed Instance
This is a virtual machine with an installation of SQL Server, allowing maximum configurability with full management responsibility.
Azure SQL VM
This role typically provision and manage Azure SQL database systems to support line of business (LOB) applications that need to store transactional data.
Database administrators
This role may use Azure SQL database systems as sources for data pipelines that perform extract, transform, and load (ETL) operations to ingest the transactional data into an analytical system.
Data engineers
This role may query Azure SQL databases directly to create reports, though in large organizations the data is generally combined with data from other sources in an analytical data store to support enterprise analytics.
Data analysts
What open source relational databases does azure provide
MySQL
MariaDB
PostgreSQL
A simple-to-use open-source database management system that is commonly used in Linux, Apache, MySQL, and PHP (LAMP) stack apps.
Azure Database for MySQL
Open source that offers compatibility with Oracle
Azure Database for MariaDB
A hybrid relational-object database. You can store data in relational tables, and custom data types, with their own non-relational properties.
Azure Database for PostgreSQL
A global-scale non-relational (NoSQL) database system that supports multiple application programming interfaces (APIs), enabling you to store and manage data as JSON documents, key-value pairs, column-families, and graphs.
Azure Cosmos DB
WhatAzure service provides key/attribute storage for applications that need to read and write data values quickly at a low cost
Azure tables
You can store any number of entities in a table,
network file shares such as you typically find in corporate networks
AzureFiles
scalable, cost-effective storage for binary files.
Blob containers
This azure service enables you to define and schedule data pipelines to transfer and transform data. You can integrate your pipelines with other Azure services, enabling you to ingest data from cloud data stores, process the data using cloud-based compute, and persist the results in another data store.
Azure Data Factory
ThisAzure service is a comprehensive, unified Platform-as-a-Service (PaaS) solution for data analytics that provides a single service interface for multiple analytical capabilities,
Azure Synapse Analytics
Which combines the Apache Spark data processing platform with SQL database semantics and an integrated management interface to enable large-scale data analytics.
Databricks
Azure service that provides Azure-hosted clusters for popular Apache open-source big data processing technologies, including:
Apache Spark
Apache Hadoop
Apache Hive
Apache Kafka
Azure HDInsight
This analytics PaaS provides a single service interface for multiple analytical capabilities, including
Pipelines
SQL
Apache Spark
Azure Synapse Data Explorer
Azure Synapse Analytics
A real-time stream processing engine that captures a stream of data from an input, applies a query to extract and manipulate data from the input stream, and writes the results to an output for analysis or further processing.
Azure Stream Analytics
A standalone service that offers the same high-performance querying of log and telemetry data as the Azure Synapse Data Explorer runtime in Azure Synapse Analytics.
Azure Data Explorer
You have data stored in two tables in a database.
You create a relationship between the tables.
Which type of data do you have?
structured
This data format has some structure, but which allows for some variation between entity instances.
One common format for semi-structured data is JavaScript Object Notation (JSON).
Semi-structured data
This data format adheres to a fixed schema, so all of the data has the same fields or properties. The data is represented in one or more tables that consist of rows to represent each instance of a data entity, and columns to represent attributes of the entity.
Structured data
This data format is information that has some structure, but which allows for some variation between entity instances.
Semi-structured data
You have a folder that contains documents, images, and audio files.
Which type of data do you have?
unstructured
Which type of data should be sent from video cameras in a native binary format?
unstructured
Which type of database should you use to store sequential data in the fastest way possible?
Time series database
What database is used to store hierarchical data, such as organizational charts that have nodes and edges
Graph
What azure service database is the best option for create, read, update, and delete (CRUD) operations, uses the least amount of storage space.
Azure SQL Database
Which Azure Cosmos DB API allows you to implement a non-relational database and model nodes that have relationships between them?
Apache Gremlin
This data is often stored in plain text format with specific field delimiters and row terminators.
Delimited text files
is a ubiquitous format in which a hierarchical document schema is used to define data entities (objects) that have multiple attributes. Each attribute might be an object (or a collection of objects);
JSON
This file storage format format that’s good for both structured and semi-structured data.
JSON
Type of workload
highly denormalized
optimized for read operations are two attributes are characteristics of ?
analytical data workload?
ACID
Atomicity
Consistency
Isolation
Durability
This guarantees that each transaction is treated as a single “unit”, which either succeeds completely or fails completely: if any of the statements constituting a transaction fails to complete, the entire transaction fails and the database is left unchanged.
Atomicity
This ensures that a transaction can only bring the database from one consistent state to another, preserving database invariants: any data written to the database must be valid according to all defined rules, including constraints, cascades, triggers, and any combination thereof.
Consistency
These are often executed concurrently (e.g., multiple transactions reading and writing to a table at the same time). and ensures that concurrent execution of transactions leaves the database in the same state that would have been obtained if the transactions were executed sequentially.
Isolation
This guarantees that once a transaction has been committed, it will remain committed even in the case of a system failure (e.g., power outage or crash).
Durability
Normalization, Schema, Consistency, Tabularare traits of what type of data store
traits of transactional data
Which type of data workload is optimized for updates and relies on relationships between entities to correlate data?
transactional
These workloads are optimized for create, read, update, and delete (CRUD) operations and are highly denormalized
Transactional workloads
These workloads store hierarchical data.
Graph
type of workload
Read operations are optimized.
They calculate business metrics over time.
They operate on historical data.
analytical data workloads
These data workloads operate on historical data, are optimized for read operations, and calculate business metrics over time.
Analytical data workloads
Which feature of transactional data processing guarantees that concurrent processes cannot see the data in an inconsistent state?
isolation
This in transactional data processing ensures that concurrent transactions cannot interfere with one another and must result in a consistent database state.
Isolation
Which job role is responsible for designing database solutions, creating databases, and developing stored procedures?
database engineer
Which job role is responsible for troubleshooting index performance, provisioning access to databases, and backing up databases?
database administrator