Azure Data Fundamentals Flashcards
Which type of data structure allows you to store data in a two-column format without requiring a complex database management system?
key/value store
Which type of DBMS is best for create, read, update, and delete (CRUD) operations and uses the least amount of storage space?
relational database
Which type of DBMS stores semi-structured data such as JSON and is optimized for retrieval?
document database
What type of DBMS is used to store hierarchical data, such as organizational charts that have nodes and edges?
graph database
What type of database is commonly used to store and query structured data?
Relational databases
In a relational database, an entity is represented as a
table
What kinds of keys should a table have?
Primary and foreign
This reduces storage space and data duplication in relational databases.
Normalization
In these databases, each record consists of a unique key and an associated value, which can be in any format.
Key-value databases
These databases are a specific form of key-value database in which the value is a JSON document (which the system is optimized to parse and query).
Document databases
These databases store tabular data comprising rows and columns, but you can divide the columns into groups known as column families. Each column family holds a set of columns that are logically related.
Column family databases
These databases store entities as nodes with links to define relationships between them.
Graph databases
These databases organize data as a series of two-dimensional tables with rows and columns.
Relational databases
Records are frequently created and updated.
Multiple operations have to be completed in a single transaction.
Relationships are enforced using database constraints.
Indexes are used to optimize query performance.
RDBMS
Which Azure services support relational databases?
Azure SQL Database
Azure Database for MySQL
Azure Database for PostgreSQL
Azure Database for MariaDB
Which data store model requires data to be highly normalized?
Database schemas are required and enforced.
Many-to-many relationships between data entities in the database.
Constraints are defined in the schema and imposed on any data in the database.
Data requires high integrity. Indexes and relationships need to be maintained accurately.
Data requires strong consistency. Transactions operate in a way that ensures all data are 100% consistent for all users and processes.
Size of individual data entries is small to medium-sized
RDBMS
Which data store model fits workloads where data is highly normalized?
Database schemas are required and enforced.
Many-to-many relationships between data entities in the database.
RDBMS
Which data store model fits workloads where constraints are defined in the schema and imposed on any data in the database?
Data requires high integrity. Indexes and relationships need to be maintained accurately.
Data requires strong consistency. Transactions operate in a way that ensures all data are 100% consistent for all users and processes.
Size of individual data entries is small to medium-sized
RDBMS
Which data store model fits workloads where data requires strong consistency? Transactions operate in a way that ensures all data are 100% consistent for all users and processes.
Size of individual data entries is small to medium-sized
RDBMS
Which data store model associates each data value with a unique key? Most key/value stores only support simple query, insert, and delete operations.
key/value store
In which data store model must an application overwrite the existing data for the entire value in order to modify a value (either partially or completely)? In most implementations, reading or writing a single value is an atomic operation.
key/value store
In which data store model can an application store arbitrary data as a set of values? Any schema information must be provided by the application. The key/value store simply retrieves or stores the value by key.
key/value store
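The three key/value cards above can be illustrated with a minimal Python sketch. The `KeyValueStore` class and the `user:42` key are hypothetical names made up for this example; this is a toy in-memory model, not the API of any Azure service.

```python
class KeyValueStore:
    """Toy in-memory key/value store: values are opaque to the store."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        # Writing always replaces the entire value stored under the key.
        self._data[key] = value

    def get(self, key, default=None):
        # Lookup is by key only; there are no joins or secondary indexes.
        return self._data.get(key, default)

    def delete(self, key):
        self._data.pop(key, None)


store = KeyValueStore()
store.put("user:42", {"name": "Ada", "theme": "dark"})

# To change one field, the application reads the value, edits a copy,
# and writes the whole value back. The store itself knows no schema.
profile = dict(store.get("user:42"))
profile["theme"] = "light"
store.put("user:42", profile)
```

The store never inspects the value, which is why any schema has to live in the application.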
Which services support key/value pairs?
Azure Cosmos DB for Table and Azure Cosmos DB for NoSQL
Azure Cache for Redis
Azure Table Storage
Which data store model should you select for a workload where data is accessed using a single key, like a dictionary?
No joins, locks, or unions are required.
No aggregation mechanisms are used.
Secondary indexes are generally not used.
Key/value store
Which data store model is suited for data caching, session management, user preference and profile management, and product recommendation and ad serving?
Key/value store
Which data store model is suited for workloads where insert and update operations are common?
No object-relational impedance mismatch. Documents can better match the object structures used in application code.
Individual documents are retrieved and written as a single block.
Data requires index on multiple fields.
Document databases
Which data store model is suited for workloads where data can be managed in a denormalized way?
Size of individual document data is relatively small.
Each document type can use its own schema.
Documents can include optional fields.
Document data is semi-structured, meaning that data types of each field are not strictly defined.
Document databases
Which data store model is suited for the following workloads? Product catalog
Content management
Inventory management
Document databases
Which Azure services support document databases?
Azure Cosmos DB for NoSQL
What data store model stores a collection of documents, where each document consists of named fields and data. Documents are retrieved by unique keys.
Document databases
This data store model stores two types of information, nodes and edges.
Graph databases
What services support Graph databases?
Azure Cosmos DB for Apache Gremlin
SQL Server
Which data store model is suited for workloads with complex relationships between data items, involving many hops between related data items?
The relationships between data items are dynamic and change over time.
Relationships between objects are first-class citizens, without requiring foreign-keys and joins to traverse.
Graph databases
Which data store model stores nodes and relationships?
Graph databases
In graph databases, these are similar to table rows or JSON documents.
Nodes
These are just as important as nodes, and are exposed directly in the query language.
Relationships
These specify relationships between nodes in graph databases
Edges
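A small Python sketch of the node and edge idea, using an organization chart. The employee names, the `reports_to` relationship label, and the `reports_of` helper are all hypothetical; a real graph database would expose this traversal through its query language instead.

```python
from collections import deque

# Nodes: entities with properties (hypothetical employees).
nodes = {
    "alice": {"title": "CEO"},
    "bob": {"title": "VP Engineering"},
    "carol": {"title": "Engineer"},
    "dave": {"title": "Engineer"},
}

# Edges: directed (source, relationship, target) triples.
edges = [
    ("bob", "reports_to", "alice"),
    ("carol", "reports_to", "bob"),
    ("dave", "reports_to", "bob"),
]


def reports_of(manager):
    """Return everyone below `manager`, following reports_to edges over many hops."""
    found, queue = [], deque([manager])
    while queue:
        current = queue.popleft()
        for source, relation, target in edges:
            if relation == "reports_to" and target == current:
                found.append(source)
                queue.append(source)
    return found


everyone_under_alice = reports_of("alice")
```

The traversal follows edges directly, hop by hop, rather than joining tables on foreign keys.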
This data store model is ideal for Organization charts, Social graphs, Fraud detection, Recommendation engines
Graph databases
These provide massively parallel solutions for ingesting, storing, and analyzing data. The data is distributed across multiple servers to maximize scalability.
Data analytics stores
These data stores use large data file formats such as delimited files (CSV), Parquet, and ORC.
Data analytics stores
Which Azure services use the data analytics store model?
Azure Synapse Analytics
Azure Data Lake
Azure Data Explorer
Azure Analysis Services
HDInsight
Azure Databricks
Which data store model is suited for data analytics and enterprise BI workloads?
Data analytics stores
Which type of database can be used for semi-structured data that will be processed by an Apache Spark pool in Azure Synapse Analytics?
Column-family databases
This data store model organizes data into rows and columns and uses a denormalized approach to structuring sparse data.
Column-family databases
Which Azure services provide column-family databases?
Azure Cosmos DB for Apache Cassandra
HBase in HDInsight
Which data store model performs write operations extremely quickly,
rarely needs update and delete operations, and is
designed to provide high throughput and low-latency access?
Supports easy query access to a particular set of fields within a much larger record.
Massively scalable.
Column-family, which is an example of a non-relational data store
What data store model uses tables consisting of a key column and one or more column families.
column-family
What data store model is ideal for Recommendations
Personalization
Sensor data
Telemetry
Messaging
Social media analytics
Web analytics
Activity monitoring
Weather and other time-series data
column-family
Which data store model holds sets of values organized by time and typically collects large amounts of data in real time from a large number of sources?
Time series databases
In which data store model are updates rare and deletes often done as bulk operations? The records written are generally small, there are often a large number of records, and total data size can grow rapidly.
Time series databases
Which Azure services provide a time series database?
Azure Time Series Insights
What data store model is ideal for Monitoring and event telemetry.
Sensor or other IoT data.
Time series databases
Which Azure services are commonly used for data object storage?
Blob Storage
Azure Data Lake Storage Gen2
This database does not use the tabular schema of rows and columns found in most traditional database systems.
non-relational database
This type of database is optimized for the specific requirements of the type of data being stored. For example, data may be stored as simple key/value pairs, as JSON documents, or as a graph consisting of edges and vertices.
non-relational database
Document data stores, columnar data stores, key/value data stores, graph data stores, time series data stores, and object data stores are all examples of what?
Non-relational (NoSQL) data stores
Which Azure data service allows you to store document, graph, and column-family databases?
Azure Cosmos DB
True or false: Azure SQL Database cannot handle column-family databases
True
This is a fully managed platform-as-a-service (PaaS) database hosted in Azure
Azure SQL Database
This is a hosted instance of SQL Server with automated maintenance, which allows more flexible configuration than Azure SQL DB but with more administrative responsibility for the owner.
Azure SQL Managed Instance
This is a virtual machine with an installation of SQL Server, allowing maximum configurability with full management responsibility.
SQL Server on Azure Virtual Machines
This role typically provisions and manages Azure SQL database systems to support line of business (LOB) applications that need to store transactional data.
Database administrators
This role may use Azure SQL database systems as sources for data pipelines that perform extract, transform, and load (ETL) operations to ingest the transactional data into an analytical system.
Data engineers
This role may query Azure SQL databases directly to create reports, though in large organizations the data is generally combined with data from other sources in an analytical data store to support enterprise analytics.
Data analysts
Which open-source relational databases does Azure provide?
MySQL
MariaDB
PostgreSQL
A simple-to-use open-source database management system that is commonly used in Linux, Apache, MySQL, and PHP (LAMP) stack apps.
Azure Database for MySQL
An open-source database management system that offers compatibility with Oracle.
Azure Database for MariaDB
A hybrid relational-object database. You can store data in relational tables, and custom data types, with their own non-relational properties.
Azure Database for PostgreSQL
A global-scale non-relational (NoSQL) database system that supports multiple application programming interfaces (APIs), enabling you to store and manage data as JSON documents, key-value pairs, column-families, and graphs.
Azure Cosmos DB
Which Azure service provides key/attribute storage for applications that need to read and write data values quickly at a low cost?
Azure Tables. You can store any number of entities in a table.
This Azure service provides network file shares such as you typically find in corporate networks.
Azure Files
This Azure service provides scalable, cost-effective storage for binary files.
Blob containers
This Azure service enables you to define and schedule data pipelines to transfer and transform data. You can integrate your pipelines with other Azure services, enabling you to ingest data from cloud data stores, process the data using cloud-based compute, and persist the results in another data store.
Azure Data Factory
This Azure service is a comprehensive, unified Platform-as-a-Service (PaaS) solution for data analytics that provides a single service interface for multiple analytical capabilities.
Azure Synapse Analytics
This service combines the Apache Spark data processing platform with SQL database semantics and an integrated management interface to enable large-scale data analytics.
Azure Databricks
Azure service that provides Azure-hosted clusters for popular Apache open-source big data processing technologies, including:
Apache Spark
Apache Hadoop
Apache Hive
Apache Kafka
Azure HDInsight
This analytics PaaS provides a single service interface for multiple analytical capabilities, including
Pipelines
SQL
Apache Spark
Azure Synapse Data Explorer
Azure Synapse Analytics
A real-time stream processing engine that captures a stream of data from an input, applies a query to extract and manipulate data from the input stream, and writes the results to an output for analysis or further processing.
Azure Stream Analytics
A standalone service that offers the same high-performance querying of log and telemetry data as the Azure Synapse Data Explorer runtime in Azure Synapse Analytics.
Azure Data Explorer
You have data stored in two tables in a database.
You create a relationship between the tables.
Which type of data do you have?
structured
This type of data has some structure but allows for some variation between entity instances.
One common format for semi-structured data is JavaScript Object Notation (JSON).
Semi-structured data
This data format adheres to a fixed schema, so all of the data has the same fields or properties. The data is represented in one or more tables that consist of rows to represent each instance of a data entity, and columns to represent attributes of the entity.
Structured data
This type of data is information that has some structure but allows for some variation between entity instances.
Semi-structured data
You have a folder that contains documents, images, and audio files.
Which type of data do you have?
unstructured
Which type of data should be sent from video cameras in a native binary format?
unstructured
Which type of database should you use to store sequential data in the fastest way possible?
Time series database
What type of database is used to store hierarchical data, such as organizational charts that have nodes and edges?
Graph
Which Azure database service is the best option for create, read, update, and delete (CRUD) operations and uses the least amount of storage space?
Azure SQL Database
Which Azure Cosmos DB API allows you to implement a non-relational database and model nodes that have relationships between them?
Apache Gremlin
This data is often stored in plain text format with specific field delimiters and row terminators.
Delimited text files
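As a quick illustration of field delimiters and row terminators, here is a minimal sketch that parses a small comma-delimited sample with Python's standard `csv` module. The sample data is invented for the example.

```python
import csv
import io

# Comma is the field delimiter, newline is the row terminator,
# and the first row carries the field names.
raw = "id,name,city\n1,Ada,London\n2,Grace,New York\n"

records = list(csv.DictReader(io.StringIO(raw)))
cities = [record["city"] for record in records]
```

Every value comes back as text, so the application decides how to interpret types.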
This is a ubiquitous format in which a hierarchical document schema is used to define data entities (objects) that have multiple attributes. Each attribute might be an object (or a collection of objects).
JSON
This file storage format is good for both structured and semi-structured data.
JSON
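A short sketch of why JSON suits semi-structured data: the two hypothetical customer documents below share some fields but differ in others, and nested objects and collections appear as attribute values. The field names are assumptions made for the example.

```python
import json

raw = """
[
  {"id": 1, "name": "Ada", "contact": {"email": "ada@example.com"}},
  {"id": 2, "name": "Grace", "contact": {"phone": "555-0100"}, "tags": ["vip"]}
]
"""

customers = json.loads(raw)

# Fields can vary between entity instances, so read optional ones defensively.
emails = [customer["contact"].get("email") for customer in customers]
tags = [customer.get("tags", []) for customer in customers]
```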
Being highly denormalized and optimized for read operations are characteristics of which type of workload?
Analytical data workload
ACID
Atomicity
Consistency
Isolation
Durability
This guarantees that each transaction is treated as a single “unit”, which either succeeds completely or fails completely: if any of the statements constituting a transaction fails to complete, the entire transaction fails and the database is left unchanged.
Atomicity
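Atomicity can be sketched with Python's built-in `sqlite3` module. The `account` table, the balances, and the `transfer` helper are hypothetical; the point is only that a failed statement undoes the whole transaction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    "id INTEGER PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO account VALUES (1, 100), (2, 50)")
conn.commit()


def transfer(amount):
    """Move `amount` from account 1 to account 2 as one transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = 2", (amount,)
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = 1", (amount,)
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok = transfer(30)       # both updates succeed
failed = transfer(500)  # the second update violates the CHECK, so the first is undone
balances = conn.execute("SELECT balance FROM account ORDER BY id").fetchall()
```

After the failed transfer, account 2 does not keep the credit it briefly received, which is the "fails completely" half of the definition.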
This ensures that a transaction can only bring the database from one consistent state to another, preserving database invariants: any data written to the database must be valid according to all defined rules, including constraints, cascades, triggers, and any combination thereof.
Consistency
Transactions are often executed concurrently (e.g., multiple transactions reading and writing to a table at the same time). This ensures that concurrent execution of transactions leaves the database in the same state that would have been obtained if the transactions were executed sequentially.
Isolation
This guarantees that once a transaction has been committed, it will remain committed even in the case of a system failure (e.g., power outage or crash).
Durability
Normalization, schema, consistency, and tabular structure are traits of what type of data?
Transactional data
Which type of data workload is optimized for updates and relies on relationships between entities to correlate data?
transactional
These workloads are optimized for create, read, update, and delete (CRUD) operations and are highly normalized.
Transactional workloads
These workloads store hierarchical data.
Graph
Which type of workload has the following characteristics?
Read operations are optimized.
They calculate business metrics over time.
They operate on historical data.
analytical data workloads
These data workloads operate on historical data, are optimized for read operations, and calculate business metrics over time.
Analytical data workloads
Which feature of transactional data processing guarantees that concurrent processes cannot see the data in an inconsistent state?
isolation
This in transactional data processing ensures that concurrent transactions cannot interfere with one another and must result in a consistent database state.
Isolation
Which job role is responsible for designing database solutions, creating databases, and developing stored procedures?
database engineer
Which job role is responsible for troubleshooting index performance, provisioning access to databases, and backing up databases?
database administrator
Which job role is responsible for building data models and finding hidden data patterns?
data analyst
Which two keys are needed to create a one-to-many relationship between two tables in a relational database? Each correct answer presents part of the solution.
a foreign key, a primary key
Which SQL operation is used to combine the content of two tables based on a shared column?
JOIN
This is used to combine data from two tables based on a shared key
JOIN
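A minimal sketch of a one-to-many relationship and a JOIN, using Python's built-in `sqlite3` module. The `customer` and `orders` tables and their rows are hypothetical. `customer.id` is the primary key and `orders.customer_id` is the foreign key that refers to it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
CREATE TABLE customer (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE orders (
    id          INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(id),
    item        TEXT NOT NULL
);
INSERT INTO customer VALUES (1, 'Ada'), (2, 'Grace');
INSERT INTO orders VALUES (10, 1, 'Keyboard'), (11, 1, 'Mouse'), (12, 2, 'Monitor');
""")

# Combine the two tables on the shared key column.
rows = conn.execute("""
    SELECT c.name, o.item
    FROM customer AS c
    JOIN orders AS o ON o.customer_id = c.id
    ORDER BY o.id
""").fetchall()
```

Each customer name is stored once and reached through the key, which is also how normalization avoids duplicating data.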
Select the answer that correctly completes the sentence.
[Answer choice] is a process to reduce duplicate data in a database and ensure data integrity.
Normalization
This is a process to reduce data duplication. This can be done by separating entities into their own tables and establishing relationships between the tables.
Normalization
Name 2
Which two DML statements are used to modify the existing data in a table? Each correct answer presents a complete solution.
MERGE
UPDATE
In a relational database, what can you use to create a virtual table from the results of a SELECT statement?
a view
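A view can be sketched with Python's built-in `sqlite3` module. The `sale` table and the `east_sale` view are hypothetical names. The view stores the SELECT statement, not a copy of the rows, so it reflects later changes to the underlying table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sale (id INTEGER PRIMARY KEY, region TEXT, amount INTEGER);
INSERT INTO sale VALUES (1, 'East', 100), (2, 'West', 250), (3, 'East', 50);

-- A virtual table defined by the results of a SELECT statement.
CREATE VIEW east_sale AS
    SELECT id, amount FROM sale WHERE region = 'East';
""")

before = conn.execute("SELECT COUNT(*) FROM east_sale").fetchone()[0]
conn.execute("INSERT INTO sale VALUES (4, 'East', 75)")
after = conn.execute("SELECT COUNT(*) FROM east_sale").fetchone()[0]
```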
This defines SQL statements that can be run on command.
A stored procedure
Which service is managed and serverless, avoids the use of Windows Server licenses, and allows for each workload to have its own instance of the service being used?
Azure SQL Database
This is a PaaS service, but databases are maintained in the same SQL Managed Instance cluster
SQL Managed Instance
Which data service provides a fully managed relational database with close to 100 percent feature parity with Microsoft SQL Server?
Azure SQL Managed Instance
Which three open-source databases are available as platform as a service (PaaS) in Azure? Each correct answer presents a complete solution.
MariaDB
MySQL
PostgreSQL
Name 2
Which open-source databases can you run on a virtual machine?
CockroachDB and CouchDB
Which data service allows you to create a single database that can scale up and down without downtime?
Azure SQL Database
Which data service allows you to migrate an entire Microsoft SQL Server to the cloud without requiring that you manage the infrastructure after the migration?
Azure SQL Managed Instance
Uptime of Azure SQL Database
99.995%
Uptime of Azure SQL Managed Instance
99.99%
Uptime of SQL Server on Azure VMs
99.99%
This approach is suitable for migrations and applications requiring access to operating system features that might be unsupported at the PaaS level.
SQL Server on Azure Virtual Machines
This approach is suitable if you want to lift-and-shift an on-premises SQL Server instance and all its databases to the cloud, without incurring the management overhead of running SQL Server on a virtual machine.
Azure SQL Managed Instance
This PaaS offering lets you create a managed database server in the cloud and then deploy your databases on this server. It is available as a Single Database or an Elastic Pool.
Azure SQL Database
Which open-source database is a hybrid relational-object database?
PostgreSQL
You can store data in relational tables, or you can store custom data types with non-relational properties.
PostgreSQL
Which open-source database has built-in support for temporal data?
MariaDB
Open source relational database offering database management system for web applications, running under Linux
MySQL
Open source relational database offering compatibility with Oracle and support for temporal data
MariaDB
Which type of Azure Storage is the least expensive option that allows you to store new or modified image files?
block blobs in the Archive tier
Open source relational database offering hybrid relational-object database.
PostgreSQL
Open source relational database offering enables you to store custom data types, with their own non-relational properties.
PostgreSQL
Open source relational database offering the ability to store and manipulate geometric data, such as lines, circles, and polygons.
PostgreSQL
This type of blob is best used to store discrete, large, binary objects that change infrequently.
block blob
This blob is used to implement virtual disk storage for virtual machines.
page blobs
With this type of blob, you can only add blocks to the end; updating or deleting existing blocks isn't supported.
Append blobs
Which type of Azure Storage is used to store large amounts of files to be shared with virtual machines by using SMB?
Azure Files
Which two storage solutions can be mounted in Azure Synapse Analytics and used to process large volumes of data? Each correct answer presents a complete solution.
Azure Blob storage
Azure Data Lake Storage
Which type of blob should you use to store data blocks that are added to a file frequently but cannot be deleted?
append
This blob allows you to frequently add new data to a file, but it does not allow for the modification or deletion of existing data.
append blob
You need to replace an existing on-premises SMB shared folder with a cloud solution.
Which storage option should you choose?
Azure Files
Which Azure Cosmos DB API should you use for data in a column-family storage structure?
Apache Cassandra
This Azure Cosmos DB API stores data in the Binary JSON (BSON) format.
MongoDB
Which Azure Cosmos DB API should you use for data in key/value tables?
Table
You need to process many JSON files every minute, while keeping the data from the files accessible by using native queries.
Which Azure Cosmos DB API should you use?
NoSQL
Which Azure Cosmos DB API is queried by using a syntax based on SQL?
Apache Cassandra
Which Azure Cosmos DB API is queried by using MQL?
The MongoDB API
Which Azure Cosmos DB API is queried by using a graph traversal language?
Gremlin API
Which Azure Cosmos DB API is queried by using OData and LINQ
The Table API
Which storage solution allows you to aggregate data stored in JSON files for use in analytical reports without additional development effort?
Azure Cosmos DB
What should you use to create and run data ingestion pipelines?
Azure Data Factory
Which type of data store uses star schemas, fact tables, and dimension tables?
data warehouse
Which two services allow you to create a pipeline to process data in response to an event? Each correct answer presents a complete solution.
Azure Data Factory
Azure Synapse Analytics
Which two services allow you to pre-process a large volume of data by using Scala? Each correct answer presents a complete solution.
a serverless Apache Spark pool in Azure Synapse Analytics
Azure Databricks
Latency measured in seconds or milliseconds is a characteristic of which type of processing?
stream processing
Processing data quickly as it arrives, executing simple analytics or simply writing data to a sink, and handling small chunks of data are characteristics of what?
Stream processing
The ability to perform complex analysis is a characteristic of which type of processing?
batch processing
Executing complex analysis, handling a large amount of data at a time, and latency that is usually measured in minutes and hours are characteristics of what?
Batch processing
Which two services can be used as a source for stream processing? Each correct answer presents a complete solution.
Azure Event Hubs
Azure IoT Hub
In a stream processing architecture, what can you use to persist the processed results as files?
Azure Data Lake Storage Gen2
Blank is a data ingestion service.
Azure Event Hubs
Which service allows you to perform on-demand analysis of large volumes of data from text logs, websites and IoT devices by using a common querying language for all the data sources?
Azure Data Explorer
Blank is used for the analysis of large amounts of text log data, websites, and IoT devices and uses a common querying language.
Azure Data Explorer
Blank is used to define streaming jobs, apply a perpetual query, and write the results to an output.
Azure Stream Analytics
Which type of visual in Microsoft Power BI should you use to identify the correlation between two numeric measures?
scatter plots
Which type of visual in Microsoft Power BI should you use to compare categorized values as the proportions of a total value?
pie chart
Which visual in Microsoft Power BI allows you to view trends, such as changes in sales over time?
a line chart
In the Power BI service, what should you create to share a single page with the most important visuals from reports?
a dashboard
You need to share a report that you created in Microsoft Power BI Desktop with other users.
What should you do first?
Publish the report to the Power BI service.
Blank charts are a good way to visually compare numeric values for discrete categories.
Bar and column
Blank can also be used to compare categorized values and are useful when you need to examine trends, often over time.
Line charts
Blank are often used in business reports to visually compare categorized values as proportions of a total.
Pie charts
Blank are useful when you want to compare two numeric measures and identify a relationship or correlation between them.
Scatter plots
This SQL clause is used to filter the results of a GROUP BY clause.
HAVING
This SQL operator combines the content of two sets of columns from two tables but is not based on a shared key.
UNION
This SQL operator returns only values that exist in both tables.
INTERSECT
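The three cards above (HAVING, UNION, INTERSECT) can be tried side by side with Python's built-in `sqlite3` module. The `online_order` and `store_order` tables and their rows are hypothetical sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE online_order (customer TEXT, amount INTEGER);
CREATE TABLE store_order  (customer TEXT, amount INTEGER);
INSERT INTO online_order VALUES ('Ada', 40), ('Ada', 70), ('Grace', 20);
INSERT INTO store_order  VALUES ('Grace', 15), ('Linus', 90);
""")

# HAVING filters groups after GROUP BY has aggregated them.
big_spenders = conn.execute("""
    SELECT customer, SUM(amount) AS total
    FROM online_order
    GROUP BY customer
    HAVING SUM(amount) > 50
""").fetchall()

# UNION stacks the rows of two queries (removing duplicates); no shared key is needed.
all_customers = conn.execute("""
    SELECT customer FROM online_order
    UNION
    SELECT customer FROM store_order
    ORDER BY customer
""").fetchall()

# INTERSECT keeps only the values present in both result sets.
both_channels = conn.execute("""
    SELECT customer FROM online_order
    INTERSECT
    SELECT customer FROM store_order
""").fetchall()
```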
Use this to encapsulate any type of business logic that can be reused in the application and modify existing data as well as add new entries to tables.
A stored procedure
A common format for semi-structured data
JSON
blank data has some structure but allows for variations between entity instances.
Semi-structured
Which two types of file store data in columnar format? Each correct answer presents a complete solution.
Parquet
ORC
Which Azure Cosmos DB API should you use for data in the BSON format?
MongoDB
Which service allows you to aggregate data over a specific time window before the data is written to a data lake?
Azure Stream Analytics
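To make "aggregate data over a specific time window" concrete, here is a plain Python sketch of a fixed, non-overlapping (tumbling) window sum. It is not Azure Stream Analytics query syntax; the `tumbling_window_sum` function and the `(timestamp_in_seconds, value)` event shape are assumptions made for the example.

```python
from collections import defaultdict


def tumbling_window_sum(events, window_seconds):
    """Sum event values per fixed, non-overlapping time window.

    `events` is an iterable of (timestamp_in_seconds, value) pairs.
    Returns a dict mapping each window's start time to its total.
    """
    totals = defaultdict(float)
    for timestamp, value in events:
        # Every event belongs to exactly one window.
        window_start = (timestamp // window_seconds) * window_seconds
        totals[window_start] += value
    return dict(sorted(totals.items()))


events = [(1, 2.0), (4, 3.0), (11, 1.0), (19, 4.0), (20, 5.0)]
windows = tumbling_window_sum(events, 10)
```

Only the per-window totals would be written to the data lake, not every raw event.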
Which two visuals in Microsoft Power BI allow you to visually compare numeric values for discrete categories? Each correct answer presents a complete solution.
a bar chart
a column chart
You have an Azure Cosmos DB service running the SQL API. One of the operational databases has a lot of transactions.
Which service allows you to perform near real-time analytics on the operational data stored in Azure Cosmos DB?
Azure Synapse Link for Azure Cosmos DB
Which type of visual in Microsoft Power BI should you use to display a single value, such as total sales?
Card
This data format adheres to a fixed schema, data is represented in one or more tables that consist of rows to represent each instance of a data entity, and columns to represent attributes of the entity
Structured data
This data format allows for some variation between entity instances.
One common format for semi-structured data is JavaScript Object Notation (JSON).
Semi-structured data
This type of data may comprise documents, images, audio and video data, and binary files.
Unstructured data
What are the two types of datastores
File stores
Databases
Common file formats for data storage?
Delimited text files, JSON, XML, BLOB
Some common optimized file formats
Avro, ORC, and Parquet
This is a row-based optimized file format
Avro
These are 2 column-based optimized file formats
ORC, Parquet
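The difference between row-based and column-based layouts can be sketched in plain Python. This only models the idea in memory; it does not read or write actual Avro, ORC, or Parquet files, and the weather records are invented.

```python
# Row-based layout: each record is stored together (the Avro-style idea).
rows = [
    {"id": 1, "city": "Oslo", "temp": 3},
    {"id": 2, "city": "Cairo", "temp": 30},
    {"id": 3, "city": "Lima", "temp": 12},
]

# Column-based layout: all values of one field are stored together
# (the ORC/Parquet-style idea).
columns = {name: [row[name] for row in rows] for name in rows[0]}

# An analytical query over one field touches only that column.
average_temp = sum(columns["temp"]) / len(columns["temp"])
```

Reading one column without touching the others is why columnar formats suit read-heavy analytical workloads.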
Types of databases
Relational and non-relational databases
Four common types of non-relational database
Key-value databases, Document databases, Column family databases, Graph databases
This data workload relies on a database system in which data storage is optimized for both read and write operations.
transactional data processing (OLTP)
This data workload typically uses read-only (or read-mostly) systems that store vast volumes of historical data or business metrics.
Analytical data processing (OLAP)
These are three examples of data analytic workloads
Data lakes, data warehouses, and data lakehouses
This data workload model is an aggregated type of data storage. Data is aggregated across dimensions at different levels, enabling you to drill up or down to view aggregations at multiple hierarchical levels.
OLAP model
This data workload model stores data in a relational schema that is optimized for read operations
Data warehouses
This data workload model combines the flexible and scalable storage of a data lake with the relational querying semantics of a data warehouse.
Data Lakehouses
This data workload model is common in large-scale data analytical processing scenarios, where a large volume of file-based data must be collected and analyzed.
Data lake
This type of database doesn't apply a relational schema to the data.
non-relational
How is data in a relational table organized?
Rows and Columns
Which of the following is an example of unstructured data?
Audio and Video files
What is a data warehouse?
A relational database optimized for read operations
A SQL engine that is optimized for Internet-of-things (IoT) scenarios that need to work with streaming time-series data.
Azure SQL Edge
Three open-source relational database management systems that are tailored for different specializations.
MySQL, MariaDB, and PostgreSQL
A fully managed, highly scalable PaaS database service. A good option when you need to create a new application in the cloud.
Azure SQL Database
A platform-as-a-service (PaaS) option that provides near-100% compatibility with on-premises SQL Server instances while abstracting the underlying hardware and operating system. The service includes automated software update management, backups, and other maintenance tasks, reducing the administrative burden of supporting a database server instance.
Azure SQL Managed Instance
An infrastructure-as-a-service (IaaS) solution that virtualizes hardware infrastructure for compute, storage, and networking in Azure, making it a great option for “lift and shift” migration of existing on-premises SQL Server installations to the cloud.
SQL Server on Azure Virtual Machines (VMs)
The best option for low cost with minimal administration. It isn’t fully compatible with on-premises SQL Server installations. It’s often used in new cloud projects where the application design can accommodate any required changes to your applications.
Azure SQL Database
Which deployment option offers the best compatibility when migrating an existing SQL Server on-premises solution?
Azure SQL Managed Instance
Azure SQL Managed Instance offers near-100% compatibility with SQL Server.
Which database service is the simplest option for migrating a LAMP application to Azure?
Azure Database for MySQL
Correct. LAMP stands for Linux, Apache, MySQL, and PHP.
Storage options for non-relational data
Azure Blob Storage, Azure Data Lake Storage Gen2, Azure Files, Azure Tables
A NoSQL storage solution that uses tables containing key/value data items to hold semi-structured data, where each item is identified by a partition key and a row key
Azure Table Storage
Storage solution that uses SMB and NFS protocols
Azure Files
Hierarchical data storage for analytical data lakes that can be mounted by Hadoop in Azure HDInsight, Azure Databricks, and Azure Synapse Analytics
Azure Data Lake Storage Gen2
What are the elements of an Azure Table storage key?
Partition key and row key
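As a rough illustration of how a partition key and a row key together identify an item, here is a minimal in-memory sketch in plain Python. It does not use the Azure SDK; the `TableSketch` class, its method names, and the sample entities are all hypothetical.

```python
from typing import Dict, List, Tuple


class TableSketch:
    """In-memory stand-in for a key/value table (not the Azure SDK)."""

    def __init__(self) -> None:
        # Each entity is stored under its (partition key, row key) pair.
        self._rows: Dict[Tuple[str, str], dict] = {}

    def upsert(self, partition_key: str, row_key: str, **properties) -> None:
        # Writing the same key pair again replaces the earlier entity.
        self._rows[(partition_key, row_key)] = dict(properties)

    def get(self, partition_key: str, row_key: str) -> dict:
        return self._rows[(partition_key, row_key)]

    def query_partition(self, partition_key: str) -> List[dict]:
        # Entities that share a partition key are returned together.
        return [props for (pk, _), props in self._rows.items() if pk == partition_key]


table = TableSketch()
table.upsert("customers-uk", "001", name="Ana", email="ana@example.com")
table.upsert("customers-uk", "002", name="Ben")  # rows may carry different properties
table.upsert("customers-us", "001", name="Cy")

uk_names = [row["name"] for row in table.query_partition("customers-uk")]
```

The same row key ("001") appears in two partitions without conflict, because uniqueness applies to the pair rather than to either key alone.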
What should you do to an existing Azure Storage account in order to support a data lake for Azure Synapse Analytics?
Upgrade the account to enable hierarchical namespace and create a blob container
Why might you use Azure File storage?
To enable users at different sites to share files.
Microsoft’s native non-relational service for working with the document data model. It manages data in JSON document format, and despite being a NoSQL data storage solution, uses SQL syntax to work with the data.
Azure Cosmos DB for NoSQL
This Azure Cosmos DB API is compatible with MongoDB, a popular open-source database in which data is stored in Binary JSON (BSON) format
Azure Cosmos DB for MongoDB
This is a globally distributed relational database that automatically shards data to help you build highly scalable apps. It scales seamlessly to multiple nodes by transparently distributing your tables.
Azure Cosmos DB for PostgreSQL
To create an Azure Data Lake Storage Gen2 file system, you must enable the?
Hierarchical Namespace
In this Azure service, all rows in a table must have a unique key (composed of a partition key and a row key)
Azure Tables
Cosmos DB use cases
IoT and telematics
Retail and marketing
Gaming
Web and mobile applications.
Which API should you use to store and query JSON documents in Azure Cosmos DB?
Azure Cosmos DB for NoSQL
Which Azure Cosmos DB API should you use to work with data in which entities and their relationships to one another are represented in a graph using vertices and edges?
Azure Cosmos DB for Apache Gremlin
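To make the vertex-and-edge model concrete, here is a small sketch in plain Python. It is not Gremlin and does not call Cosmos DB; the sample vertices, the relationship names, and the helper functions are hypothetical.

```python
from collections import deque

# Vertices: entities with properties (hypothetical sample data).
vertices = {
    "sue": {"label": "employee", "name": "Sue"},
    "ben": {"label": "employee", "name": "Ben"},
    "amy": {"label": "employee", "name": "Amy"},
    "hr": {"label": "department", "name": "HR"},
}

# Edges: (source vertex, relationship, target vertex).
edges = [
    ("ben", "reports_to", "sue"),
    ("amy", "reports_to", "ben"),
    ("sue", "works_in", "hr"),
]


def neighbors(vertex_id, relationship):
    """Follow outgoing edges of one relationship type from a vertex."""
    return [dst for src, rel, dst in edges if src == vertex_id and rel == relationship]


def management_chain(vertex_id):
    """Walk 'reports_to' edges upward until no manager is left."""
    chain = []
    queue = deque(neighbors(vertex_id, "reports_to"))
    while queue:
        manager = queue.popleft()
        chain.append(vertices[manager]["name"])
        queue.extend(neighbors(manager, "reports_to"))
    return chain


chain_for_amy = management_chain("amy")
```

A graph database answers this kind of "follow the relationships" question directly, without the repeated joins a relational design would need.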
How can you enable globally distributed users to work with their own local replica of a Cosmos DB database?
Enable multi-region writes and add the regions where you have users.
You can create pipelines with these azure services
Azure Data Factory, Azure Synapse Analytics, Microsoft Fabric
There are two common types of analytical data store
Data warehouses, Data lakehouses
On Azure, there are three main platform-as-a-service (PaaS) services that you can use to implement a large-scale analytical store
Azure Synapse Analytics, Azure Databricks, Azure HDInsight
This service is a great choice when you want to create a single, unified analytics solution on Azure.
Azure Synapse Analytics
Use this service as your analytical store if you want to use existing expertise with the platform or if you need to operate in a multicloud environment or support a cloud-portable solution.
Azure Databricks
A suitable option if your analytics solution relies on multiple open-source frameworks or if you need to migrate an existing on-premises Hadoop-based solution to the cloud.
Azure HDInsight
This is a unified software-as-a-service (SaaS) offering, with all your data stored in a single open format in OneLake.
Microsoft Fabric
Which Azure PaaS services can you use to create a pipeline for data ingestion and processing?
Azure Synapse Analytics and Azure Data Factory
What must you define to implement a pipeline that reads data from Azure Blob Storage?
A linked service for your Azure Blob Storage account
Which open-source distributed processing engine does Azure Synapse Analytics include?
Apache Spark
Large volumes of data can be processed at a convenient time.
It can be scheduled to run at a time when computers or systems might otherwise be idle, such as overnight or during off-peak hours. These are advantages of?
Batch processing
The time delay between ingesting the data and getting the results.
All of a batch job’s input data must be ready before a batch can be processed. This means data must be carefully checked. Problems with data, errors, and program crashes that occur during batch jobs bring the whole process to a halt. The input data must be carefully checked before the job can be run again. Even minor data errors can prevent a batch job from running. These are disadvantages of?
Batch processing
This processing model is ideal for time-critical operations that require an instant real-time response. For example, a system that monitors a building for smoke and heat needs to trigger alarms and unlock doors to allow residents to escape immediately in the event of a fire.
Stream processing
Can process all the data in the dataset, is suitable for handling large datasets efficiently, typically has a latency of a few hours, and is typically used to perform complex analytics
Batch processing
Typically only has access to the most recent data received, or to data within a rolling time window (the last 30 seconds, for example); is intended for individual records or micro batches consisting of few records; typically occurs immediately, with latency in the order of seconds or milliseconds; and is used for simple response functions, aggregates, or calculations such as rolling averages
Stream processing
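The contrast between the two models can be sketched in plain Python: a batch function that needs the full dataset before it runs, and a streaming function that emits a rolling average over the last 30 seconds after each event. The sample readings and both function names are hypothetical, and this is only an illustration of the idea, not how any Azure service implements it.

```python
from collections import deque

# Hypothetical sensor readings as (timestamp in seconds, temperature).
readings = [(0, 20.0), (10, 22.0), (25, 24.0), (40, 30.0), (55, 32.0)]


def batch_average(all_readings):
    """Batch style: wait for the full dataset, then process it in one pass."""
    values = [value for _, value in all_readings]
    return sum(values) / len(values)


def rolling_averages(stream, window_seconds=30):
    """Stream style: after each event, average only the recent window."""
    window = deque()
    for timestamp, value in stream:
        window.append((timestamp, value))
        # Drop events that fall outside the rolling time window.
        while window[0][0] <= timestamp - window_seconds:
            window.popleft()
        yield timestamp, sum(v for _, v in window) / len(window)


overall = batch_average(readings)
per_event = list(rolling_averages(readings))
```

The batch result is a single figure for the whole dataset, while the streaming result is one figure per event that reflects only recent data.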
Azure Event Hubs, Azure IoT Hub, Azure Data Lake Storage Gen2, and Apache Kafka are all sources for
Stream processing
Azure Event Hubs
Azure Data Lake Storage Gen2 or Azure Blob Storage
Azure SQL Database, Azure Synapse Analytics, or Azure Databricks
Microsoft Power BI
Sinks for Streams
Used to queue the processed data for further downstream processing.
Azure Event Hubs
Used to persist the processed results as a file
Azure Data Lake Storage Gen2 or Azure Blob Storage
Used to persist the processed results in a database table for querying and analysis.
Azure SQL Database, Azure Synapse Analytics, or Azure Databricks
Used to generate real time data visualizations in reports and dashboards.
Microsoft Power BI
This is a distributed processing framework for large-scale data analytics. You can use it in the following services:
Azure Synapse Analytics
Azure Databricks
Azure HDInsight
Apache Spark
This is an open-source storage layer that adds support for transactional consistency, schema enforcement, and other common data warehousing features to data lake storage.
Delta Lake
These represent the entities by which you want to aggregate numeric measures
Dimension tables
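A minimal sketch of aggregating a measure by a dimension, using plain Python dictionaries in place of real tables. The fact table shown here is the usual companion to dimension tables in a star schema; the table names, keys, and figures are hypothetical.

```python
from collections import defaultdict

# Dimension table: one row per product, keyed by a surrogate key (hypothetical data).
dim_product = {
    1: {"name": "Road bike", "category": "Bikes"},
    2: {"name": "Mountain bike", "category": "Bikes"},
    3: {"name": "Helmet", "category": "Accessories"},
}

# Fact table: numeric measures plus keys that reference dimension rows.
fact_sales = [
    {"product_key": 1, "quantity": 2, "revenue": 1800.0},
    {"product_key": 2, "quantity": 1, "revenue": 1100.0},
    {"product_key": 3, "quantity": 4, "revenue": 200.0},
    {"product_key": 1, "quantity": 1, "revenue": 900.0},
]


def revenue_by(attribute):
    """Aggregate the revenue measure by one dimension attribute."""
    totals = defaultdict(float)
    for sale in fact_sales:
        group = dim_product[sale["product_key"]][attribute]
        totals[group] += sale["revenue"]
    return dict(totals)


by_category = revenue_by("category")
by_name = revenue_by("name")
```

Changing the attribute passed to `revenue_by` changes the grouping without touching the fact rows, which is the role the dimension table plays.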
These are a good way to visually compare numeric values for discrete categories.
Bar and column charts
These can also be used to compare categorized values and are useful when you need to examine trends, often over time.
Line charts
These are often used in business reports to visually compare categorized values as proportions of a total
Pie charts
These are useful when you want to compare two numeric measures and identify a relationship or correlation between them.
Scatter plots
Which kind of visualization should you use to analyze pass rates for multiple exams over time?
A line chart
What should you define in your data model to enable drill-up/down analysis?
A hierarchy
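As a rough sketch of what a hierarchy enables, the plain Python below groups the same rows at different depths of a year, quarter, month hierarchy. The sample rows, the `HIERARCHY` list, and the `total_at_level` helper are hypothetical and are not part of Power BI.

```python
from collections import defaultdict

# Hypothetical sales rows; year > quarter > month forms the hierarchy.
sales = [
    {"year": 2023, "quarter": "Q1", "month": "Jan", "amount": 100},
    {"year": 2023, "quarter": "Q1", "month": "Feb", "amount": 150},
    {"year": 2023, "quarter": "Q2", "month": "Apr", "amount": 200},
    {"year": 2024, "quarter": "Q1", "month": "Jan", "amount": 120},
]

HIERARCHY = ["year", "quarter", "month"]


def total_at_level(depth):
    """Sum amounts grouped by the first `depth` levels of the hierarchy."""
    totals = defaultdict(int)
    for row in sales:
        key = tuple(row[level] for level in HIERARCHY[:depth])
        totals[key] += row["amount"]
    return dict(totals)


by_year = total_at_level(1)     # drilled up to the top level
by_quarter = total_at_level(2)  # drilled down one level
```

Drilling down adds one more level to the grouping key, and drilling up removes one, while the underlying rows stay the same.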
Which tool should you use to import data from multiple data sources and create a report?
Power BI Desktop