Describe an analytics workload on Azure (25% - 30%) Flashcards
What is HTAP?
Hybrid transactional and analytical processing.
What Power BI tools do you use to make reports?
Power BI Desktop, Power BI service and Power BI Reports Builder.
What Power BI tools do you use to make paginated reports?
Power BI service, Power BI Premium and Power BI Report Builder.
What’s the difference between a data lake and a data warehouse?
A data warehouse holds structured information; a data lake holds raw data.
When would you need a data warehouse?
When you want to run complex queries over large sets of data (Big Data) from various sources.
What five services does Azure offer for modern data warehousing?
Azure Synapse Analytics, Azure Data Factory, Azure Data Lake Storage, Azure Databricks and Azure Analysis Services.
What is Azure Data Factory?
Azure Data Factory (ADF) is a data integration service for retrieving data from various sources and converting it into a format for further processing.
It uses pipelines to ingest, clean, convert, and output data.
What is Azure Data Lake Storage?
An extension of Azure Blob Storage organised as a near-infinite file system.
What is a data lake?
A repository for large quantities of raw data from various sources.
What are the characteristics of Azure Data Lake Storage?
It organises files in directories and sub-directories for improved file organisation.
It supports Portable Operating System Interface (POSIX) file and directory permissions to enable granular RBAC access control on your data.
It is compatible with the Hadoop Distributed File System (HDFS). All Apache Hadoop environments can access data in Azure Data Lake Storage Gen2.
What is Azure Databricks?
An Apache Spark environment running on Azure to provide big data processing, streaming, and machine learning.
What languages can you use with Azure Databricks?
R, Scala and Python.
What are notebooks?
A notebook is a collection of cells, each of which contain a separate block of code.
What is Azure Synapse Analytics?
Azure Synapse Analytics is a analytics engine designed to process large amounts of data quickly.
How does Azure Synapse Analytics leverage a massively parallel processing (MPP) architecture?
With a control node and a pool of compute nodes.