Azure Flashcards
What are the four benefits of Azure Data Lake Storage Gen2?
- Hadoop-compatible access (data can be treated as if it’s stored in HDFS, so it can be stored once and accessed by Azure Databricks, Azure HDInsight, and Azure Synapse Analytics)
- Security (supports access control lists (ACLs) and Portable Operating System Interface (POSIX) permissions that don’t inherit the permissions of the parent directory)
- Performance (data processing requires fewer computational resources because the data is stored in a hierarchy of directories)
- Data redundancy (takes advantage of the Azure Blob replication models that provide data redundancy in a single data center with locally redundant storage (LRS), or to a secondary region by using the Geo-redundant storage (GRS) option)
How do you enable Azure Data Lake Storage Gen2?
Azure Data Lake Storage Gen2 isn’t a standalone Azure service, but rather a configurable capability of a StorageV2 (General Purpose v2) Azure Storage account.
To enable it, select the ‘Enable hierarchical namespace’ option on the Advanced page when creating the storage account in the Azure portal.
What’s the difference between Azure Blob Storage and Azure Data Lake Storage Gen2?
In Azure Blob Storage, you can store large amounts of unstructured (“object”) data in a flat namespace within a blob container; a “/” in a blob name only simulates a folder. Azure Data Lake Storage Gen2 builds on Blob Storage and optimizes I/O of high-volume data by using a hierarchical namespace that organizes blob data into real directories and stores metadata about each directory and the files within it.
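A rough way to see why the hierarchical namespace helps: in a flat namespace a “directory” is only a shared name prefix, so renaming it means rewriting every blob under that prefix, whereas a hierarchical namespace can rename the directory object itself in one metadata operation. The Python below is a toy model under that assumption; `rename_flat`, `rename_hierarchical`, and `Directory` are hypothetical names, not part of any Azure SDK, and real storage internals are more involved.

```python
from dataclasses import dataclass, field


def rename_flat(blobs: dict, old_prefix: str, new_prefix: str) -> int:
    """Flat namespace (toy model): '/' is just a character in a blob name.
    Renaming a virtual directory touches every matching blob.
    Returns the number of blob operations performed."""
    matching = [name for name in blobs if name.startswith(old_prefix)]
    for name in matching:
        blobs[new_prefix + name[len(old_prefix):]] = blobs.pop(name)
    return len(matching)


@dataclass
class Directory:
    # Hypothetical directory node: child name -> subdirectory or file contents.
    children: dict = field(default_factory=dict)


def rename_hierarchical(parent: Directory, old: str, new: str) -> int:
    """Hierarchical namespace (toy model): a directory is a real object, so
    renaming it re-points one entry in its parent, however many files it holds."""
    parent.children[new] = parent.children.pop(old)
    return 1


# Usage: the same 1,000 files under "sales/2023" in both models.
flat = {f"sales/2023/part-{i}.csv": b"" for i in range(1000)}
flat_ops = rename_flat(flat, "sales/2023/", "sales/archive-2023/")

files = {f"part-{i}.csv": b"" for i in range(1000)}
root = Directory({"sales": Directory({"2023": Directory(files)})})
hier_ops = rename_hierarchical(root.children["sales"], "2023", "archive-2023")

print(flat_ops, hier_ops)  # 1000 1
```

The operation count in the flat model grows with the number of files, while the hierarchical model stays at one, which is the intuition behind the performance benefit listed earlier.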
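When should the hierarchical namespace (Azure Data Lake Storage Gen2) be enabled?
When the data will be analyzed (for example, processed with Spark or queried from Azure Synapse Analytics). If you only need to store data without analyzing it, such as archiving rarely used data or hosting website images and media, leave the option disabled and use the account as plain Azure Blob Storage.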
What is Azure Synapse Analytics?
It provides a single, integrated cloud platform for all analytical workloads (descriptive, diagnostic, predictive, prescriptive) by supporting multiple data storage, processing, and analysis technologies.
When should Azure Synapse Analytics be used?
- Large-scale data warehousing
- Advanced analytics (native features and Azure Machine Learning)
- Data exploration and discovery (using serverless SQL pool functionality)
- Real-time analytics (with Azure Synapse Link, Azure Stream Analytics, and Azure Data Explorer)
- Data integration (Azure Synapse Pipelines)
- Integrated analytics (integration of the analytics landscape into one service)
What is KQL?
Kusto Query Language. A SQL-like query language optimized for data that includes a time-series component, such as real-time data from log files or IoT devices.
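KQL expresses time-series aggregation compactly; for example, `Logs | summarize count() by bin(Timestamp, 1h)` counts rows per hour (the `Logs` table and `Timestamp` column are hypothetical). The Python sketch below mimics what that query computes, assuming timestamps are naive UTC datetimes and hour-sized bins; it illustrates the idea and is not how Kusto executes queries.

```python
from collections import Counter
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)


def bin_time(ts: datetime, size: timedelta) -> datetime:
    """Round ts down to a whole multiple of size, counted from the Unix epoch."""
    return EPOCH + ((ts - EPOCH) // size) * size


def count_by_bin(timestamps, size: timedelta) -> dict:
    """Rough analogue of: Logs | summarize count() by bin(Timestamp, size)"""
    counts = Counter(bin_time(ts, size) for ts in timestamps)
    return dict(sorted(counts.items()))


# Usage: three hypothetical IoT readings bucketed by hour.
readings = [
    datetime(2024, 1, 1, 9, 5),
    datetime(2024, 1, 1, 9, 40),
    datetime(2024, 1, 1, 10, 15),
]
hourly = count_by_bin(readings, timedelta(hours=1))
print(hourly)
```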
What are the two runtime environments provided by Azure Synapse SQL?
- Serverless SQL pool (on-demand SQL query processing, primarily used to work with data in a data lake)
- Dedicated SQL pool (enterprise-scale relational database instances used to host data warehouses in which data is stored in relational tables)
What are common use cases for serverless SQL pools?
- Data exploration
- Data transformation
- Logical data warehouses (to define external objects such as tables and views in a serverless SQL database)
What SQL function is used to query common data file formats (CSV, JSON, Parquet)?
OPENROWSET
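As an illustration of OPENROWSET syntax in a serverless SQL pool, the hypothetical Python helper below generates the T-SQL for a CSV or Parquet query; the generated text, not the helper, is the point. The storage URL is a placeholder. The CSV options shown (`PARSER_VERSION = '2.0'`, `HEADER_ROW = TRUE`) treat the first row as column names; JSON requires additional options and is left out of this sketch.

```python
def build_openrowset_query(path: str, file_format: str, top: int = 100) -> str:
    """Return a serverless SQL pool query that reads files in place with OPENROWSET.
    Only CSV and Parquet are handled in this sketch."""
    fmt = file_format.lower()
    if fmt == "csv":
        # HEADER_ROW requires parser version 2.0; it reads column names from row 1.
        options = ["FORMAT = 'CSV'", "PARSER_VERSION = '2.0'", "HEADER_ROW = TRUE"]
    elif fmt == "parquet":
        # Parquet files carry their own schema, so no parser options are needed.
        options = ["FORMAT = 'PARQUET'"]
    else:
        raise ValueError(f"format not covered by this sketch: {file_format}")
    args = ",\n    ".join([f"BULK '{path}'"] + options)
    return f"SELECT TOP {top} *\nFROM OPENROWSET(\n    {args}\n) AS result"


# Usage: the URL is a placeholder data lake path; * matches multiple files.
csv_query = build_openrowset_query(
    "https://mydatalake.dfs.core.windows.net/files/sales/*.csv", "csv"
)
print(csv_query)
```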
What is a CETAS statement and where is it used?
CREATE EXTERNAL TABLE AS SELECT. It can be used in a dedicated SQL pool or a serverless SQL pool to persist the results of a query in an external table, which stores its data in files in the data lake.
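A CETAS statement names the external table, the folder to write to, and two objects that must already exist: an external data source (where the files go) and an external file format (how they are written). The hypothetical helper below assembles such a statement; `sales_data` (a data source), `ParquetFormat` (assumed to be a Parquet file format object), and the table and column names are assumptions for illustration.

```python
def build_cetas(table: str, location: str, data_source: str,
                file_format: str, select_sql: str) -> str:
    """Return a CREATE EXTERNAL TABLE AS SELECT statement.
    data_source and file_format must name existing objects created with
    CREATE EXTERNAL DATA SOURCE and CREATE EXTERNAL FILE FORMAT."""
    return (
        f"CREATE EXTERNAL TABLE {table}\n"
        "WITH (\n"
        f"    LOCATION = '{location}',\n"
        f"    DATA_SOURCE = {data_source},\n"
        f"    FILE_FORMAT = {file_format}\n"
        ")\n"
        "AS\n"
        f"{select_sql};"
    )


# Usage: aggregate raw CSV orders and persist the result as Parquet files.
select_sql = (
    "SELECT Item, SUM(Quantity) AS ItemsSold\n"
    "FROM OPENROWSET(\n"
    "    BULK 'sales/orders/*.csv',\n"
    "    DATA_SOURCE = 'sales_data',\n"
    "    FORMAT = 'CSV',\n"
    "    PARSER_VERSION = '2.0',\n"
    "    HEADER_ROW = TRUE\n"
    ") AS orders\n"
    "GROUP BY Item"
)
stmt = build_cetas("ProductSalesTotals", "sales/productsales/",
                   "sales_data", "ParquetFormat", select_sql)
print(stmt)
```

The SELECT part can be any query the pool supports; CETAS only decides where and in what format its results are written.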
Why is encapsulating a data transformation in a stored procedure a good practice?
It makes the transformation easier to operationalize: a single procedure call can accept parameters, return outputs, and include additional logic.
What are the benefits of stored procedures?
- Encapsulation of Transact-SQL logic
- Reduces network traffic (procedure is executed as a single batch of code)
- Provides a security boundary (users without direct permissions on the underlying objects can still execute the procedure)
- Eases maintenance (changes only need to be applied to the stored procedure)
- Improves performance (the execution plan is cached and reused)
What is a lake database?
It provides a relational metadata layer over one or more files in a data lake. You can create a lake database that includes definitions for tables, including column names and data types, as well as relationships between primary and foreign key columns. The tables reference files in the data lake, so you can apply relational semantics to the data and query it using SQL. However, the storage of the data files is decoupled from the database schema, which gives more flexibility than a relational database system typically offers.
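To illustrate the decoupling of schema and storage, the Python sketch below models a lake database as table definitions (column names, types, and a key relationship used for a join) that point at folders of CSV files and apply the schema only when the files are read. All names are hypothetical, and this is a conceptual model, not how Azure Synapse implements lake databases.

```python
import csv
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Callable


@dataclass
class Column:
    name: str
    cast: Callable[[str], object]  # converts raw CSV text to the column's type


@dataclass
class TableDef:
    name: str
    location: Path  # folder of CSV files; the table definition does not own them
    columns: list


def query(table: TableDef) -> list:
    """Read every CSV file in the table's folder, applying the schema at read time."""
    rows = []
    for path in sorted(table.location.glob("*.csv")):
        with path.open(newline="") as f:
            for raw in csv.DictReader(f):
                rows.append({c.name: c.cast(raw[c.name]) for c in table.columns})
    return rows


def join(left: TableDef, right: TableDef, fk: str, pk: str) -> list:
    """Inner join: a foreign key in `left` referencing a primary key in `right`."""
    by_key = {r[pk]: r for r in query(right)}
    return [{**row, **by_key[row[fk]]} for row in query(left) if row[fk] in by_key]


# Usage: two "tables" backed by plain CSV files in a temporary folder.
root = Path(tempfile.mkdtemp())
(root / "customers").mkdir()
(root / "orders").mkdir()
(root / "customers" / "part-0.csv").write_text("CustomerID,Name\n1,Ana\n2,Ben\n")
(root / "orders" / "part-0.csv").write_text("OrderID,CustomerID,Total\n10,1,25.5\n11,2,7\n")

customers = TableDef("Customer", root / "customers",
                     [Column("CustomerID", int), Column("Name", str)])
orders = TableDef("SalesOrder", root / "orders",
                  [Column("OrderID", int), Column("CustomerID", int), Column("Total", float)])

result = join(orders, customers, fk="CustomerID", pk="CustomerID")
print(result)
```

Because each definition only points at a folder, adding a file there changes query results without touching the schema, and discarding a `TableDef` leaves the files in place.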