Azure Synapse Analytics Modern Warehousing Concepts Flashcards
Define Data Warehousing with Microsoft Azure Synapse Analytics
Azure Synapse Analytics, formerly known as Azure SQL Data Warehouse, is an analytics engine designed to process large amounts of data very quickly using SQL pools and a massively parallel processing (MPP) engine. Azure Synapse Analytics blends big data analytics with data warehousing.
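As a sketch of what the MPP engine means in practice, a table in a dedicated SQL pool declares how its rows are spread across the pool's 60 underlying distributions (the table and column names below are hypothetical):

```sql
-- The DISTRIBUTION option tells the MPP engine how to spread rows across
-- the pool's distributions. Table and column names are placeholders.
CREATE TABLE dbo.FactSales
(
    SaleId     BIGINT        NOT NULL,
    CustomerId INT           NOT NULL,
    Amount     DECIMAL(18,2) NOT NULL
)
WITH
(
    DISTRIBUTION = HASH(CustomerId),  -- co-locate rows that join on CustomerId
    CLUSTERED COLUMNSTORE INDEX       -- columnar storage, typical for fact tables
);
```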
How do we consume the insights produced by Azure Synapse Analytics?
Through Analytics Dashboards, Operational Reporting, or Advanced Analytics
What are the 4 main stages in an Azure Synapse Pipeline flow?
- Ingest and Prepare
- Store
- Model and Serve
- Visualize
Can we move SSIS ETL or transformation flows into Azure Data Factory?
Yes, easily. SSIS packages can be lifted and shifted into Azure Data Factory, where they run on an Azure-SSIS Integration Runtime.
Four storage types for SSIS packages
- SSIS catalog (SSISDB). Introduced with SQL Server 2012 and contains a set of stored procedures, views, and table-valued functions used for working with SSIS projects/packages.
- File System.
- SQL Server system database (MSDB).
- SSIS Package Store. A package management layer on top of two subtypes:
  - MSDB, a system database in SQL Server used to store SSIS packages.
  - Managed file system, a specific folder in the SQL Server installation path used to store SSIS packages.
Since version 5.0, the Data Migration Assistant (DMA) supports batch assessment of packages stored in the File System, Package Store, and SSIS catalog.
Download DMA and use it to assess your packages.
Migration
Depending on the storage types of source SSIS packages, the steps to migrate SSIS packages and SQL Server Agent jobs that schedule SSIS package executions may vary.
SSIS DevOps Tools are also a practical way to batch-redeploy packages to the migration destination.
For each package storage type, how to migrate the SSIS packages and the SQL Server Agent jobs that schedule them:

- SSISDB
  - Migrate packages: redeploy them via SSDT/SSMS to SSISDB hosted in Azure SQL Managed Instance. For more info, see Deploying SSIS packages in Azure.
  - Migrate jobs:
    - Migrate from SQL Server Agent on premises to the SQL Managed Instance agent via scripts/manual copy. For more info, see Run SSIS packages via Azure SQL Managed Instance Agent.
    - Convert them into ADF pipelines/activities/triggers via scripts/SSMS/ADF portal. For more info, see SSMS scheduling feature.
- File System
  - Migrate packages: redeploy them to file shares/Azure Files via dtinstall/dtutil/manual copy, or keep them in file systems and access them via VNet/Self-Hosted IR. For more info, see dtutil utility.
  - Migrate jobs:
    - Migrate from SQL Server Agent on premises to the SQL Managed Instance agent via scripts/manual copy. For more info, see Run SSIS packages via Azure SQL Managed Instance Agent.
    - Migrate with the SSIS Job Migration Wizard in SSMS.
    - Convert them into ADF pipelines/activities/triggers via scripts/SSMS/ADF portal. For more info, see SSMS scheduling feature.
- SQL Server (MSDB)
  - Migrate packages: export them to file systems/file shares/Azure Files via SSMS/dtutil. For more info, see Exporting SSIS packages.
  - Migrate jobs: convert them into ADF pipelines/activities/triggers via scripts/SSMS/ADF portal. For more info, see SSMS scheduling feature.
- Package Store
  - Migrate packages: export them to the package store via SSMS/dtutil, or redeploy them to the package store via dtinstall/dtutil/manual copy. For more info, see Manage packages with Azure-SSIS Integration Runtime package store.
  - Migrate jobs:
    - Migrate from SQL Server Agent on premises to the SQL Managed Instance agent via scripts/manual copy. For more info, see Run SSIS packages via Azure SQL Managed Instance Agent.
    - Convert them into ADF pipelines/activities/triggers via scripts/SSMS/ADF portal. For more info, see SSMS scheduling feature.
What are the 2 main tools for Azure Ingest and Prepare stage?
Azure Data Factory
Azure Databricks
Can we use Azure Synapse Exclusively (without any other tool)?
Yes. It’s a good choice for greenfield implementations; corporations with existing Azure services can take a hybrid approach.
Name 3 languages which we can use in Azure Synapse Analytics
- Python
- SQL
- Scala
If we press the “integrate” button in Synapse Analytics what will we see?
An environment exactly like the one Azure Data Factory has, since ADF is included in Synapse Analytics.
The pipeline creation experience is the same as in ADF, which gives you another powerful integration built into Azure Synapse Analytics, removing the need to use Azure Data Factory for data movement and transformation pipelines.
What are the 3 Copy methods in Synapse Analytics Integration tab for Sink?
- PolyBase
- Copy command
- Bulk insert
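As an example, the COPY command loads files from external storage straight into a dedicated SQL pool table; a minimal sketch with hypothetical account, container, and table names:

```sql
-- Load Parquet files from Azure Storage into a staging table using the
-- workspace's managed identity. All names and paths are placeholders.
COPY INTO dbo.StageSales
FROM 'https://myaccount.blob.core.windows.net/landing/sales/*.parquet'
WITH (
    FILE_TYPE  = 'PARQUET',
    CREDENTIAL = (IDENTITY = 'Managed Identity')
);
```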
To perform a scale operation for a SQL Pool…
To perform a scale operation, SQL pool first kills all incoming queries and then rolls back transactions to ensure a consistent state.
Scaling only occurs once the transaction rollback is complete.
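Besides the portal, a dedicated SQL pool can be scaled with T-SQL run against the master database; the pool name and target service level below are placeholders:

```sql
-- Resize a dedicated SQL pool by changing its service objective.
-- 'mySampleDataWarehouse' and 'DW300c' are hypothetical values.
ALTER DATABASE mySampleDataWarehouse
MODIFY (SERVICE_OBJECTIVE = 'DW300c');
```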
What are the 2 types of Analytics Pools available in Synapse?
- SQL pools
  - Dedicated
  - Serverless
- Apache Spark pools
How does Synapse automatically scale Apache Spark pools out or in?
You can configure an Apache Spark pool to have a fixed size by disabling the autoscale setting. With autoscale enabled, you set the minimum and maximum number of nodes to control the amount of scaling applied. Synapse Analytics then monitors the resource requirements of the load and scales the number of nodes up or down, continuously tracking pending CPU, pending memory, free CPU, free memory, and used memory per node. It checks these metrics every 30 seconds and makes scaling decisions based on the values.
It can take 1-5 minutes for a scaling operation to complete.
What options can we find in Develop tab?
- SQL Scripts
- Notebooks
- Data flows
- Power BI
We can use JSON functions, such as xxx and xxx, to parse the JSON data and extract it into specific structured columns inside a table in a SQL pool.
JSON_VALUE and ISJSON
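A minimal sketch of that pattern, assuming a hypothetical staging table dbo.RawEvents with one JSON document per row in an NVARCHAR(MAX) column named doc:

```sql
-- ISJSON filters out malformed documents; JSON_VALUE pulls scalar values
-- out into structured columns. Table, column, and paths are hypothetical.
SELECT
    JSON_VALUE(doc, '$.customer.id')   AS CustomerId,
    JSON_VALUE(doc, '$.customer.name') AS CustomerName,
    JSON_VALUE(doc, '$.order.total')   AS OrderTotal
FROM dbo.RawEvents
WHERE ISJSON(doc) > 0;
```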
What is a data flow in Develop Tab of Synapse?
Data flows are powerful data transformation workflows that use the power of Apache Spark but are authored using a code-free GUI. The work you do in the UI is automatically transformed into code executed by a managed Spark cluster, without you having to write any code or manage the cluster.
When ingesting raw data in batch from new data sources, which of the following data formats are natively supported by Synapse Analytics?
Options:
- Scala
- JSON
- ORC
- CSV
- Parquet

Answer:
- JSON
- ORC
- CSV
- Parquet
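For instance, a serverless SQL pool can query these formats in place with OPENROWSET; a minimal sketch with a hypothetical storage path:

```sql
-- Read raw Parquet files directly from the data lake with a serverless
-- SQL pool. The account/container path is a placeholder.
SELECT TOP 10 *
FROM OPENROWSET(
    BULK 'https://myaccount.dfs.core.windows.net/raw/sales/*.parquet',
    FORMAT = 'PARQUET'
) AS sales;
```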
Which of the following would be a valid reason for adding a staging area into the architecture of a modern data warehouse?
Options:
- To reduce contention on source systems
- To make data analytics available directly from the staging area
- To join data from different source systems
- To enable the ingestion of source systems based on different schedules

Answer:
- To reduce contention on source systems
- To join data from different source systems
- To enable the ingestion of source systems based on different schedules
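As a sketch of the "join data from different source systems" case: extracts from each system land in staging tables first, then a single CTAS query conforms them inside the warehouse (all table and column names below are hypothetical):

```sql
-- Two staged extracts from different source systems are joined once in
-- the warehouse instead of querying the operational sources directly.
CREATE TABLE dbo.DimCustomer
WITH (DISTRIBUTION = REPLICATE, CLUSTERED COLUMNSTORE INDEX)
AS
SELECT
    crm.CustomerId,
    crm.CustomerName,
    erp.CreditLimit
FROM stage.CrmCustomers AS crm
JOIN stage.ErpAccounts  AS erp
    ON erp.CustomerId = crm.CustomerId;
```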