Instructor's Method - 6/14/2021 Flashcards
Relational Storage
Options are
- SQL (Azure SQL, Managed Instance)
- MPP (Dedicated SQL Pool)
Platform as a Service
I need a Database. I don’t have to manage underlying infrastructure.
Azure SQL
Platform-as-a-Service
For SQL Server, MySQL, PostgreSQL
There are differences between Azure SQL and on-prem SQL Server
For a new project, Azure SQL is the recommended choice: cheaper and much more scalable
Database on VM
IaaS
SQL Server, Oracle
Azure SQL Managed Instance
Platform-as-a-service
Dedicated infrastructure in Azure datacenter
~100% compatible with on-prem SQL Server
Use it to lift-and-shift apps that rely on SQL Server into the cloud
Symmetric Multi Processing Architecture
DB2, Oracle, SQL Server
MPP architecture
Massively Parallel Processing Architecture
Dedicated SQL Pool (formerly known as Azure SQL Data Warehouse)
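To make the MPP idea concrete, here is a toy sketch, not how Dedicated SQL Pool is actually implemented: rows are hash-distributed across several "nodes", each node aggregates only its own slice, and a final step merges the partial results. Worker threads stand in for compute nodes, and `NUM_NODES`, `distribute`, `local_sum`, and `mpp_sum` are hypothetical names chosen for illustration.

```python
import zlib
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

NUM_NODES = 4  # hypothetical number of compute nodes


def distribute(rows, key, num_nodes=NUM_NODES):
    """Hash-distribute rows so each node holds only a slice of the table."""
    shards = [[] for _ in range(num_nodes)]
    for row in rows:
        # crc32 gives a stable hash, so the same key always lands on the same node
        node = zlib.crc32(str(row[key]).encode("utf-8")) % num_nodes
        shards[node].append(row)
    return shards


def local_sum(shard):
    """Work done by one node: aggregate only its own rows."""
    totals = defaultdict(int)
    for row in shard:
        totals[row["region"]] += row["amount"]
    return dict(totals)


def mpp_sum(rows):
    """Run the per-node aggregations in parallel, then merge the partials."""
    shards = distribute(rows, key="region")
    with ThreadPoolExecutor(max_workers=NUM_NODES) as pool:
        partials = list(pool.map(local_sum, shards))
    combined = defaultdict(int)
    for partial in partials:
        for region, total in partial.items():
            combined[region] += total
    return dict(combined)


sales = [
    {"region": "east", "amount": 10},
    {"region": "west", "amount": 5},
    {"region": "east", "amount": 7},
    {"region": "north", "amount": 3},
]
result = mpp_sum(sales)
print(result)
```

In an SMP system the whole table would be scanned by one server with shared memory and disk; the sketch shows why MPP scales out instead, since adding nodes shrinks the slice each one has to scan.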
Non-relational Storage
Azure Cosmos DB
Azure Storage
Azure Data Lake Store
Azure Storage
Object storage
Cheaper service
Not WebHDFS compatible (not aimed at Hadoop workloads)
Many features
Azure Data Lake Gen1
Object storage, WebHDFS compatible (works with Hadoop workloads)
Far fewer features, but faster
Azure Data Lake Gen2
Combination of Data Lake Gen1 + Azure Storage
Faster, cheaper, WebHDFS compatible
Batch Processing
Azure Databricks
Azure HDInsight
Azure Data Lake Analytics
Azure Synapse Analytics
HDInsight
Originally the Hortonworks distribution of Hadoop (HDP)
Microsoft took HDP and put it on Azure (you get Spark, Hadoop, Storm…)
HDP on Azure is called HDInsight
4 options to use Apache Spark
- Download Open source Spark
- HDInsight
- Azure Databricks
- Azure Synapse Analytics (also includes Spark)
Stream Processing
Azure Stream Analytics
Azure Databricks
Azure HDInsight
Orchestration
Azure Data Factory
Modern Data Warehouse
Bring together all your data at scale, and get insights through analytical dashboards, operational reports, or advanced analytics for all users
4 layers of Modern Data Warehouse
- Ingestion (Extract)
- Storage (Load)
- Data Preparation (Transform)
- Model & Serve (Serve)
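The four layers can be sketched as four small functions chained together. This is only an in-memory illustration under stated assumptions: the CSV text, the `lake` dictionary, and the `raw/orders` and `curated/orders` keys are hypothetical stand-ins for a real source system and a real data lake, and no Azure service is called.

```python
import csv
import io

# Hypothetical source data standing in for an operational system
RAW_CSV = "order_id,region,amount\n1,east,10\n2,west,5\n3,east,7\n"


def ingest(source_text):
    """Ingestion (Extract): pull raw records out of the source."""
    return list(csv.DictReader(io.StringIO(source_text)))


def store(records, lake):
    """Storage (Load): land the raw records unchanged in the 'lake'."""
    lake["raw/orders"] = records
    return lake


def prepare(lake):
    """Data Preparation (Transform): keep needed columns and fix types."""
    lake["curated/orders"] = [
        {"region": r["region"], "amount": int(r["amount"])}
        for r in lake["raw/orders"]
    ]
    return lake


def model_and_serve(lake):
    """Model & Serve: aggregate into a shape a dashboard can query."""
    totals = {}
    for row in lake["curated/orders"]:
        totals[row["region"]] = totals.get(row["region"], 0) + row["amount"]
    return totals


lake = {}
store(ingest(RAW_CSV), lake)
prepare(lake)
report = model_and_serve(lake)
print(report)
```

Note the order: data is loaded into storage before it is transformed, which is why the layers read Extract, Load, Transform, Serve rather than the classic Extract, Transform, Load.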
What products can be used to Ingest data
Azure Data Factory
What products can be used to Store data
Azure Storage
Data Lake Gen2
What products can be used to Prepare Data
Azure HDInsight
Azure Databricks
Data Lake Analytics
What products can be used to Model and Serve Data
Dedicated SQL Pool
Azure Analysis Services
What products can be used to Visualize
Power BI
Azure Synapse Analytics
Does all of the following
- Ingestion (Extract)
- Storage (Load)
- Data Preparation (Transform)
- Model & Serve (Serve)
Azure Synapse Analytics features
Set of multiple integrated Azure Data services
Brings multiple data sources into one place
Brings all your code into one place
Communication between different compute options
Centralized management, privacy, data, security
Synapse Workspace / Studio
Storage: Data Lake Gen2
Compute: Dedicated SQL Pools, Apache Spark Pools, Serverless SQL
Ingestion: Synapse Pipelines (Azure Data Factory integrated into Synapse), Mapping Data Flows (ETL, SSIS)
Platform: Monitoring, Management, Security
Connected Services: Azure Cosmos DB, Power BI, Azure ML
Dedicated SQL Pool
SQL-based, fully-managed, petabyte-scale cloud data warehouse