Instructor's Method - 6/14/2021 Flashcards
Relational Storage
Options are
- SQL (Azure SQL, Managed Instance)
- MPP (Dedicated SQL Pool)
Platform as a Service
You need a database but don’t want to manage the underlying infrastructure.
Azure SQL
Platform-as-a-Service
PaaS options for SQL Server (Azure SQL Database), MySQL and PostgreSQL (Azure Database for MySQL / PostgreSQL)
There are differences between Azure SQL and on-prem SQL Server
For a new project, the instructor recommends Azure SQL: cheaper and much more scalable
Database on a VM
IaaS
SQL Server, Oracle
Azure SQL Managed Instance
Platform-as-a-Service
Dedicated infrastructure in an Azure datacenter
Near 100% compatible with on-prem SQL Server
Use it to lift-and-shift existing apps that use SQL Server to the cloud
SMP (Symmetric Multiprocessing) architecture
DB2, Oracle, SQL Server
MPP architecture
Massively Parallel Processing architecture
Dedicated SQL Pool (formerly known as Azure SQL Data Warehouse)
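To make the SMP vs. MPP difference concrete, here is a toy Python simulation (not how any Azure engine is implemented). In SMP one server scans every row; in MPP rows are hash-distributed across distributions, each compute node aggregates its own share, and a control node merges the partial results. A real dedicated SQL pool uses 60 distributions; the value 4 below, the function names and the sample columns are all illustrative assumptions, and Python threads only stand in for separate nodes.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
import zlib

NUM_DISTRIBUTIONS = 4  # toy value (assumption); a dedicated SQL pool uses 60


def distribute(rows, key, n=NUM_DISTRIBUTIONS):
    """Hash-distribute rows into n buckets by the value of `key`."""
    buckets = [[] for _ in range(n)]
    for row in rows:
        # crc32 is stable across runs (built-in hash() of str is randomized)
        h = zlib.crc32(str(row[key]).encode())
        buckets[h % n].append(row)
    return buckets


def partial_sum(bucket, group_key, value_key):
    """Local GROUP BY ... SUM over one bucket of rows."""
    totals = defaultdict(float)
    for row in bucket:
        totals[row[group_key]] += row[value_key]
    return totals


def mpp_sum(rows, dist_key, group_key, value_key):
    """MPP-style: aggregate each distribution separately, then merge."""
    buckets = distribute(rows, dist_key)
    # Threads stand in for compute nodes working concurrently.
    with ThreadPoolExecutor(max_workers=len(buckets)) as pool:
        partials = list(
            pool.map(lambda b: partial_sum(b, group_key, value_key), buckets)
        )
    # "Control node": merge the partial aggregates into the final answer.
    result = defaultdict(float)
    for p in partials:
        for k, v in p.items():
            result[k] += v
    return dict(result)


def smp_sum(rows, group_key, value_key):
    """SMP baseline: a single server scans all rows."""
    return dict(partial_sum(rows, group_key, value_key))


rows = [
    {"order_id": i, "region": "east" if i % 2 == 0 else "west", "amount": float(i)}
    for i in range(10)
]
print(mpp_sum(rows, "order_id", "region", "amount"))
print(smp_sum(rows, "region", "amount"))
```

Both calls give the same totals; the MPP version just splits the scan so that more nodes can share the work as data grows. Rows are distributed by `order_id` but grouped by `region`, which is why the merge step is needed.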
Non-relational Storage
Azure Cosmos DB
Azure Storage
Azure Data Lake Store
Azure Storage
Object storage
Cheaper service
Not natively HDFS-compatible (flat namespace), so less suited to Hadoop workloads
Lots of features
Azure Data Lake Gen1
Object storage with a WebHDFS-compatible API (works with Hadoop workloads)
Fewer features, but faster
Azure Data Lake Gen2
Combines Data Lake Gen1 capabilities with Azure (Blob) Storage
Faster, cheaper, Hadoop-compatible (HDFS-style access via the ABFS driver)
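Hadoop-compatible engines such as Spark on HDInsight, Databricks or Synapse address Data Lake Gen2 through the ABFS driver, using URIs of the form abfss://<container>@<account>.dfs.core.windows.net/<path>. The sketch below only builds such a URI string; the helper name `abfss_uri` and the example account `mydatalake` and container `raw` are hypothetical, and the only name check performed is the storage-account rule (3 to 24 lowercase letters or digits).

```python
import re

# Storage account names: 3-24 characters, lowercase letters and digits only.
_ACCOUNT_RE = re.compile(r"^[a-z0-9]{3,24}$")


def abfss_uri(account: str, container: str, path: str = "") -> str:
    """Build a secure ABFS URI for a path in a Data Lake Storage Gen2 account."""
    if not _ACCOUNT_RE.match(account):
        raise ValueError(f"invalid storage account name: {account!r}")
    # Drop a leading slash so the path joins cleanly after the host name.
    path = path.lstrip("/")
    return f"abfss://{container}@{account}.dfs.core.windows.net/{path}"


print(abfss_uri("mydatalake", "raw", "/sales/2021/06/14/orders.csv"))
```

A Spark job would then read from a string like this one, which is what "HDFS-style access" means in practice: directories and files addressed by path rather than by flat blob names.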
Batch Processing
Azure Databricks
Azure HDInsight
Azure Data Lake Analytics
Azure Synapse Analytics
HDInsight
Originally based on the Hortonworks Data Platform (HDP), Hortonworks' distribution of Hadoop
Microsoft took HDP and put it on Azure (you get Spark, Hadoop, Storm…)
HDP on Azure is called HDInsight
4 options to use Apache Spark:
- Download open-source Spark (self-managed)
- HDInsight
- Azure Databricks
- Azure Synapse Analytics (also includes Spark)
Stream Processing
Azure Stream Analytics
Azure Databricks
Azure HDInsight
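The difference between batch and stream processing can be sketched with a per-minute event count. Azure Stream Analytics expresses this kind of aggregation with tumbling windows (fixed-size, non-overlapping time windows). The toy Python below is not any service's API: the function names, the 60-second window and the (timestamp_seconds, payload) event shape are assumptions, and the streaming version assumes events arrive in timestamp order, whereas real stream engines also handle late and out-of-order events.

```python
from collections import Counter

WINDOW_SECONDS = 60  # tumbling window size (illustrative assumption)


def batch_counts(events):
    """Batch: the whole dataset is available, so count per window in one pass."""
    return dict(Counter(ts // WINDOW_SECONDS for ts, _ in events))


def stream_counts(event_stream):
    """Stream: events arrive one at a time (in timestamp order here).
    A (window, count) result is emitted as soon as each window closes."""
    current_window, count = None, 0
    for ts, _ in event_stream:
        window = ts // WINDOW_SECONDS
        if current_window is not None and window != current_window:
            yield current_window, count  # previous window is complete
            count = 0
        current_window = window
        count += 1
    if current_window is not None:
        yield current_window, count  # flush the last window at end of stream


events = [(5, "a"), (30, "b"), (61, "c"), (125, "d"), (130, "e")]
print(batch_counts(events))
print(list(stream_counts(iter(events))))
```

Both produce the same counts; the difference is when results become available. The batch job answers only after all data has landed, while the streaming version reports each minute's count as soon as that minute is over.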