Azure Data Lists Flashcards
Data architectures
- Lambda architecture
- Kappa architecture
Lambda architecture layers
- Batch layer
- Speed layer
- Serving layer
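This minimal Python sketch shows how the three layers fit together, using page-view counting as an assumed example workload. The function names (batch_view, speed_view, serving_query) are hypothetical. In practice the batch layer might run on Spark and the speed layer on a stream processor.

```python
from collections import Counter

def batch_view(master_dataset):
    """Batch layer: recompute a complete view from the immutable master dataset."""
    return Counter(event["page"] for event in master_dataset)

def speed_view(recent_events):
    """Speed layer: incremental view over events the last batch run has not covered yet."""
    return Counter(event["page"] for event in recent_events)

def serving_query(batch, speed, page):
    """Serving layer: answer a query by merging the batch view and the real-time view."""
    return batch.get(page, 0) + speed.get(page, 0)

master = [{"page": "home"}, {"page": "home"}, {"page": "cart"}]
recent = [{"page": "home"}, {"page": "checkout"}]
b = batch_view(master)
s = speed_view(recent)
print(serving_query(b, s, "home"))  # 3
```

A Kappa architecture drops the separate batch layer and derives every view from the stream alone.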
Data warehouse workload types
- Relational
- Non-relational
- Batch
- Streaming
Main phases of a data stream flow
- Production
- Acquisition
- Aggregation and transformation
- Storage
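The four phases can be sketched as chained Python generators. The device readings and function names are hypothetical, and acquire() only stands in for an ingestion service such as Event Hubs. For brevity, aggregate() averages over the whole stream. A real streaming job would aggregate per time window instead.

```python
from collections import defaultdict

def produce():
    """Production: devices emit (device_id, temperature) readings."""
    yield from [("dev-1", 20.0), ("dev-2", 25.0), ("dev-1", "bad"), ("dev-1", 22.0)]

def acquire(source):
    """Acquisition: ingest readings and drop malformed ones."""
    for device_id, value in source:
        if isinstance(value, (int, float)):
            yield device_id, float(value)

def aggregate(readings):
    """Aggregation and transformation: average temperature per device."""
    totals = defaultdict(lambda: [0.0, 0])  # device_id -> [sum, count]
    for device_id, value in readings:
        totals[device_id][0] += value
        totals[device_id][1] += 1
    return {d: total / n for d, (total, n) in totals.items()}

def store(results, sink):
    """Storage: persist results (a dict stands in for a data store)."""
    sink.update(results)
    return sink

warehouse = store(aggregate(acquire(produce())), {})
print(warehouse)  # {'dev-1': 21.0, 'dev-2': 25.0}
```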
Time window aggregation types
- Tumbling window
- Hopping window
- Sliding window
- Session window
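The following simplified Python sketch contrasts the four window types. It counts events per window over integer event timestamps in seconds. The function names are hypothetical. The sliding window here is evaluated only when an event arrives, which simplifies how Azure Stream Analytics emits sliding-window results. The session window omits the maximum-duration limit.

```python
def tumbling(timestamps, size):
    """Fixed-size, non-overlapping windows: each event falls in exactly one window."""
    counts = {}
    for t in timestamps:
        start = (t // size) * size
        counts[start] = counts.get(start, 0) + 1
    return dict(sorted(counts.items()))

def hopping(timestamps, size, hop):
    """Fixed-size windows starting every `hop` seconds, so an event can be in several."""
    counts = {}
    for t in timestamps:
        first_k = -(-(t - size + 1) // hop)  # ceil((t - size + 1) / hop)
        for k in range(first_k, t // hop + 1):
            start = k * hop  # window [start, start + size) contains t
            counts[start] = counts.get(start, 0) + 1
    return dict(sorted(counts.items()))

def sliding(timestamps, size):
    """At each event time t, count the events in the window (t - size, t]."""
    ts = sorted(timestamps)
    return {t: sum(1 for u in ts if t - size < u <= t) for t in ts}

def session(timestamps, timeout):
    """Group events whose gap to the previous event is at most `timeout` seconds."""
    sessions = []  # list of (first_event, last_event, count)
    for t in sorted(timestamps):
        if sessions and t - sessions[-1][1] <= timeout:
            start, _, n = sessions[-1]
            sessions[-1] = (start, t, n + 1)
        else:
            sessions.append((t, t, 1))
    return sessions

events = [12, 14, 21, 33]
print(hopping(events, 10, 5))  # {5: 2, 10: 2, 15: 1, 20: 1, 25: 1, 30: 1}
```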
Data stream concepts
- Watermarks
- Consumer groups
- Time window aggregations
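This minimal sketch shows watermark-based handling of late events. It assumes the watermark trails the highest event time seen so far by a fixed allowed lateness, and that events older than the watermark are set aside as late. Azure Stream Analytics configures this through its late-arrival policy (adjust or drop), which this sketch does not reproduce exactly. The function name is hypothetical.

```python
def apply_watermark(events, max_lateness):
    """Split arriving (event_time, value) pairs into accepted and late events."""
    watermark = float("-inf")
    accepted, late = [], []
    for event_time, value in events:  # events in arrival order, not event-time order
        if event_time < watermark:
            late.append((event_time, value))
        else:
            accepted.append((event_time, value))
        # The watermark only moves forward as newer event times arrive.
        watermark = max(watermark, event_time - max_lateness)
    return accepted, late

arrivals = [(10, "a"), (12, "b"), (9, "c"), (20, "d"), (13, "e")]
accepted, late = apply_watermark(arrivals, max_lateness=3)
print(late)  # [(13, 'e')]
```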
Batch processing scenarios
- Data set transformation and preparation
- ETL and ELT workloads
- Machine learning model training
- Applying machine learning models on data sets for scoring
- Report generation
Azure batch processing services
- Azure Synapse Analytics
- Azure Data Lake Analytics
- Azure HDInsight
- Azure Databricks
Batch processing tools
- Azure Synapse Analytics
- Azure Data Lake Analytics
- Azure HDInsight
- Azure Databricks
- Apache Hive
- Apache Pig
- Apache Spark
Analytical data stores
- Azure Synapse Analytics
- Spark SQL
- HBase
- Apache Hive
Five V’s of big data
- Volume
- Velocity
- Variety
- Veracity
- Value
Analytics techniques
- Descriptive analysis
- Diagnostic analysis
- Predictive analysis
- Prescriptive analysis
TDSP phases
- Business needs
- Data discovery and acquisition
- Model development
- Model deployment
Common TDSP roles
- Subject matter expert
- Data engineer
- Data scientist
- Application developer
MLOps best practices
- Exploratory data analysis (EDA)
- Data preparation and feature engineering
- Model training and tuning
- Model review and governance
- Model inference and serving
- Model deployment and monitoring
- Automated model retraining
Azure Data Factory integration runtime types
- Azure
- Self-hosted
- Azure-SSIS (runs SQL Server Integration Services packages)
Azure Data Factory transformation types
- External services
- Mapping data flows (designed visually, executed on Apache Spark clusters managed by Data Factory)
- Wrangling data flows (use the Power Query editor, the same data-preparation experience as in Microsoft Power BI)
Azure Data Factory external services for transformations
- Azure SQL Database
- Azure Synapse Analytics
- Azure Databricks
- Azure HDInsight
- Azure Functions
- SQL Server Integration Services (SSIS)
Azure Synapse Analytics features
- Dedicated (provisioned) or serverless (on-demand) SQL pools
- Provisioned or on-demand Spark pools
- Stream processing capabilities through window aggregations
- ML model scoring through the T-SQL PREDICT statement
- Azure DevOps integration
- Data Factory-like pipelines development experience
- Power BI report editor integration
Macro-layers for analytics
- Analytical access
- Reporting access
- Dashboarding access
Azure SQL Database purchasing models
- vCore-based
- DTU-based
Services needed to run SQL Server on an Azure VM
- Azure Storage to hold the virtual disk(s)
- Azure Virtual Network
- Azure Virtual Machines (compute) to run the VM
Extra PostgreSQL data types
- Document
- Geometry
- JSON
- Composite
- Custom
Azure Database for MariaDB and Azure Database for MySQL pricing tiers
- Basic
- General Purpose
- Memory Optimized
Azure Database Migration Service pricing tiers
- Premium for online (continuous) and offline migrations
- Standard (free) for offline migrations only
Azure SQL Managed Instance service tiers (migration targets)
- General Purpose
- Business Critical
Data security layers for Azure SQL Database (outside-in)
- Network security
- Access management
- Threat protection
- Information protection
- Database
Information protection layer methods
- Transport Layer Security (TLS) for encryption in transit
- Transparent data encryption (TDE) for encryption at rest
- Always Encrypted (column-level)
- Dynamic data masking
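Dynamic data masking is configured in T-SQL and applied by the database engine at query time. For a hypothetical dbo.Customers table, that looks like ALTER TABLE dbo.Customers ALTER COLUMN Email ADD MASKED WITH (FUNCTION = 'email()'). The Python sketch below only mimics the output format of the email() and partial() masks for illustration. The function names are hypothetical, and no masking happens on the client in practice.

```python
def email_mask(value: str) -> str:
    """Mimic email(): expose the first letter, then a constant pattern ending in .com."""
    return value[:1] + "XX@XXXX.com"

def partial_mask(value: str, prefix: int, padding: str, suffix: int) -> str:
    """Mimic partial(prefix, padding, suffix): expose both ends and pad the middle."""
    if prefix + suffix >= len(value):
        # Assumption: when the value is too short, this sketch exposes nothing.
        return padding
    return value[:prefix] + padding + value[len(value) - suffix:]

print(email_mask("alice@contoso.com"))                  # aXX@XXXX.com
print(partial_mask("555-123-4567", 0, "XXX-XXX-", 4))  # XXX-XXX-4567
```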
Threat protection layer methods
- Auditing (audit logs written to Azure Storage, Azure Monitor logs, or Event Hubs)
- Advanced Threat Protection (now part of Microsoft Defender for SQL)
Access management layer methods
- Authentication
- Authorization
Network security layer methods
- Firewall
- Virtual networks
SQL Server authentication methods
- SQL Authentication
- Active Directory - Universal with MFA
- Active Directory - Password
- Active Directory - Integrated
Azure SQL Database query tools
- Query Editor
- sqlcmd utility
- Azure Data Studio
- SQL Server Management Studio
- Visual Studio Code
NoSQL storage types
- Key-value store
- Document store
- Columnar data store
- Graph store
Document types in document databases
- XML
- YAML
- JSON
- BSON
Azure non-relational storage services
- Azure Cosmos DB
- Azure Table Storage
- Azure Blob Storage
- Azure Files
Non-relational storage types
- Key-value store
- Document store
- Columnar data store
- Graph store
- Time series store
- Object data store
- External index data store
Cosmos DB structure from the top down
- Cosmos DB account
- Databases
- Containers
- Physical partitions (each hosts one or more logical partitions)
- Logical partitions (one per partition key value)
- Items
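This sketch shows how partition key values group items into logical partitions, which are in turn placed on physical partitions. It uses MD5 modulo the partition count only as a stand-in for Cosmos DB's internal hash-range scheme, which this sketch does not reproduce. The function names and item fields are hypothetical.

```python
import hashlib
from collections import defaultdict

def physical_partition_for(partition_key: str, physical_count: int) -> int:
    """Illustration only: map one partition key value to a physical partition."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % physical_count

def layout(items, key_field, physical_count):
    """Group item ids as physical partition -> logical partition (key value) -> ids."""
    tree = defaultdict(lambda: defaultdict(list))
    for item in items:
        key = item[key_field]
        tree[physical_partition_for(key, physical_count)][key].append(item["id"])
    return tree

items = [
    {"id": "1", "customerId": "c1"},
    {"id": "2", "customerId": "c2"},
    {"id": "3", "customerId": "c1"},
]
tree = layout(items, "customerId", physical_count=4)
```

All items that share a partition key value land in the same logical partition, so they always live on the same physical partition.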
Cosmos DB consistency levels
- Strong
- Bounded staleness
- Session
- Consistent prefix
- Eventual
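The five levels form an ordered scale from strongest to weakest. The Python sketch below encodes that order. It also encodes two rules: a client can relax the account's default consistency for a request but not strengthen it, and a bounded-staleness read may lag writes by at most K versions and T seconds. The enum and function names are hypothetical.

```python
from enum import IntEnum

class Consistency(IntEnum):
    """Cosmos DB consistency levels, ordered from weakest (1) to strongest (5)."""
    EVENTUAL = 1
    CONSISTENT_PREFIX = 2
    SESSION = 3
    BOUNDED_STALENESS = 4
    STRONG = 5

def can_override(account_default: Consistency, requested: Consistency) -> bool:
    """A request may use the account default or any weaker level."""
    return requested <= account_default

def within_bounded_staleness(lag_versions, lag_seconds, max_versions, max_seconds):
    """A read is acceptable if it lags by no more than K versions and T seconds."""
    return lag_versions <= max_versions and lag_seconds <= max_seconds

print(can_override(Consistency.SESSION, Consistency.EVENTUAL))  # True
print(can_override(Consistency.SESSION, Consistency.STRONG))    # False
```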
Azure Storage performance levels
- Standard
- Premium (SSD)
Azure Table Storage requirements
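- Each entity must have PartitionKey, RowKey, and Timestamp properties (Timestamp is set by the service)
- No more than 255 properties per entity, including the three system properties
- Entity size no more than 1 MiB, or 2 MB when using the Table API in Cosmos DB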
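The following sketch is a hypothetical client-side check against these limits. Its size estimate is an assumption: the length of the UTF-8 JSON form, which only approximates how the service actually accounts for entity size. The constant and function names are illustrative.

```python
import json

MAX_PROPERTIES = 255             # includes PartitionKey, RowKey and Timestamp
MAX_ENTITY_BYTES = 1024 * 1024   # 1 MiB for Azure Table Storage

def validate_entity(entity: dict) -> list:
    """Return a list of problems; an empty list means the entity looks valid."""
    problems = []
    for key in ("PartitionKey", "RowKey"):
        if key not in entity:
            problems.append(f"missing {key}")
    # Timestamp is maintained by the service, so count it even if the client omits it.
    count = len(entity) + (0 if "Timestamp" in entity else 1)
    if count > MAX_PROPERTIES:
        problems.append(f"{count} properties exceeds {MAX_PROPERTIES}")
    # Rough size estimate (assumption): not the service's exact size accounting.
    size = len(json.dumps(entity, default=str).encode("utf-8"))
    if size > MAX_ENTITY_BYTES:
        problems.append(f"about {size} bytes exceeds {MAX_ENTITY_BYTES}")
    return problems

print(validate_entity({"PartitionKey": "orders", "RowKey": "001", "Total": 42}))  # []
```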
Types of Azure Blob Storage content
- Page blob
- Block blob
- Append blob
Methods for accessing Azure Blob Storage
- Azure Storage Explorer
- Blob service REST API (also called the Azure Blob API)
- Azure PowerShell
- Azure Command-Line Interface (CLI)
- Azure Storage client libraries, such as Azure.Storage.Blobs for .NET
Azure Files authentication methods
- Active Directory Domain Services (AD DS) for on-premises Active Directory
- Azure Active Directory Domain Services (Azure AD DS)
- Storage account access key (one of the two API keys generated for the account)
RBAC basic levels for Azure Files shares (Storage File Data SMB Share roles)
- Reader
- Contributor
- Elevated Contributor
Azure non-relational storage security components
- Firewall rules
- Secure transfer using Transport Layer Security (TLS)
- Storage data encryption
Azure non-relational storage data policies
- Time-based retention
- Allow protected append writes (an option of time-based retention)
- Legal hold
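This simplified sketch shows how these policies interact when deciding whether a blob can be deleted or appended to. It assumes the retention period is counted from the blob's creation time. It also assumes, as the list above does, that the append option relaxes only time-based retention and not a legal hold. The dataclass and function names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

@dataclass
class ImmutabilityState:
    """Hypothetical model of the policies attached to one blob."""
    created: datetime
    retention_days: int = 0       # time-based retention period (0 = none)
    allow_append: bool = False    # allow protected append writes
    legal_hold: bool = False

def is_locked(state: ImmutabilityState, now: datetime) -> bool:
    """Locked while a legal hold is set or the retention period has not expired."""
    in_retention = now < state.created + timedelta(days=state.retention_days)
    return state.legal_hold or in_retention

def can_delete(state: ImmutabilityState, now: datetime) -> bool:
    return not is_locked(state, now)

def can_append(state: ImmutabilityState, now: datetime) -> bool:
    if not is_locked(state, now):
        return True
    # Simplification: appends are allowed only under time-based retention, not legal hold.
    return state.allow_append and not state.legal_hold

created = datetime(2024, 1, 1, tzinfo=timezone.utc)
logs = ImmutabilityState(created, retention_days=30, allow_append=True)
print(can_delete(logs, created + timedelta(days=10)))  # False
```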
Azure non-relational storage authentication methods
- Shared key
- Shared access signature (SAS)
- Azure Active Directory (Azure AD)
- Azure Active Directory Domain Services (Azure AD DS) for file shares
Shared access signature (SAS) configuration options
- Allowed services
- Allowed resource types
- Allowed permissions
- Option to allow deleting versions of objects
- Start and expiry date/time
- Allowed IP addresses
- Allowed protocols
- Preferred routing tier
Tools for diagnosing connection problems
- Telerik Fiddler
- Microsoft Network Monitor (NetMon)
- Wireshark
Management tools for Azure NoSQL/non-relational data
- Azure Portal
- Azure Data Explorer
- AzCopy
- Cosmos Explorer
- Visual Studio Cloud Explorer