Azure Data Factory Flashcards
What does Azure Data Factory Do?
Enables data to be moved from internal or external services and applications, transformed, and stored in a final destination.
What does a typical orchestration look like for Azure Data Factory?
Linked Service -> Dataset -> Pipeline (activities) -> Output data in a sink (Azure Data Lake, Blob Storage, SQL)
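The flow above can be seen in a minimal pipeline definition. This is a hedged sketch: names such as `CopyFromBlobToSql`, `InputDataset`, and `OutputDataset` are illustrative assumptions, but the dict mirrors the JSON shape Azure Data Factory uses for a pipeline resource with a Copy activity:

```python
# Sketch of an ADF pipeline definition as a Python dict.
# "CopyFromBlobToSql" and the dataset names are hypothetical; each dataset
# in turn references a linked service that holds the connection details.
pipeline = {
    "name": "CopyFromBlobToSql",  # assumed pipeline name
    "properties": {
        "activities": [
            {
                "name": "CopyActivity",
                "type": "Copy",
                # Datasets describe the data; linked services hold connections.
                "inputs": [{"referenceName": "InputDataset", "type": "DatasetReference"}],
                "outputs": [{"referenceName": "OutputDataset", "type": "DatasetReference"}],
                "typeProperties": {
                    "source": {"type": "BlobSource"},
                    "sink": {"type": "SqlSink"},
                },
            }
        ]
    },
}

activity = pipeline["properties"]["activities"][0]
print(activity["type"])                        # Copy
print(activity["outputs"][0]["referenceName"]) # OutputDataset
```

In practice this JSON is what the portal, CLI, SDKs, and REST API all create under the hood.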
What is an Azure Data Factory Linked Service?
Contains the information needed to connect to an external data source (like a SQL database connection string).
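As a sketch, a linked service definition looks like the following. The service name and connection string values are placeholders, not real credentials; the `type`/`typeProperties` structure follows the ADF linked service resource format:

```python
# Sketch of an Azure SQL Database linked service definition.
# All server/database/user values are placeholders.
linked_service = {
    "name": "AzureSqlLinkedService",  # assumed name
    "properties": {
        "type": "AzureSqlDatabase",
        "typeProperties": {
            # In practice, store secrets in Azure Key Vault rather than inline.
            "connectionString": (
                "Server=tcp:myserver.database.windows.net,1433;"
                "Database=mydb;User ID=myuser;Password=<secret>;"
            )
        },
    },
}

print(linked_service["properties"]["type"])  # AzureSqlDatabase
```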
What is an Azure Data Factory Gateway?
- Connects your on-premises environment to the Azure cloud.
- It consists of a client agent that is installed on-premises and then connects to Azure Data Factory.
- In current versions of ADF, this role is filled by the self-hosted integration runtime.
What does Azure Data Factory help us perform?
Orchestrating the moving, transforming, and loading of data.
What methods can we use to build the Azure Data Factory Pipelines?
CLI
API
Powershell
CI/CD
Portal
What is an Azure Data Factory Pipeline?
It is a logical grouping of activities, such as copying, transforming, and storing data.
I want a code method to create Azure Data Factory Data Pipelines. What are my options, and explain?
Store the JSON/ARM templates that define your pipeline in source control such as GitHub or Azure Repos, then deploy them with CI/CD tooling such as GitHub Actions or Azure DevOps Pipelines. Alternatively, create pipelines in code via the REST API, PowerShell, or the Azure CLI.
How would you connect with an on-prem SQL in Azure Data Factory?
Create a SQL Server linked service (a supported type) and reference a self-hosted integration runtime (gateway) so Azure Data Factory can reach the on-premises server.
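A hedged sketch of that setup, assuming a self-hosted integration runtime already registered under the made-up name `OnPremIR`:

```python
# Sketch: SQL Server linked service routed through a self-hosted
# integration runtime. "OnPremIR" and the connection string are assumed.
onprem_sql = {
    "name": "OnPremSqlServer",
    "properties": {
        "type": "SqlServer",
        "typeProperties": {
            "connectionString": "Server=onprem-host;Database=mydb;Integrated Security=True;"
        },
        # connectVia routes traffic through the self-hosted integration
        # runtime (the successor to the Data Management Gateway).
        "connectVia": {
            "referenceName": "OnPremIR",
            "type": "IntegrationRuntimeReference",
        },
    },
}

print(onprem_sql["properties"]["connectVia"]["referenceName"])  # OnPremIR
```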
How would you connect with an on-prem SFTP using Azure Data Factory?
Use a linked service in Azure Data Factory; it supports SFTP.
How would you connect with a Cosmos DB database using Azure Data Factory?
Use a linked service in Azure Data Factory; it supports Azure Cosmos DB.
How would you connect with a REST API in Azure Data Factory?
Use a linked service in Azure Data Factory; it supports REST.
List some of the supported linked service types for Azure Data Factory.
Azure Blob Storage
Azure Data Lake Storage Gen1 and Gen2
Azure SQL Database
Azure Synapse Analytics (formerly SQL Data Warehouse)
Azure Cosmos DB
Amazon S3
Amazon Redshift
Google BigQuery
Oracle Database
SQL Server
MySQL
PostgreSQL
SAP HANA
Salesforce
REST
SFTP
File System
Describe a linked service in Azure Data Factory.
It connects to external data stores such as file systems, SQL databases, and SAP, and is used to bring their data into Azure Data Factory as datasets.
What can we use to trigger a pipeline in Azure Data Factory?
Schedule Trigger: This allows you to run pipelines on a recurring schedule.
Tumbling Window Trigger: Useful for time-based workflows, executing pipelines at periodic time intervals.
Event-based Trigger: Responds to events, such as file creation or deletion in Azure Blob Storage.
Manual Trigger: Allows you to start a pipeline run on-demand.
Custom Events Trigger: Reacts to custom events published to an Azure Event Grid topic.
Storage Event Trigger: Responds to specific Azure Blob Storage or Azure Data Lake Storage Gen2 events.
REST API: Programmatically trigger pipelines using the ADF REST API.
PowerShell: Use Azure PowerShell cmdlets to trigger pipeline runs.
Azure CLI: Trigger pipelines using Azure Command-Line Interface commands.
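As a sketch of the first option above, a schedule trigger definition that runs a pipeline hourly could look like this. `HourlyTrigger` and `CopyFromBlobToSql` are assumed names; the structure follows the ADF trigger resource JSON:

```python
# Sketch of a schedule trigger that runs an assumed pipeline every hour.
trigger = {
    "name": "HourlyTrigger",  # assumed trigger name
    "properties": {
        "type": "ScheduleTrigger",
        "typeProperties": {
            "recurrence": {
                "frequency": "Hour",
                "interval": 1,
                "startTime": "2024-01-01T00:00:00Z",
            }
        },
        # A trigger references the pipeline(s) it starts.
        "pipelines": [
            {
                "pipelineReference": {
                    "referenceName": "CopyFromBlobToSql",
                    "type": "PipelineReference",
                }
            }
        ],
    },
}

recurrence = trigger["properties"]["typeProperties"]["recurrence"]
print(recurrence["frequency"], recurrence["interval"])  # Hour 1
```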