10_Cloud Composer Flashcards
1
Q
Cloud Composer Overview
- Cloud Composer is a fully managed Apache Airflow implementation
- Infrastructure/OS handled for you
- Apache Airflow lets you programmatically create, schedule, and monitor data workflows
- Automation and monitoring
- Big Data pipelines are often complex, multi-step processes:
- Create resources in multiple services
- Process and move data from one service to another
- Remove resources when a task completes
- Collaborate on workflows with other team members
- Built on open source, using Python as common language
- Easy to work with, and share workflow with others
- Works with non-GCP providers (on-premises, other clouds)
A
2
Q
Cloud Composer Architecture
- A GKE cluster running the Airflow implementation
- Cloud Storage bucket for workflow files
- And others … (hidden)
A
3
Q
Workflows
- Orchestrate data pipelines:
- Like a walkthrough of tasks to run
- Format = Directed Acyclic Graph (DAG):
- Written in Python
- Collection of organized tasks that you want to schedule and run
- Cloud Composer creates workflows using DAG files
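As a sketch, a minimal DAG file might look like the following (names such as `composer_sample_dag` and the task IDs are illustrative; assumes the Airflow 2 API used by current Cloud Composer environments):

```python
# composer_sample_dag.py - upload to the environment's Cloud Storage
# bucket under dags/ and Composer (Airflow) picks it up automatically.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="composer_sample_dag",   # name shown in the Airflow UI
    schedule_interval="@daily",     # run once per day
    start_date=datetime(2024, 1, 1),
    catchup=False,                  # don't backfill missed past runs
) as dag:
    extract = BashOperator(task_id="extract", bash_command="echo extract")
    load = BashOperator(task_id="load", bash_command="echo load")

    # ">>" declares the dependency: extract must finish before load runs
    extract >> load
```

The `>>` operator is what makes the collection of tasks a directed acyclic graph: edges point from upstream to downstream tasks, and cycles are rejected.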
A
4
Q
Process
- Create Composer Environment
- Set Composer variables (e.g. project ID, GCS bucket, region)
- Add workflows (DAG files), which Composer will execute
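Inside a DAG file, the variables set in step two can then be read with Airflow's `Variable` model; the key names below (`gcp_project`, `gcs_bucket`, `gce_region`) are examples and must match whatever you configured for the environment:

```python
from airflow.models import Variable

# Read the Airflow variables configured for the Composer environment.
# Keys are illustrative; they must match the variables you set.
project_id = Variable.get("gcp_project")
gcs_bucket = Variable.get("gcs_bucket")
gce_region = Variable.get("gce_region")
```

Keeping project-specific values in variables rather than hard-coding them lets the same DAG file be shared across environments and team members, as noted in the overview card.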
A