10_Cloud Composer Flashcards

1
Q

Cloud Composer Overview

  • Cloud Composer is a fully managed Apache Airflow implementation
    • Infrastructure/OS handled for you
  • Apache Airflow lets you programmatically create, schedule, and monitor data workflows
  • Automation and monitoring
  • Big Data pipelines are often multi-step, complex processes (see the sketch after this list):
    • Create resources in multiple services
    • Process and move data from one service to another
    • Remove resources once their task completes
  • Collaborate on the workflow process with other team members
  • Built on open source, with Python as the common language
  • Easy to work with and to share workflows with others
  • Works with non-GCP providers (on-premises, other clouds)
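A minimal sketch of such a multi-step pipeline as an Airflow DAG (assuming a recent Airflow 2.x release); the DAG id and task names are illustrative placeholders standing in for operators that would actually create, use, and tear down resources.

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.empty import EmptyOperator  # no-op placeholder tasks

    with DAG(
        dag_id="multi_step_pipeline",    # hypothetical DAG name
        start_date=datetime(2024, 1, 1),
        schedule_interval=None,          # trigger manually in this sketch
    ) as dag:
        # Placeholders for real operators (e.g. create a cluster, run a job, delete the cluster)
        create_resources = EmptyOperator(task_id="create_resources")
        process_and_move_data = EmptyOperator(task_id="process_and_move_data")
        remove_resources = EmptyOperator(task_id="remove_resources")

        # Run the steps in order: create -> process/move -> clean up
        create_resources >> process_and_move_data >> remove_resources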
A
2
Q

Cloud Composer Architecture

  • GKE cluster running the Airflow deployment
  • Cloud Storage bucket for workflow files
  • And others … (hidden)
A
3
Q

Workflows

  • Orchestrate data pipelines:
    • Like a walkthrough of tasks to run
  • Format = Directed Acyclic Graph (DAG):
    • Written in Python
    • Collection of organized tasks that you want to schedule and run
  • Cloud Composer creates workflows using DAG files (a minimal sketch follows)
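A minimal sketch of a DAG file (assuming Airflow 2.x); the DAG id, schedule, and bash commands are illustrative placeholders, not part of the card.

    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash import BashOperator

    with DAG(
        dag_id="example_workflow",       # hypothetical DAG name
        start_date=datetime(2024, 1, 1),
        schedule_interval="@daily",      # run once per day
        catchup=False,                   # don't backfill missed runs
    ) as dag:
        extract = BashOperator(task_id="extract", bash_command="echo extract")
        load = BashOperator(task_id="load", bash_command="echo load")

        extract >> load                  # extract must finish before load starts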
A
4
Q

Process

  • Create Composer Environment
  • Set Composer variables (e.g. project ID, GCS bucket, region)
  • Add workflows (DAG files), which Composer will execute (sketch below)
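A sketch of a DAG task reading such variables through Airflow's Variable model, assuming they were set in the Composer environment beforehand; the keys (gcp_project, gcs_bucket, gce_region) are illustrative, not fixed names.

    from datetime import datetime

    from airflow import DAG
    from airflow.models import Variable
    from airflow.operators.python import PythonOperator

    def print_config():
        # Variable.get reads values set in the Composer/Airflow environment;
        # these keys are assumed to have been set beforehand.
        project = Variable.get("gcp_project")
        bucket = Variable.get("gcs_bucket")
        region = Variable.get("gce_region")
        print(f"project={project}, bucket={bucket}, region={region}")

    with DAG(
        dag_id="read_composer_variables",  # hypothetical DAG name
        start_date=datetime(2024, 1, 1),
        schedule_interval=None,
    ) as dag:
        PythonOperator(task_id="print_config", python_callable=print_config)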
A