Processing Flashcards
App Engine
Platform as a service
No direct machine control
Use cases:
Applications needing managed scaling, with consistent or spiking traffic
Classic websites, app servers, etc.
Compute Engine
Infrastructure as a service - direct VM control, highly flexible.
Use cases:
Applications incompatible with App Engine or requiring direct machine control, e.g.
Non-Kubernetes containerised solutions
Migration of existing solution to cloud
Custom kernel or arbitrary OS needed
Solution requires use of GPUs
Dataproc
No-ops Hadoop cluster with fast creation / destruction, auto setup, resize, and balance.
Use cases: Hadoop systems (migrating existing Hadoop systems or creating job-oriented Hadoop clusters for BI)
Dataprep
Visual ETL workflow, compatible with JSON, CSV and relational databases.
Use cases:
Explore, clean and prep data for analysis
Especially for use with BigQuery, Dataproc jobs, etc.
Dataflow
Apache Beam ETL pipelines (batch and streaming) with no-ops managed execution; unlike Dataproc, no Hadoop cluster is involved.
Use cases:
Complex ETL with scalable, fault-tolerant pipelines, e.g.
Pushing data to multiple storage locations
Merging stream data
Calling external data enrichment
Providing a real-time cache of stream data
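The pipeline shapes listed above (merging streams, calling an enrichment step, pushing to several storage locations) can be sketched in plain Python. This is only a toy model of the data movement, not the Apache Beam API: the event tuples, the REGION_BY_DEVICE lookup table and the two list "sinks" are hypothetical stand-ins chosen for illustration.

```python
import heapq

# Hypothetical lookup table standing in for an external enrichment service.
REGION_BY_DEVICE = {"dev-1": "eu", "dev-2": "us"}

def merge_streams(*streams):
    """Merge already time-ordered streams of (timestamp, device, value) tuples."""
    return heapq.merge(*streams, key=lambda event: event[0])

def enrich(event):
    """Attach a region to an event; unknown devices get 'unknown'."""
    timestamp, device, value = event
    return {
        "ts": timestamp,
        "device": device,
        "value": value,
        "region": REGION_BY_DEVICE.get(device, "unknown"),
    }

def run_pipeline(streams, sinks):
    """Merge, enrich, then write every record to every sink (fan-out)."""
    for event in merge_streams(*streams):
        record = enrich(event)
        for sink in sinks:
            sink.append(record)

stream_a = [(1, "dev-1", 20.5), (4, "dev-1", 21.0)]
stream_b = [(2, "dev-2", 18.2), (3, "dev-3", 19.9)]
warehouse, cache = [], []  # stand-ins for two storage locations
run_pipeline([stream_a, stream_b], [warehouse, cache])
```

A managed service adds what this sketch leaves out: parallel workers, retries on failure, and windowing for unbounded streams.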
Pub/Sub
Platform as a service (no ops) message delivery. Decouples communication in multi-part systems.
Use cases:
Collecting data for processing / storage
Intra-system messaging
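The decoupling idea can be illustrated with a small in-memory sketch. ToyBroker is a hypothetical class written for this note, not the Pub/Sub client library; it only shows that publishers and subscribers share a topic name and never reference each other.

```python
from collections import defaultdict

class ToyBroker:
    """In-memory stand-in for a message service (illustration only)."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        """Register a callback to receive every message sent to the topic."""
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        """Deliver the message to each subscriber; return the delivery count."""
        for callback in self._subscribers[topic]:
            callback(message)
        return len(self._subscribers[topic])

broker = ToyBroker()
stored, alerts = [], []

# Two independent consumers of the same topic: one stores, one filters.
broker.subscribe("sensor-readings", stored.append)
broker.subscribe(
    "sensor-readings",
    lambda m: alerts.append(m) if m["temp"] > 30 else None,
)

broker.publish("sensor-readings", {"device": "dev-1", "temp": 22})
broker.publish("sensor-readings", {"device": "dev-2", "temp": 35})
```

The real service delivers asynchronously and retains messages until they are acknowledged, which this synchronous toy does not model.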
Cloud Functions
Functions as a service (no ops) - code execution (Node.js, plus other runtimes such as Python and Go) triggered by events (e.g. Pub/Sub messages, Cloud Storage bucket changes or HTTP requests).
Use cases:
Serverless function execution
E.g. no ops mobile / IoT back ends, or real time ETL for stream data
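A minimal sketch of the real-time ETL case, written here in Python, which Cloud Functions also supports. It assumes the (event, context) signature used by Pub/Sub-triggered background functions, where event["data"] carries the base64-encoded message body. The function name handle_reading, the payload fields and the Celsius-to-Fahrenheit step are hypothetical.

```python
import base64
import json

def handle_reading(event, context):
    """Decode one Pub/Sub message and return a cleaned record."""
    payload = json.loads(base64.b64decode(event["data"]).decode("utf-8"))
    # Hypothetical transformation: convert Celsius to Fahrenheit.
    return {
        "device": payload["device"],
        "temp_f": payload["temp_c"] * 9 / 5 + 32,
    }

# Local simulation of the event the platform would pass in (assumed shape).
message = json.dumps({"device": "dev-1", "temp_c": 100}).encode("utf-8")
fake_event = {"data": base64.b64encode(message).decode("utf-8")}
result = handle_reading(fake_event, context=None)
```

Building a fake event like this allows the handler logic to be checked locally before deployment; a deployed function would typically write the record to storage instead of returning it.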