Data Analytics, AI/ML Flashcards
Vertex AI
AutoML
- Train ML models without code
AI Platform
- Train ML models using custom training code
Features:
- Data labeling (human assistance in labeling training data)
- Feature store (repo for ML features)
- Workbench (Jupyter notebook IDE)
Cloud TPU
Tensor Processing Units
Google-designed custom circuits (ASICs) for accelerating the training of deep learning models
Dataflow
Serverless batch and stream data processing service
Develop pipelines with Apache Beam and execute them on Dataflow instances: ETL, batch, and continuous computation (map and reduce)
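Example (sketch only): the map and reduce shape mentioned above, written in plain Python rather than the Apache Beam SDK. The function names and sample lines are made up for illustration; a real Dataflow pipeline would express the same two phases as Beam transforms.

```python
from collections import Counter
from functools import reduce


def map_phase(line: str) -> Counter:
    """Map: turn one input record into per-word counts."""
    return Counter(line.lower().split())


def reduce_phase(left: Counter, right: Counter) -> Counter:
    """Reduce: merge two partial results into one."""
    return left + right


def word_count(lines):
    """Apply the map phase to every record, then fold the partial results."""
    return reduce(reduce_phase, map(map_phase, lines), Counter())


# Hypothetical input records.
lines = ["batch and stream", "stream and batch and ETL"]
counts = word_count(lines)
print(dict(counts))
```

Because each record is mapped independently and the merge is associative, the work can be split across many workers, which is the property a managed runner relies on.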
Dataproc
Managed service for running Hadoop, Spark, Hive, and Pig jobs/clusters on Compute Engine VMs
Use Spark and Spark SQL for data analysis
Use Spark ML libraries to run classification algorithms
Analyze data stored in Cloud Storage
Cloud Workflows
Serverless orchestration platform that executes services based on workflows defined in YAML or JSON
Workflows combine steps that call GCP API services, Cloud Functions, and Cloud Run
*NOT for large volumes of data or complex sequences of jobs
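Example (sketch only): what a small JSON-defined workflow might look like, built as a Python structure and serialized. The step names and the URL are placeholders, and the step fields shown (call, args, result, return) are a minimal subset, so check the Workflows syntax reference before relying on them.

```python
import json

# Hypothetical two-step workflow: call an HTTP endpoint, then return its body.
# Step names and the URL are placeholders for illustration only.
workflow = [
    {
        "getGreeting": {
            "call": "http.get",
            "args": {"url": "https://example.com/hello"},
            "result": "greeting",
        }
    },
    {"returnBody": {"return": "${greeting.body}"}},
]


def step_names(definition):
    """List step names in order (each step is a one-key mapping)."""
    return [name for step in definition for name in step]


workflow_json = json.dumps(workflow, indent=2)
print(workflow_json)
print(step_names(workflow))
```

The same structure can be written as YAML; the JSON form is used here only because it needs nothing beyond the standard library.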
Data Fusion
Fully managed service based on CDAP for building ETL pipelines without code
*Pre-built connectors and transformations
Cloud Composer
Fully managed workflow orchestration service for Apache Airflow DAG workflows.
Built on open source Apache Airflow; supports on-prem and multicloud
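Example (sketch only): the idea behind a DAG workflow, shown with Python's standard graphlib module instead of the Airflow API. The task names are hypothetical; in Airflow the same dependencies would be declared between operators inside a DAG definition.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
dependencies = {
    "extract": set(),
    "clean": {"extract"},
    "enrich": {"extract"},
    "load": {"clean", "enrich"},
}


def run_order(deps):
    """Return one valid execution order that respects every dependency."""
    return list(TopologicalSorter(deps).static_order())


order = run_order(dependencies)
print(order)
```

"clean" and "enrich" do not depend on each other, so an orchestrator is free to run them in parallel; only their position relative to "extract" and "load" is fixed.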
Dataprep
Visually explore, clean, and prepare structured and unstructured data for analysis
Dataproc
Runs Apache Spark and Hadoop clusters
Data Fusion
Data integration service to build and manage ETL/ELT pipelines
Preconfigured connectors and transformations
Cloud Composer
Workflow orchestration service to author, schedule, and monitor pipelines that span clouds and on-prem environments
Data Catalog
Metadata management
Google Data Studio
Visual analytics, interactive dashboards
Dataform
Develop data workflows in SQL and collaborate with Git.
Schedule data workflows with incremental updates to downstream datasets.
Define data quality checks and get alerts.
**Works with BQ
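Example (sketch only): the two ideas above, an incremental update and a data quality check, shown with Python's built-in sqlite3 as a stand-in for BigQuery. Dataform itself uses its own SQL-based file format, so the table names, columns, and helper functions here are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE raw_events (id INTEGER, amount REAL, loaded_at INTEGER);
    CREATE TABLE daily_events (id INTEGER, amount REAL, loaded_at INTEGER);
""")


def incremental_update(conn):
    """Append only source rows newer than the newest row already in the target."""
    conn.execute("""
        INSERT INTO daily_events
        SELECT id, amount, loaded_at FROM raw_events
        WHERE loaded_at > (SELECT COALESCE(MAX(loaded_at), -1) FROM daily_events)
    """)
    conn.commit()


def failed_quality_check(conn):
    """Quality check: return target rows that violate the rule amount >= 0."""
    return conn.execute(
        "SELECT id FROM daily_events WHERE amount IS NULL OR amount < 0"
    ).fetchall()


# First load: two rows arrive and both are copied downstream.
conn.executemany("INSERT INTO raw_events VALUES (?, ?, ?)",
                 [(1, 10.0, 100), (2, 5.0, 101)])
incremental_update(conn)

# Second load: only the one new row is copied, not the whole table again.
conn.executemany("INSERT INTO raw_events VALUES (?, ?, ?)", [(3, -2.0, 102)])
incremental_update(conn)

row_count = conn.execute("SELECT COUNT(*) FROM daily_events").fetchone()[0]
bad_rows = failed_quality_check(conn)
print(row_count, bad_rows)
```

A check like this passes when the query returns no rows; any returned row is what would trigger an alert.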
Ingestion tools (5)
- Pub/Sub
- Storage Transfer Service
- Transfer Appliance
- Cloud IoT Core
- BQ