Data Analytics, AI/ML Flashcards

1
Q

Vertex AI

A

AutoML
- Train ML models without code

AI Platform
- Train ML models using custom training code

Features:

  • Data labeling (human assistance in labeling training data)
  • Feature store (repo for ML features)
  • Workbench (Jupyter notebook IDE)
2
Q

Cloud TPU

A

Tensor Processing Units

Google-designed application-specific integrated circuits (ASICs) for accelerating machine learning workloads such as training deep learning models

3
Q

Dataflow

A

Serverless, batch and stream data processing service
Develop pipelines with Apache Beam and execute them on Dataflow workers: ETL, batch, and continuous (streaming) computation, including map and reduce style transforms
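
A minimal sketch of the map and reduce pattern named on this card, written in plain Python rather than the Apache Beam SDK so that it runs without dependencies. The function names (`map_words`, `reduce_counts`, `run_pipeline`) are illustrative assumptions, not Beam or Dataflow APIs.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple


def map_words(line: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in a line."""
    for word in line.lower().split():
        yield (word, 1)


def reduce_counts(pairs: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Reduce step: sum the values emitted for each key."""
    totals: Dict[str, int] = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def run_pipeline(lines: Iterable[str]) -> Dict[str, int]:
    """Chain the map and reduce steps over a batch of input lines."""
    pairs = (pair for line in lines for pair in map_words(line))
    return reduce_counts(pairs)


counts = run_pipeline(["batch and stream", "stream processing"])
print(counts)
```

In a real pipeline the same two steps would be expressed as Beam transforms, and Dataflow would distribute them across workers.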

4
Q

Dataproc

A

Managed service for running Hadoop, Spark, Hive, and Pig jobs/clusters on Compute Engine VMs

Use Spark and Spark SQL for data analysis
Use Spark ML libraries to run classification algorithms
Analyze data stored in Cloud Storage

5
Q

Cloud Workflows

A

Serverless orchestration platform that executes services in the order defined by a YAML or JSON workflow
Workflows - combine steps that call GCP API services, Cloud Functions, and Cloud Run

*NOT for large volumes of data or complex sequences of jobs
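
A rough sketch of what a JSON-defined workflow might look like, built as a Python structure and serialized with `json`. The step names (`callService`, `returnBody`), the result variable, and the URL are hypothetical placeholders. The layout follows the general step shape of `call`, `args`, `result`, and `return`, and should be checked against the Workflows syntax reference before use.

```python
import json

# Hypothetical two-step workflow: call an HTTP endpoint, then return its body.
# The URL is a placeholder, not a real service.
workflow = [
    {
        "callService": {
            "call": "http.get",
            "args": {"url": "https://example.com/api/status"},
            "result": "serviceResponse",
        }
    },
    {"returnBody": {"return": "${serviceResponse.body}"}},
]

# Serialize to the JSON form that would be saved as the workflow definition.
workflow_json = json.dumps(workflow, indent=2)

# Steps run in list order, so the step names give the execution sequence.
step_names = [name for step in workflow for name in step]
print(step_names)
```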

6
Q

Data Fusion

A

Fully managed service based on CDAP for building ETL pipelines without code

*Pre-built connectors and transformations

7
Q

Cloud Composer

A

Fully managed workflow orchestration service for Apache Airflow DAG workflows.

Built on open source Apache Airflow - supports on-prem and multicloud
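
To illustrate what "DAG workflow" means here, the sketch below orders a hypothetical four-task pipeline by its dependencies using the standard library's `graphlib` (Python 3.9+). It is not Airflow code. The task names are assumptions, and a real Composer DAG would be declared with Airflow operators instead.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
dependencies = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "report": {"load"},
}

# A DAG has no cycles, so a valid execution order always exists.
run_order = list(TopologicalSorter(dependencies).static_order())
print(run_order)
```

An orchestrator does essentially this ordering, then adds scheduling, retries, and monitoring on top.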

8
Q

Dataprep

A

Visually explore, clean, and prepare structured and unstructured data for analysis

9
Q

Dataproc

A

Runs Apache Spark and Hadoop clusters

10
Q

Data Fusion

A

Data integration service to build and manage ETL/ELT pipelines
Preconfigured connectors and transformations

11
Q

Cloud Composer

A

Workflow orchestration service to author, schedule, and monitor pipelines that span clouds and on-prem

12
Q

Data Catalog

A

Metadata management

13
Q

Google Data Studio (since renamed Looker Studio)

A

Visual analytics, interactive dashboards

14
Q

Dataform

A

Develop data workflows in SQL and collaborate with Git.

Schedule data workflows with incremental updates to downstream datasets.

Define data quality checks and get alerts.

**Works with BigQuery (BQ)
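
The two ideas on this card, incremental updates and data quality checks, can be sketched in plain Python. This is not Dataform's own syntax (Dataform workflows are written in SQL). The row layout, the `updated_at` column, and both function names are assumptions made for illustration.

```python
from typing import Dict, List

Row = Dict[str, object]


def incremental_update(target: List[Row], source: List[Row]) -> List[Row]:
    """Append only the source rows newer than the latest row already in target."""
    latest = max((row["updated_at"] for row in target), default="")
    new_rows = [row for row in source if row["updated_at"] > latest]
    return target + new_rows


def find_null_violations(rows: List[Row], column: str) -> List[Row]:
    """Data quality check: return the rows where the column is missing or None."""
    return [row for row in rows if row.get(column) is None]


target = [{"id": 1, "updated_at": "2024-01-01"}]
source = [
    {"id": 1, "updated_at": "2024-01-01"},     # already loaded, skipped
    {"id": 2, "updated_at": "2024-01-02"},     # new, appended
    {"id": None, "updated_at": "2024-01-03"},  # new, but fails the check
]

refreshed = incremental_update(target, source)
violations = find_null_violations(refreshed, "id")
print(len(refreshed), len(violations))
```

A non-empty `violations` list is the kind of condition that would raise an alert.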

15
Q

Ingestion tools (5)

A
Pub/Sub
Storage Transfer Service
Transfer Appliance
Cloud IoT Core
BQ
16
Q

Data analytics storage (4)

A

Cloud Storage
Bigtable
Memorystore
BQ

17
Q

Services for ingesting data from other clouds (3)

A

Cloud Data Fusion
Storage Transfer Service
BQ Transfer Service

18
Q

Services for ingesting data from on prem

A

Data Fusion + Connector - for low-code, graphical UI

Transfer Appliance or Storage Transfer Service - large volumes

19
Q

Recommended method for ingesting batch workloads

A

gsutil or Storage Transfer Service (STS) to ingest into Cloud Storage

20
Q

Services to ingest data via streaming

A

Pub/Sub - global, low latency

BQ - for analytics and reporting

Apache Kafka on prem or other clouds - Kafka to BQ Dataflow template

21
Q

Service to use to ingest data from multiple sources

A

Dataflow

22
Q

Cloud Data Loss Prevention

A

Service to inspect and transform structured and unstructured data from anywhere in Google Cloud
Classify, mask, tokenize sensitive info.
Scan BQ data
De-identify and re-identify PII in large data sets
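
A small sketch of two of the transformations listed above, masking and tokenizing, using only the Python standard library. It does not call the Cloud DLP API. The email pattern, the function names, and the demo key are illustrative assumptions. Note that the keyed hash used here is one-way, so unlike a reversible DLP transformation it cannot support re-identification.

```python
import hashlib
import hmac
import re

# Simplified email pattern for the demo; real detection is far more thorough.
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def mask_emails(text: str, mask_char: str = "#") -> str:
    """Masking: replace every character of each detected email with mask_char."""
    return EMAIL_PATTERN.sub(lambda m: mask_char * len(m.group()), text)


def tokenize(value: str, key: bytes) -> str:
    """Tokenizing: map a value to a stable surrogate using a keyed hash."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]


masked = mask_emails("Contact alice@example.com today")
token = tokenize("alice@example.com", b"demo-key")
print(masked)
print(token)
```

Because the same input and key always give the same token, tokenized columns can still be joined or grouped without exposing the original value.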

23
Q

Smart Analytics suite consists of (3)

A

BQ
Data Studio
Cloud Composer

24
Q

Ways to send data to Cloud Logging

A

App Engine - automatically records logs to Cloud Logging

Logging agent

Custom logging messages to stdout and stderr
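
A sketch of the third option, writing custom log messages to stdout and stderr. It assumes a managed runtime that forwards both streams to Cloud Logging and parses JSON lines as structured entries, with `severity` and `message` as recognized fields. The helper name `log_structured` and the `order_id` field are hypothetical.

```python
import json
import sys


def log_structured(message, severity="INFO", stream=sys.stdout, **fields):
    """Write one JSON log line to the given stream and return it."""
    entry = {"severity": severity, "message": message, **fields}
    line = json.dumps(entry)
    print(line, file=stream)
    return line


# Informational entry on stdout, error entry on stderr.
info_line = log_structured("order processed", order_id="A-123")
error_line = log_structured("payment failed", severity="ERROR", stream=sys.stderr)
```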

25
Q

Ways to ingest app data into GCP for analysis

A

Write data to file - store in Cloud Storage - BQ import function

Write data to database - Cloud SQL, Bigtable, Firestore/Datastore

Stream data via Pub/Sub
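
For the first option, one common file layout is newline-delimited JSON, one record per line, which BigQuery load jobs accept. The sketch below only writes the local file. Uploading it to Cloud Storage and importing it into BQ are left out, and the record fields and file name are made up for the example.

```python
import json
import os
import tempfile


def write_ndjson(records, path):
    """Write each record as one JSON object per line and return the row count."""
    with open(path, "w", encoding="utf-8") as handle:
        for record in records:
            handle.write(json.dumps(record) + "\n")
    return len(records)


records = [
    {"user_id": 1, "event": "login"},
    {"user_id": 2, "event": "purchase"},
]

path = os.path.join(tempfile.mkdtemp(), "events.json")
written = write_ndjson(records, path)
print(written, path)
```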

26
Q

Datastream

A

Captures change data (CDC) from Oracle, MySQL, and other databases

Replicates data using Dataflow templates to create replicated tables in BQ
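
Change data capture can be pictured as replaying a stream of insert, update, and delete events against a copy of the table. The sketch below does that with an in-memory dict. The event format (`op`, `key`, `row`) is an assumption for illustration and is not Datastream's actual event schema.

```python
def apply_change(replica, event):
    """Apply one change event to the replica, keyed by primary key."""
    op = event["op"]
    key = event["key"]
    if op in ("insert", "update"):
        replica[key] = event["row"]
    elif op == "delete":
        replica.pop(key, None)
    else:
        raise ValueError(f"unknown operation: {op}")
    return replica


# Hypothetical change stream captured from a source table.
events = [
    {"op": "insert", "key": 1, "row": {"name": "Ada"}},
    {"op": "insert", "key": 2, "row": {"name": "Bob"}},
    {"op": "update", "key": 1, "row": {"name": "Ada L."}},
    {"op": "delete", "key": 2},
]

replica = {}
for event in events:
    apply_change(replica, event)
print(replica)
```

Applying events in order is what keeps the replica consistent with the source.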

27
Q

Services for processing data (3)

A

Dataproc
Dataprep
Dataflow

28
Q

Data analytics services spanning pipeline (4)

A

Data Fusion
Data Catalog
Cloud Composer
Datastream

29
Q

Kubeflow

A

Libraries and tools for ML workflow deployment on Kubernetes (K8s)

30
Q

TensorFlow Enterprise

A

Development environment for ML

31
Q

AI Platform Prediction

A

Serverless, autoscaling service to host ML models

Used to serve trained models for online inference
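
Online inference is a request and response exchange: the client sends a JSON body with an `instances` list and reads a `predictions` list back. The sketch below only builds and parses those bodies. No request is sent, the helper names are hypothetical, and the sample response is fabricated purely to exercise the parser.

```python
import json


def build_predict_request(instances):
    """Serialize input rows into an online prediction request body."""
    return json.dumps({"instances": instances})


def parse_predict_response(body):
    """Extract the predictions list from a response body."""
    return json.loads(body)["predictions"]


request_body = build_predict_request([[1.0, 2.0], [3.0, 4.0]])

# Made-up response, one prediction per instance, for demonstration only.
sample_response = json.dumps({"predictions": [0.1, 0.9]})
predictions = parse_predict_response(sample_response)
print(request_body)
print(predictions)
```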