8. Model Training and Hyperparameter Tuning Flashcards
What does the Google Cloud analytics portfolio include?
Collect: Pub/Sub, Datastream, Data Transfer Service
Process: Dataflow, Dataproc, Data Fusion, Composer, Dataprep
Store: Cloud SQL, Spanner, Bigtable, Firestore, Memorystore
Analyze: BigQuery, BI Engine, BigQuery ML, Data QnA, plus analysis over Cloud Storage and multicloud data
Activate: Vertex AI, Looker, third-party BI tools
What is Pub/Sub?
Pub/Sub is a serverless, scalable service for messaging and real-time analytics. You can stream data from third-party sources directly into BigQuery through Pub/Sub.
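A minimal sketch of publishing a message with the google-cloud-pubsub client library; the project and topic IDs are hypothetical placeholders:

```python
from google.cloud import pubsub_v1

publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path("my-project", "my-topic")  # hypothetical IDs

# publish() is asynchronous and returns a future; result() blocks until
# the Pub/Sub service acknowledges the message.
future = publisher.publish(topic_path, data=b"sensor reading: 42")
print(future.result())  # the server-assigned message ID
```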
What is Datastream?
Datastream is a serverless and easy-to-use change data capture (CDC) and replication service.
It allows you to synchronize data across heterogeneous databases and applications with minimal latency and downtime.
Datastream supports streaming from Oracle and MySQL databases into Cloud Storage.
Datastream is integrated with Dataflow, and it leverages Dataflow templates to load data into BigQuery, Cloud Spanner, and Cloud SQL.
What is BigQuery Data Transfer Service?
You can load data into BigQuery from the following sources: data warehouses such as Teradata and Amazon Redshift
The external cloud storage provider Amazon S3
Google software as a service (SaaS) apps such as Google Ads, as well as Cloud Storage
What is Cloud Dataflow?
Cloud Dataflow is a serverless, fully managed data processing (ETL) service for both streaming and batch data. Dataflow uses Apache Beam.
It allows you to build pipelines, monitor their execution, and transform and analyze data.
It reads data from source Google Cloud data services, processes it, and writes the results to sinks, as in the sketch below.
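A minimal word-count pipeline sketch in Apache Beam's Python SDK; the gs:// paths are hypothetical placeholders, and swapping the default DirectRunner for DataflowRunner (plus project/region/temp-location pipeline options) would run the same code on Dataflow:

```python
import apache_beam as beam

# Runs locally on the DirectRunner unless pipeline options say otherwise.
with beam.Pipeline() as pipeline:
    (
        pipeline
        | "Read" >> beam.io.ReadFromText("gs://my-bucket/input.txt")  # hypothetical path
        | "Split" >> beam.FlatMap(lambda line: line.split())
        | "PairWithOne" >> beam.Map(lambda word: (word, 1))
        | "Count" >> beam.CombinePerKey(sum)
        | "Format" >> beam.Map(lambda kv: f"{kv[0]}: {kv[1]}")
        | "Write" >> beam.io.WriteToText("gs://my-bucket/counts")  # hypothetical path
    )
```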
What is Cloud Data Fusion?
It is a UI-based, no-code ETL tool.
What is Cloud Dataproc?
Dataproc is a fully managed and highly scalable service for running Apache Spark, Apache Flink, Presto, and 30+ open-source tools and frameworks.
Dataproc lets you do batch processing, querying, streaming, and machine learning.
Dataproc automation helps you create clusters quickly, manage them easily, and turn them off when not in use.
What integrations does Dataproc have with Google Cloud Platform?
BigQuery, Cloud Storage, Cloud Bigtable, Cloud Logging, and Cloud Monitoring.
They provide a complete data platform. You can use Dataproc to do ETL.
Dataproc uses the Hadoop Distributed File System (HDFS) for storage.
What is Cloud Composer?
It is a managed data workflow orchestration service allowing you to author, schedule and monitor pipelines.
It is built on Apache Airflow and pipelines are configured as directed acyclic graphs.
It supports hybrid and multicloud architecture.
It provides end-to-end integration with Google Cloud products.
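A minimal sketch of an Airflow DAG of the kind you would drop into a Composer environment's dags/ folder; the DAG ID, schedule, and task commands are illustrative placeholders:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="example_etl",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract = BashOperator(task_id="extract", bash_command="echo extract")
    load = BashOperator(task_id="load", bash_command="echo load")

    # The >> operator declares an edge of the directed acyclic graph:
    # extract must succeed before load runs.
    extract >> load
```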
What are the Dataproc connectors?
Cloud Storage connector: Run Apache Hadoop or Apache Spark jobs directly on data in Cloud Storage.
BigQuery connector: Enable Spark and Hadoop applications to process data from BigQuery and write data to BigQuery.
BigQuery Spark connector: Supports reading from and writing to BigQuery via Spark DataFrames (see the PySpark sketch after this list).
Cloud Bigtable with Dataproc: Use Bigtable as a source or sink for Dataproc jobs.
Pub/Sub Lite Spark connector: Supports Pub/Sub Lite as an input source for Apache Spark Structured Streaming.
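A minimal sketch of the BigQuery Spark connector from PySpark, as you might run it on a Dataproc cluster (where the connector jar is typically available); the input is a real BigQuery public table, while the output table and temporary GCS bucket are hypothetical placeholders:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("bq-connector-demo").getOrCreate()

# Read a public BigQuery table into a Spark DataFrame.
words = (
    spark.read.format("bigquery")
    .option("table", "bigquery-public-data.samples.shakespeare")
    .load()
)

# Aggregate in Spark, then write the result back to BigQuery.
counts = words.groupBy("corpus").sum("word_count")
(
    counts.write.format("bigquery")
    .option("table", "my_project.my_dataset.shakespeare_counts")  # hypothetical
    .option("temporaryGcsBucket", "my-temp-bucket")               # hypothetical
    .save()
)
```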
What is Cloud Dataprep?
It is a UI-based ETL tool for structured and unstructured data for analysis, reporting and machine learning.
What are the different data processing tools best used for?
Dataflow: Unified streaming and batch workloads that need customization
Data Fusion: Managed batch and real-time pipelines from hybrid sources
Dataproc: Lift-and-shift Hadoop workloads from on-premises
Dataprep: Ad hoc analytics
What is the data storage guidance on GCP for machine learning?
Tabular data: BigQuery, BigQuery ML (see the sketch after this list)
Image, video, audio, unstructured data: Cloud Storage
Unstructured data: Vertex Data Labeling
Structured data: Vertex AI Feature Store
For AutoML image, video, text: Vertex AI Managed Datasets
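For the tabular case, BigQuery ML lets you train a model with plain SQL. A minimal sketch issued through the google-cloud-bigquery Python client; the dataset and model names are hypothetical placeholders, while the source table is a real public dataset:

```python
from google.cloud import bigquery

client = bigquery.Client()  # uses default project credentials

# Train a linear regression model entirely inside BigQuery.
query = """
CREATE OR REPLACE MODEL `my_dataset.penguin_weight_model`
OPTIONS (model_type = 'linear_reg', input_label_cols = ['body_mass_g']) AS
SELECT *
FROM `bigquery-public-data.ml_datasets.penguins`
WHERE body_mass_g IS NOT NULL
"""
client.query(query).result()  # blocks until the CREATE MODEL job finishes
```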
You should not store data in …
Block storage, such as Network File System (NFS) shares or VM disks. Also avoid reading data directly from databases such as Cloud SQL.
When should you store data as sharded TFRecord files and Avro files?
Use sharded TFRecord files for TensorFlow and Avro files for other frameworks; a sketch follows.
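A minimal sketch of writing and reading sharded TFRecord files with the tf.io and tf.data APIs; the example protos and the gs:// bucket are hypothetical placeholders:

```python
import tensorflow as tf

# "examples" stands in for your serialized tf.train.Example protos.
examples = [tf.train.Example().SerializeToString() for _ in range(1000)]

num_shards = 10
for shard in range(num_shards):
    path = f"gs://my-bucket/train-{shard:05d}-of-{num_shards:05d}.tfrecord"
    with tf.io.TFRecordWriter(path) as writer:
        for proto in examples[shard::num_shards]:  # every num_shards-th proto
            writer.write(proto)

# Read the shards back in parallel with tf.data.
files = tf.data.Dataset.list_files("gs://my-bucket/train-*.tfrecord")
dataset = files.interleave(
    tf.data.TFRecordDataset, num_parallel_calls=tf.data.AUTOTUNE
)
```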
How do you improve read and write throughput to Cloud Storage if you have image, video, audio, and unstructured data?
Combine individual files into larger files of at least 100 MB, and keep between 100 and 10,000 shards.
What is TensorFlow I/O?
TensorFlow I/O is an extension library that lets TensorFlow read data formats core TensorFlow does not support, such as Parquet, for TensorFlow training (see the sketch below).
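A minimal sketch, assuming the tensorflow-io package is installed and a train.parquet file (a hypothetical placeholder) exists locally or on GCS:

```python
import tensorflow_io as tfio

# Stream Parquet rows as a tf.data-compatible dataset.
dataset = tfio.IODataset.from_parquet("train.parquet")

for row in dataset.take(1):
    print(row)  # per-row column values keyed by column name
```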
What is Vertex AI Workbench?
You can create Jupyter notebooks to train, tune, and deploy models using Vertex AI Workbench.
What is a user-managed notebook?
You have more control but fewer features.
Custom container
Use one framework at a time (chosen from the supported frameworks)
VPC + other networking and security features
What is a managed notebook?
It comes with more features:
Automatic shutdown
UI integration with Cloud Storage and BigQuery
Automated run
Custom container
Dataproc or Serverless Spark integration
All frameworks preinstalled
VPC support
Why don't you need large hardware to develop code in JupyterLab?
You perform training and prediction through the Vertex AI Training and Prediction APIs/SDK. The SDK launches the training container outside the JupyterLab environment, and it builds a prediction container and hosts it on an endpoint, so the notebook machine only needs enough resources for writing code. A sketch follows.
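A minimal sketch with the Vertex AI Python SDK (google-cloud-aiplatform); the project, bucket, training script, and machine types are hypothetical placeholders, and the container URIs name Google's prebuilt training and prediction images:

```python
from google.cloud import aiplatform

aiplatform.init(
    project="my-project",                     # hypothetical project
    location="us-central1",
    staging_bucket="gs://my-staging-bucket",  # hypothetical bucket
)

# The training script runs in a container on Vertex AI, not on the notebook VM.
job = aiplatform.CustomTrainingJob(
    display_name="demo-training",
    script_path="task.py",  # hypothetical local training script
    container_uri="us-docker.pkg.dev/vertex-ai/training/tf-cpu.2-11:latest",
    model_serving_container_image_uri=(
        "us-docker.pkg.dev/vertex-ai/prediction/tf2-cpu.2-11:latest"
    ),
)
model = job.run(replica_count=1, machine_type="n1-standard-4")

# Vertex AI builds and hosts the prediction container behind a managed endpoint.
endpoint = model.deploy(machine_type="n1-standard-4")
```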