MLE Flashcards

1
Q

What are the two types of quantization in TensorFlow?

A

Post Training Quantization and Quantization Aware Training.

2
Q

What is Post Training Quantization?

A

Post-training quantization is a conversion technique that can reduce model size and improve CPU and hardware accelerator latency with little degradation in model accuracy. You can quantize an already-trained float TensorFlow model when you convert it to TensorFlow Lite format using the TensorFlow Lite Converter. It reduces model size by casting to integers, reducing float precision, or using dynamic range quantization.

3
Q

What is Quantization Aware Training?

A

Training that reduces the size of a model by emulating reduced bit precision of the weights during training. It is more accurate than post-training quantization, but it is harder to use and requires full retraining.

4
Q

What are the four available options for model training on Vertex AI?

A

1.) AutoML
2.) Custom Training
3.) Model Garden
4.) Generative AI

5
Q

What is AutoML? When is it recommended?

A

AutoML is a no-code solution for training on tabular, image, text, or video data without preparing data splits. It is best for:
- Automatically tuning a model with some input data.
- Teams that have little to no coding experience.
- Teams that want to quickly get a model running.
- Teams that do not want control of hyperparameter tuning aside from early stopping.
- Teams that are solving a problem within the defined problem types offered.
- Models served on an edge device or on Google Cloud.
- Models that can tolerate serving latency of 100 ms or more.

6
Q

What is BigQueryML? When is it recommended?

A

BigQuery ML (BQML) is Google's built-in set of SQL commands for creating ML models directly in BigQuery. It is recommended for:
- Those comfortable in SQL.
- Those with data already in BigQuery.
- Those whose problems are covered by BQML's model set.

7
Q

What is custom training? When is it recommended?

A

Custom training is complete freedom to optimize all aspects of an ML pipeline. It is recommended for:
- Problems outside the scope of BQML and AutoML.
- Problems already written in code on other premises (on-premises or another cloud).

8
Q

What are the three custom training methods on Vertex AI?

A

1.) Custom Jobs
2.) Hyperparameter tuning jobs
3.) Training pipelines

9
Q

What are custom training custom jobs?

A

A basic way to run a custom machine learning model on Vertex AI. It needs a pre-built or custom container to run in.

10
Q

What are custom training hyperparameter tuning jobs?

A

This runs multiple trials of custom jobs to tune hyperparameters. It requires: a metric to evaluate performance against, a maximum number of trials to perform, a maximum number of parallel trials, the maximum number of trials that can fail, the machine type and any accelerators (GPUs/TPUs) it uses, and the custom or pre-built container information it is using.

11
Q

What is a custom training, training pipeline?

A

A training pipeline can run a custom job or hyperparameter tuning job and outputs your model to a Google Cloud Storage bucket.

12
Q

What frameworks have pre-built containers for training?

A

TensorFlow, XGBoost, Scikit-Learn, PyTorch

13
Q

What is model garden?

A

Model Garden in the Google Cloud console is an ML model library that helps you discover, test, customize, and deploy Google proprietary and select OSS models and assets. Many of these are pre-trained and allow fine-tuning/transfer learning to customize them to a similar problem.

14
Q

What is fine tuning and transfer learning? When should one be used over the other?

A

Transfer learning is the process of retraining the final layers of a pre-trained model. Fine-tuning is an extension of transfer learning that also retrains the weights of the model. Fine-tuning is recommended for larger datasets, while transfer learning suits smaller ones, since fine-tuning on little data is more likely to overfit.

15
Q

What is the AutoML workflow?

A

1.) Prepare your training data.
2.) Create a dataset.
3.) Train a model.
4.) Evaluate and iterate on your model.
5.) Get predictions from your model.
6.) Interpret prediction results.

16
Q

What are best practices for ML Environment Setup in custom training?

A

1.) Use Vertex AI Workbench notebooks for experimentation and development.
2.) Create notebook instances for each team member.
3.) Store and secure ML resources and artifacts, such as datasets, with IAM permissions.
4.) Use Vertex AI SDK for Python.

17
Q

What are best practices for ML Development in custom training?

A

1.) Store structured and semi-structured data in BigQuery.
2.) Store image, video, audio and unstructured data on Cloud Storage.
3.) Use Vertex AI Data Labeling for unstructured data.
4.) Use Vertex AI Feature Store with structured data.
5.) Avoid storing data in block storage.
6.) Use Vertex AI TensorBoard and Vertex AI Experiments for analyzing experiments.
7.) Train a model within a notebook instance for small datasets.
8.) Maximize your model’s predictive accuracy with hyperparameter tuning.
9.) Use feature attributions (importances) to gain insights into model predictions.

18
Q

What are best practices for Data Processing in custom training?

A
1.) Use BigQuery to process structured and semi-structured data, or if the data is already in BigQuery.
2.) Use Dataflow to process data.
3.) Use Dataproc for serverless Spark data processing.
19
Q

What are best practices for operationalized training in custom training?

A

1.) Run code in a managed service such as Vertex AI Training (container-based solutions with a task.py entry point) or Vertex AI Pipelines.
2.) Operationalize job execution with training pipelines.
3.) Use training checkpoints to save the current state of your experiment.
4.) Prepare model artifacts for serving in Cloud Storage.
5.) Regularly compute new feature values and push them to feature store.

19
Q

What is operationalized training?

A

Operationalized training refers to the process of making model training repeatable, tracking repetitions, and managing performance.

20
Q

What is Dataproc?

A

A managed Apache Spark/Hadoop service that allows batch processing, querying, streaming and ML.

21
Q

What is Dataflow?

A

Dataflow is a serverless service built on Apache Beam for setting up automated data processing pipelines. It can be used with TFX and Kubeflow Pipelines, as they have integrated Dataflow runners. Since Vertex AI Pipelines supports both, it can also be used there.

22
Q

What is Vertex AI TensorBoard?

A

A tool for measuring and visualizing aspects of a TF ML workflow.

23
Q

What is a Vertex AI Managed Dataset?

A

Vertex AI offers a central repository for datasets, which can be used for AutoML and custom models on Vertex AI. It accepts image, tabular, text, and video data.

24
Q

What file formats should be used for model artifacts from a Vertex AI pre-built container?

A

1.) TensorFlow: saved_model.pb
2.) Scikit-Learn: model.joblib or model.pkl
3.) XGBoost: model.bst
4.) PyTorch: model.pth

25
Q

What are best practices for model deployment and serving in custom training?

A

1.) Specify the number and type of machines you need.
2.) Plan inputs to the model using batch or online serving techniques.
3.) Turn on auto-scaling by defining the minimum and maximum nodes, with a bare minimum of 2 nodes.

26
Q

What is batch prediction?

A

Prediction on batches of data brought in at a regular interval. Requests are asynchronous and served directly from the model resource. It requires an input source and an output location in either GCS or BigQuery. It can also be done by reading batch features with the Feature Store API, but this is slower because the features must first be ingested.

27
Q

What are ML workflow orchestration best practices?

A

1.) Use Vertex AI Pipelines for running DAGs created with Kubeflow Pipelines or TFX.
2.) Use Kubeflow pipelines to author your pipelines.

28
Q

What are recommended Artifact Organization best practices?

A

1.) Organize ML artifacts.
2.) Use version control for pipeline and custom component code.

28
Q

What artifacts should be stored in the source control repo?

A
  • Vertex AI Workbench notebooks
  • Pipeline source code
  • Preprocessing functions
  • Model source code
  • Model training packages
  • Serving functions
28
Q

What is an Artifact?

A

An artifact is the output resulting from each step of an ML workflow.

29
Q

What artifacts should be stored in Experiments and ML Metadata?

A
  • Experiments
  • Hyperparameters
  • Metaparameters
  • Metrics
  • Data Artifacts
  • Model Artifacts
  • Pipeline Metadata
30
Q

What artifacts should be stored in Vertex AI Model Registry?

A

Trained models from AutoML, custom training, or BigQuery ML. They can be versioned.

31
Q

What artifacts should be stored in the Artifact Registry?

A
  • Pipeline containers
  • Custom training environments
  • Custom prediction environments
32
Q

What artifacts should be stored in Vertex AI Prediction?

A

Deployed models

33
Q

What are best practices for model monitoring?

A

1.) Use drift and skew detection at an endpoint. It uses TFDV under the hood to determine data drift and skew.
2.) Fine-tune alert thresholds.
3.) Use feature attributions as an early warning sign of data drift or skew, through Vertex Explainable AI.

34
Q

What is data skew?

A

The degree of distortion between your training data and production data.

35
Q

What is data drift?

A

The change over time in the underlying statistical distribution of inputs and targets.

36
Q

What is an online prediction?

A

A synchronous request made to a model endpoint for serving predictions with low latency and/or streaming data.

37
Q

What are the guidelines for experimentation?

A

1.) Have fixed thresholds for optimizing metrics and satisficing metrics like latency and model size.
2.) Implement an evaluation routine that is model-agnostic.
3.) Ensure you have a baseline model to compare against.
4.) Track every experiment and incremental improvement.

38
Q

What are the guidelines for data quality?

A

1.) Address class imbalance early.
2.) Automate data preprocessing.
3.) Prevent data leakage with a test-train split that isolates test data from the tuning process.
4.) Generate a data schema that includes feature statistics.
5.) Ensure training data is properly shuffled in batches.
6.) Use a validation set for model/hyperparameter tuning.

39
Q

What are the guidelines for model quality?

A

1.) For DNNs, monitor the loss for NaN values and the percentage of zero weights, as these can indicate errors or vanishing/exploding gradients.
2.) Use validation and test data to check for overfitting/underfitting.
3.) Analyze misclassified instances to check for mislabeling, outliers or pre-processing that is needed.
4.) Analyze feature importance and remove those that have little importance.

40
Q

What are the guidelines for data validation?

A

1.) Verify features match the expected schema.
2.) Verify data is in expected ranges and distributions.
3.) Validate the maximum fraction of missing values.

41
Q

What are the guidelines for model validation?

A

1.) Validate the model on unseen test data.
2.) Ensure the test data is representative of the full dataset and that time-series test data is more recent than the training data.

42
Q

What are guidelines for model deployment?

A

1.) Verify the model can be called.
2.) Validate satisficing requirements.
3.) Unit test the model for edge cases and typical cases.
4.) Test in a staging environment where you can roll back to a previous version if needed.
5.) Use A/B or multi-armed bandit testing before fully rolling out a new model.

43
Q

What are guidelines for model serving?

A

1.) Regularly profile request data for tracking data drift or skew and set alerts for skew/drift thresholds.
2.) Identify concept drift by checking how feature importance changes over time.
3.) Determine outliers with respect to the training data.
4.) Perform continuous evaluation where true labels are available.
5.) Monitor service efficiency.
6.) Monitor predictive performance.

44
Q

What are the 3 continuous parts of MLOps?

A

1.) Integration (CI): Testing and validating code, components, data, schemas and models.
2.) Delivery (CD): Deployment of an end-to-end pipeline that automatically pushes to a prediction service.
3.) Training (CT): Models are automatically retrained and served as they improve.

45
Q

What is MLOps maturity level 0?

A

Completely manual process that is script driven and experimental. It has no CI, CD or CT.

46
Q

What is MLOps maturity level 1?

A

A step up from level 0 with CT integration. It needs automated data and model validation.

47
Q

What is MLOps maturity level 2?

A

It adds CI/CD on top of CT for rapid, automated pipeline experimentation and integration. It requires source control, test/build services, deployment services, a model registry, a feature store, an ML metadata store, and ML pipeline orchestration, all automated.

48
Q

What is Vertex AI Pipelines?

A

Vertex AI Pipelines is a managed resource for MLOps. It supports both the Kubeflow Pipelines and TensorFlow Extended (TFX) frameworks, and it manages the compute cluster for you in a containerized environment.

49
Q

When should TFX be used?

A

When the pipeline you are creating runs TensorFlow code.

50
Q

When should Kubeflow Pipelines be used?

A

When not using TensorFlow code, or when on-premises/multicloud solutions are needed.

51
Q

How can BigQueryML serve models?

A

BQML can natively serve batch predictions, but by integrating with Vertex AI it can be deployed to an endpoint through the Model Registry and perform online prediction. This deployment does not work for ARIMA+ or XGBoost models.

52
Q

What is Vertex AI Vizier?

A

A black-box optimization service that helps tune hyperparameters. It is used when the objective/loss function is unknown or too costly to evaluate. By default it uses Bayesian optimization, but it can also use grid search, random search, or an unspecified mode that lets the service choose the algorithm.

53
Q

What is Neural Architecture Search?

A

The process by which AutoML searches for and finds the best model architecture for a given problem and tunes its parameters.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
54
Q

What is Dataprep?

A

An intelligent cloud data service to visually explore, clean, and prepare data for analysis and machine learning. It auto-detects 17 different data types and can transform structured or unstructured data in CSV, JSON, or relational tables up to petabytes in size.

55
Q

What are the 3 connections for Dataprep?

A

1.) Direct Upload/Download
2.) GCS
3.) BigQuery

56
Q

What Tabular Data Problems can AutoML solve?

A

Classification/Regression, Forecasting

57
Q

What Text Problems can AutoML solve?

A

Classification, entity extraction, sentiment analysis

58
Q

What Video Problems can AutoML solve?

A

Action Recognition, classification, object tracking.

59
Q

What Image Problems can AutoML solve?

A

Classification, Object Detection

60
Q

What is the CLUSTER_SPEC/TF_CONFIG?

A

Vertex AI environment variables specifying the cluster used for running a distributed training job in Vertex AI/TensorFlow. A cluster needs a primary replica (chief) which manages the cluster, workers which perform the training, and optionally parameter servers (if using ParameterServerStrategy) or evaluators. CLUSTER_SPEC is set for the full cluster, and TF_CONFIG is set on each replica of a multi-replica training job.

61
Q

What is data parallelism?

A

Data is split across multiple workers that each train a replica of the same model. The overall model is updated synchronously (all-reduce) or asynchronously (parameter servers).

62
Q

What is model parallelism?

A

For large models, weights are split across multiple devices and each device trains part of a model.

63
Q

What is MirroredStrategy?

A

A simple synchronous training strategy where multiple GPUs can be used on one machine. It creates one replica per GPU and trains; the gradients are all-reduced together at each update step.

64
Q

What is MultiWorkerMirroredStrategy?

A

A synchronous strategy that scales MirroredStrategy horizontally by replicating jobs across multiple workers/machines. It requires the TF_CONFIG variable to work.

65
Q

What is TPUStrategy?

A

A distributed training strategy that uses TPUs to implement the MirroredStrategy.

66
Q

What is ParameterServerStrategy?

A

An asynchronous model training strategy that uses multiple machines. The parameter server is a central coordinator that saves checkpoints, distributes data, and updates weights from workers as they arrive. It requires the TF_CONFIG variable and TFConfigClusterResolver to define the cluster organization.

67
Q

What is a TPU?

A

A Tensor Processing Unit, which can perform all-reduce-based synchronous training. TPUs are extremely fast and can cause "data bottlenecks" if data size is not properly considered, so this requires balancing file count against file size to avoid network overhead. TPUs can only read from GCS.

68
Q

What is CentralStorageStrategy?

A

A synchronous training strategy that does not mirror variables; variables are stored centrally on the CPU while operations are replicated across local GPUs.

69
Q

When should a TPU be used?

A
  • Model is dominated by matrix computations
  • Model has no custom TF/PyTorch/JAX operations in main training loop.
  • Model trains for weeks or more.
  • Model is large and has large effective batch sizes.
70
Q

When should a TPU not be used?

A
  • Workloads with frequent branching or dominated by element-wise linear algebra operations.
  • Workloads that access memory in a sparse manner.
  • Workloads that require high-precision arithmetic.
  • Neural networks with custom training operations in the main training loop.
71
Q

Precision vs Recall?

A

Precision measures the fraction of retrieved positives that are relevant (TP/(TP + FP)), while recall measures the fraction of expected positives that are retrieved (TP/(TP + FN)). F1 balances the two as their harmonic mean.

72
Q

What is GenAI Studio?

A

A Vertex AI offering that allows you to quickly test and customize language, vision, and speech models.

73
Q

What vertical AI solutions are available from Google?

A

1.) AI for healthcare: Generates healthcare analytics
2.) Discovery AI for retail

74
Q

What horizontal AI solutions are available from Google?

A

1.) Contact Center AI / Dialogflow
2.) Document AI

75
Q

What is Vertex AI Feature Store?

A

It is a managed service that streamlines ML feature management. It acts as a layer over BigQuery data that serves the latest features at low latency. It registers multiple BigQuery tables or views and serves the freshest data based on timestamp.

76
Q

What is TFX?

A

TFX is an extension of TensorFlow for building ML pipelines for a production environment. It is supported by Vertex AI Pipelines to allow cloud-native pipeline operation.

77
Q

What standard components are available to TFX?

A

1.) ExampleGen - Ingests and optionally splits data
2.) StatisticsGen - Calculates statistics on a dataset.
3.) SchemaGen - Examines statistics and creates a schema.
4.) ExampleValidator - Uses schema and statistics to find anomalies + missing values.
5.) Transform - Performs feature engineering
6.) Trainer - Trains the model
7.) Tuner - Tunes Hyperparameters
8.) Evaluator - Performs deep analysis of the training results
9.) InfraValidator - Checks if the model is servable
10.) Pusher - Deploys the model to serving infrastructure
11.) BulkInferrer - Performs batch predictions with a trained model.

78
Q

What orchestrators can TFX use?

A

Airflow, Kubeflow or Vertex AI Pipelines.

79
Q

What is the recommended method for training with Vertex AI containers?

A

Have a setup.py in your root directory containing the requirements of the program. Have a trainer/ folder containing task.py as the entry point that invokes the model.py file. Have an __init__.py file in every subdirectory to make the module a package.

80
Q

How to perform feature engineering using BQML?

A

Use the TRANSFORM clause. This can be used for general imputation, numerical normalization, scaling and bucketing, categorical encoding/crossing, text tokenization/vectorization and image manipulation.

81
Q

What automatic transformations are done by BQML when calling CREATE MODEL?

A

BQ automatically performs imputation, numeric standardization (most models), one-hot encoding, multi-hot encoding (arrays), timestamp transformation, and struct expansion.

82
Q

How are predictions generated in BQML?

A

Using ML.PREDICT, ML.FORECAST, ML.RECOMMEND, or ML.DETECT_ANOMALIES.

83
Q

What problems can the Natural Language API solve?

A

It can determine text entity types, analyze sentiment, annotate text (all features), classify text, and moderate text, all with a pre-trained set of models.

84
Q

What problems can Speech-to-Text API solve?

A

It can perform synchronous, asynchronous, or real-time transcription of speech in a specified language.

85
Q

What problems can Text-to-Speech solve?

A

Takes written text and converts it to speech using a pre-set list of voices.

86
Q

What problems can Translation API solve?

A

Dynamically translates text. Cloud Translation uses a Google pre-trained or a custom machine learning model to translate text across 100+ language pairs.

87
Q

What problems can Vision API solve?

A

Cloud Vision allows developers to easily integrate vision detection features within applications, including image labeling, face and landmark detection, optical character recognition (OCR), and tagging of explicit content.

88
Q

What problems can Video Intelligence API solve?

A

Stored video analysis, streaming video analysis, object detection and tracking, logo recognition, face detection, person detection, and video annotation.

89
Q

What problems can Document AI solve?

A

Digitizing documents for e-readers, optical character recognition, image recognition, entity extraction, NLP, document classification, key-value pair recognition, translation, and normalization.

90
Q

What types of processors does Document AI have?

A

General, specialized, custom.

91
Q

When should L1 regularization be used?

A

When feature selection is needed along with regularization to prevent overfitting.

92
Q

When should L2 regularization be used?

A

When features are collinear/co-dependent, so removing them with L1 could be detrimental.

93
Q

What is regularization used for?

A

To prevent overfitting.

94
Q

What is the tf.data.Dataset?

A

An efficient API for TF import of various data types.

95
Q

What import methods does tf.data.Dataset have?

A

1.) TextLineDataset - import lines from text files.
2.) TFRecordDataset - import TF record data
3.) FixedLengthRecordDataset - import records from binary files.

96
Q

What is Retail API?

A

End to end service for building customer recommendation systems.

97
Q

What is cloud Healthcare API?

A

A system for storing, synthesizing, de-identifying, and analyzing healthcare data. Supports DICOM (Digital Imaging and Communications in Medicine), HL7v2 (an event messaging standard), and FHIR (Fast Healthcare Interoperability Resources).

98
Q

What is contact center AI

A

A system built on Dialogflow for building managed AI chat bots.

99
Q

What are Tabular Workflows?

A

Tabular Workflow for End-to-End AutoML is a complete AutoML pipeline for classification and regression tasks. It is similar to the AutoML API, but allows you to choose what to control and what to automate. Instead of having controls for the whole pipeline, you have controls for every step in the pipeline. These pipeline controls include:

Data splitting
Feature engineering
Architecture search
Model training
Model ensembling
Model distillation

100
Q

What is the cold start problem?

A

The cold start problem occurs when the recommender system lacks sufficient information to make reliable predictions or suggestions for a user or an item.

101
Q

What data can suffer from random splitting?

A

1.) Time series data
2.) Data groupings
3.) Burst data

102
Q

What is training-serving skew/ data skew?

A

A mismatch between the data used for training and the data used for prediction, often due to processing issues, poor assumptions, or sampling issues.

103
Q

What is data drift?

A

Change in the statistical properties of data over time.

104
Q

What benefits do Vertex AI managed datasets give?

A
  • Manage your datasets in a central location.
  • Easily create labels and multiple annotation sets.
  • Create tasks for human labeling using integrated data labeling.
  • Track lineage to models for governance and iterative development.
  • Compare model performance by training AutoML and custom models using the same datasets.
  • Generate data statistics and visualizations.
  • Automatically split data into training, test, and validation sets.
105
Q

What benefits does Feature Store give?

A

Store and maintain your offline feature data in BigQuery, taking advantage of the data management capabilities of BigQuery.

Share and reuse features by adding them to the feature registry.

Serve features for online predictions at low latencies using Bigtable online serving or at ultra-low latencies using Optimized online serving.

Store embeddings in your feature data and perform vector similarity searches.

Track feature metadata in Dataplex.

106
Q

Difference between Prophet and ARIMA+

A

Like BigQuery ML ARIMA_PLUS, Prophet attempts to decompose each time series into trends, seasons, and holidays, producing a forecast using the aggregation of these models’ predictions. An important difference, however, is that BQML ARIMA+ uses ARIMA to model the trend component, while Prophet attempts to fit a curve using a piecewise logistic or linear model.

107
Q

Google recommendations for Video Data

A
  • The most common label should have at most 100 times more videos than the least common label.
  • 100 or more training video frames per label are recommended.
  • For video frame resolution much larger than 1024 pixels by 1024 pixels, some image quality may be lost during the frame normalization process used by Vertex AI.
108
Q

When to use Colab and when to use Workbench (Jupyter based)?

A

Colab: Project priorities are collaboration, experimentation, and avoiding spending time setting up infrastructure.

Workbench: Priorities are control and customization and pipeline development.

109
Q

What is TabNet?

A

TabNet uses sequential attention to choose which features to reason from at each decision step. This promotes interpretability and more efficient learning because the learning capacity is used for the most salient features. It trains classification and regression models.

110
Q

What is Wide and Deep?

A

Wide & Deep jointly trains wide linear models and deep neural networks. It combines the benefits of memorization and generalization.

111
Q

What is a workbench kernel?

A

A kernel hosts a Jupyter notebook session.

112
Q

How can Dataproc be integrated to workbench?

A

By setting up a cluster and running a notebook on it. Requires Dataproc Worker (roles/dataproc.worker) on your project and Dataproc Editor (roles/dataproc.editor) on the cluster for the dataproc.clusters.use permission.

113
Q

How can other ML frameworks be trained or served?

A

By using custom containers for both training and serving.

114
Q

How can spark models be served?

A

Using a custom container.

115
Q

How should an A/B test be run?

A

With a single hypothesis, tested to 95% significance, using a load balancer to split traffic.

116
Q

What is shadow testing?

A

Running a production version and a mirror of it that has all requests replayed against it. This helps check performance without affecting customers. It does, however, cost more and needs to be handled carefully to avoid bugs like overcharging customers.

117
Q

What is canary testing?

A

Gradual rollout of a feature as performance is evaluated in situ. It is, however, slow and needs substantial observability/monitoring.

118
Q

What is a vector search used for?

A

By semantically embedding data, you can map data to semantically similar groups, descriptions, queries, or images.

119
Q

What is Vertex ML Metadata?

A

Vertex ML Metadata lets you:

Analyze runs of a production ML system to understand changes in the quality of predictions.

Analyze ML experiments to compare the effectiveness of different sets of hyperparameters.

Track the lineage of ML artifacts, for example datasets and models, to understand just what contributed to the creation of an artifact or how that artifact was used to create descendant artifacts.

Rerun an ML workflow with the same artifacts and parameters.

Track the downstream usage of ML artifacts for governance purposes.

120
Q

How to make a batch prediction?

A

You request a BatchPredictionJob directly from the model resource without needing to deploy the model to an endpoint.

121
Q

How to make an online prediction?

A

Before sending a request, you must first deploy the model resource to an endpoint. This associates compute resources with the model so that it can serve online predictions with low latency.

122
Q

How can vanishing gradients be avoided?

A

Avoid sigmoid for internal layers; use ReLU or advanced ReLU nonlinearities.

123
Q

How to solve exploding gradients?

A

Use gradient clipping (clipnorm/clipvalue).

124
Q

When to use manual training?

A

When your team manages only a few models, is still experimenting, or the models are modified infrequently.

125
Q

What are Google’s recommended practices for human centered AI design?

A

1.) Design features with disclosures built in.
2.) Consider giving a few answers and letting the user decide.
3.) Model potential adverse feedback and have an iterative rollout plan.
4.) Engage with a diverse set of users and use feedback to guide further development.

126
Q

What are Google’s recommended practices for training and monitoring of AI?

A

1.) Use metrics from user feedback as well as product performance (click-through, customer use) sliced across different groups.
2.) Ensure metrics are appropriate for the context.

127
Q

What are Google’s recommended practices for raw data examination of AI?

A

1.) Check data for mistakes, accuracy, bias and representation.
2.) Check for training-serving skew.
3.) Remove redundant features.

128
Q

What are Google’s recommended practices for understanding limitations of AI?

A

1.) Correlation does not equal causation.
2.) Models are a reflection of the training data.
3.) Communicate limitations where possible.

129
Q

What are Google’s recommended practices for testing AI?

A

1.) Rigorously test each component in isolation.
2.) Conduct integration testing.
3.) Detect input drift.
4.) Use a gold standard test set to ensure models work consistently.
5.) Build quality checks such that unintended failures trigger an immediate response.

130
Q

What can feature attribution (feature importance) help detect?

A

Training-Serving Skew: Attribution differs in production than in training.

Drift: Attribution changes over time.

This is done on an online prediction endpoint in Vertex AI for Tabular data.

131
Q

What is continuous evaluation?

A

Periodic evaluation of your model on new, incoming data to determine if model performance is degrading.

132
Q

What are the steps to building a pipeline?

A

1.) Set up the GCP environment. Determine whether to use TFX or Kubeflow Pipelines.
2.) Design the pipeline around the model you plan to execute.
3.) If using KFP, compile the pipeline to a .yaml file.
4.) Run the pipeline (KFP: job.submit(); TFX: tfx.orchestration.<runner>.run()).

133
Q

What is vertex AI pipelines?

A

Vertex AI Pipelines is a Google Cloud managed service that allows you to orchestrate and automate ML pipelines, where each component of the pipeline can run containerized on Google Cloud or other cloud platforms.

134
Q

What does vertex AI pipelines include?

A
  • A user interface for managing and tracking experiments, jobs, and runs.
  • An engine for scheduling multistep ML workflows.
  • A Python SDK for defining and manipulating pipelines and components.
  • Integration with Vertex ML Metadata to save information about executions, models, datasets, and other artifacts.
135
Q

What are 3 publishers that can trigger a vertex AI pipelines job through Pub/Sub

A
  • Cloud Scheduler publishes messages on a schedule, triggering the pipeline.
  • Cloud Composer publishes messages as part of a larger workflow, for example a data ingestion workflow that triggers the training pipeline after new data is ingested in BigQuery.
  • Cloud Logging publishes a message based on logs that meet some filtering criteria. You can set up the filters to detect the arrival of new data or even skew and drift alerts generated by the Vertex AI Model Monitoring service.

Note: This can also be done with the scheduler API.

136
Q

How does cloud build accomplish pipeline CI/CD?

A

Cloud Build can be automatically or manually triggered to clone your repo, run unit and integration tests, build ML images (containers), compile the pipeline, upload it to Artifact Registry, and run it.

137
Q

What does vertex AI metadata help you do?

A
  • Analyze runs of a production ML system to understand changes in the quality of predictions.
  • Analyze ML experiments to compare the effectiveness of different sets of hyperparameters.
  • Track the lineage of ML artifacts, for example datasets and models, to understand just what contributed to the creation of an artifact or how that artifact was used to create descendant artifacts.
  • Rerun an ML workflow with the same artifacts and parameters.
  • Track the downstream usage of ML artifacts for governance purposes.
138
Q

What monitoring should Tensorboard be used for?

A

Monitoring experiments.
