Scaling prototypes into ML models Flashcards

1
Q

What percentage of the codebase in production ML systems is typically devoted to ML model code, and why is it relatively small?

A

ML model code accounts for only about 5% of the overall codebase in production ML systems. This is because:

1) Production systems require extensive components beyond model inference, including data ingestion, preprocessing, serving, monitoring, and maintenance pipelines.

2) Ensuring scalability, fault tolerance, and deployment reliability often involves complex engineering tasks unrelated to the core model.

2
Q

Outline the steps in the ML workflow from data extraction to production deployment, and identify tools used for each step.

A

1) Data Extraction: Retrieve data from sources (e.g., CRM systems, streaming sensors).
Tools: BigQuery, Apache Beam.

2) Data Analysis: Perform EDA to identify trends, anomalies, and correlations.
Tools: Pandas, Data Studio, BigQuery ML.

3) Data Preparation: Transform raw data into structured formats and engineer features.
Tools: SQL, BigQuery ML.

4) Model Training: Train models using prepared datasets.
Tools: Vertex AI, TensorFlow, PyTorch.

5) Model Validation: Evaluate models against business metrics and test set performance.
Tools: Vertex AI Pipelines, ML.EVALUATE.

6) Deployment: Deploy the validated model to production for online or batch predictions.
Tools: Vertex AI Endpoints, AI Platform Prediction.

3
Q

What is the role of data distribution analysis in debugging ML models?

A

Data distribution analysis helps identify changes in input data that may affect model performance. For example:

1) Detecting Schema Changes: Identifies when categorical features are remapped or missing.

2) Identifying Skew: Flags mismatches between training and serving distributions.

3) Preventing Silent Failures: Recognizes when valid-looking inputs no longer align with model expectations.

Tools like Vertex AI Monitoring automate the detection of such anomalies in production systems.

4
Q

Explain the difference between static and dynamic training paradigms. Provide examples of suitable use cases for each.

A

Static Training: Models are trained once using historical data and remain fixed post-deployment.

Use Case: Predicting physical constants or static phenomena, e.g., physics simulations.

Dynamic Training: Models are retrained periodically or continuously with new data.

Use Case: Spam detection, where patterns evolve rapidly over time.

Static is simpler and cost-effective but less adaptive, whereas dynamic handles evolving data at higher operational complexity.

5
Q

What are the advantages of Vertex AI’s managed Notebooks, and how do they enhance the ML workflow?

A

Vertex AI’s managed Notebooks offer:

1) Pre-installed Frameworks: TensorFlow, PyTorch, and scikit-learn for immediate experimentation.

2) Customizability: CPU/GPU configurations for specific workloads.

3) Security: Google Cloud authentication ensures safe data and code access.

4) Integration: Seamlessly connects with datasets, training pipelines, and models within Vertex AI.

These features accelerate prototyping and simplify deployment for ML engineers.

6
Q

What is the purpose of hyperparameter tuning in Vertex AI, and how does it function?

A

Hyperparameter tuning searches for the optimal configuration of hyperparameters to improve model performance. In Vertex AI:

The system evaluates combinations of hyperparameters across multiple trials.
Optimization algorithms (e.g., Bayesian optimization) guide the search process.
Results are logged, enabling engineers to identify the best-performing configuration.
This helps engineers find configurations that improve accuracy and efficiency within the trial budget.
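
A minimal sketch of what this looks like with the google-cloud-aiplatform Python SDK (the project ID, container image, metric name, and parameter ranges below are placeholder assumptions, and the training code is expected to report the metric, e.g. via the cloudml-hypertune helper):

    from google.cloud import aiplatform
    from google.cloud.aiplatform import hyperparameter_tuning as hpt

    aiplatform.init(project="my-project", location="us-central1")

    # Training job whose container reads --learning_rate / --batch_size and
    # reports the "accuracy" metric back to Vertex AI.
    custom_job = aiplatform.CustomJob(
        display_name="trainer",
        worker_pool_specs=[{
            "machine_spec": {"machine_type": "n1-standard-4"},
            "replica_count": 1,
            "container_spec": {"image_uri": "gcr.io/my-project/trainer:latest"},
        }],
    )

    tuning_job = aiplatform.HyperparameterTuningJob(
        display_name="hp-tuning",
        custom_job=custom_job,
        metric_spec={"accuracy": "maximize"},
        parameter_spec={
            "learning_rate": hpt.DoubleParameterSpec(min=1e-4, max=1e-1, scale="log"),
            "batch_size": hpt.DiscreteParameterSpec(values=[32, 64, 128], scale="linear"),
        },
        max_trial_count=20,       # total trials to evaluate
        parallel_trial_count=4,   # trials run concurrently
    )
    tuning_job.run()              # Bayesian optimization guides the search by default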

7
Q

Describe the role of a model registry in production ML systems.

A

A model registry:

1) Tracks Versions: Stores different versions of models, including training metadata and hyperparameters.

2) Facilitates Governance: Logs who trained and deployed models and the datasets used.

3) Supports Audits: Enables traceability for compliance and debugging.

4) Simplifies Reuse: Provides a central repository for reusing validated models across teams.

Vertex AI Model Registry supports efficient management of ML artifacts in production.

8
Q

Compare static and dynamic serving architectures, including their trade-offs.

A

Static Serving: Precomputes predictions and stores them in a database.

Pros: Low latency, reduced compute costs.
Cons: High storage requirements, lacks adaptability.
Use Case: Predicting product recommendations for static catalogs.

Dynamic Serving: Computes predictions on demand.

Pros: Scales with dynamic data, no storage overhead.
Cons: Higher latency, compute-intensive.
Use Case: Real-time fraud detection.

9
Q

What are hybrid serving architectures, and when are they appropriate?

A

Hybrid architectures combine static caching for frequently requested predictions with dynamic serving for the long tail. They are suitable when:

Data distributions are peaked, with many repetitive queries.
Systems require a balance between storage, latency, and compute efficiency.
Example: A voice-to-text system caching common phrases while dynamically processing unique inputs.

10
Q

What errors does monitoring help detect, and how does Vertex AI monitoring help maintain model performance in production?

A

Monitoring detects:

1) Model Drift: Changes in prediction accuracy over time.

2) Data Drift: Shifts in input data distribution.

3) Traffic Patterns: Abnormalities in requests or latency.

4) Resource Usage: Inefficient allocation of compute or storage resources.

Vertex AI Model Monitoring can trigger alerts when thresholds are breached and can be wired into retraining pipelines, helping keep the system reliable.

11
Q

Discuss the design considerations for building an ML pipeline for traffic prediction.

A

For a traffic prediction system:

1) Training Architecture: Use dynamic training to adapt to changing traffic patterns and events.

2) Serving Architecture: A hybrid model—cache predictions for busy roads and compute dynamically for less-trafficked areas.

3) Data Sources: Combine sensor data with historical patterns for robust predictions.

Design must address temporal dynamics and scalability.

12
Q

What is the importance of timestamp alignment in training ML models?

A

Timestamp alignment ensures:

1) Temporal Consistency: Training data reflects the actual state at the time of observation.

2) Prevention of Data Leakage: Avoids incorporating future information into training.

3) Reproducibility: Enables point-in-time analysis.

Misalignment can lead to flawed models and reduced real-world accuracy.
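
An illustrative pandas sketch of point-in-time correctness (column names and dates are made up): merge_asof pairs each label only with the latest feature value observed before its prediction time, so no future information leaks into training:

    import pandas as pd

    labels = pd.DataFrame({
        "customer_id": [1, 2],
        "prediction_time": pd.to_datetime(["2024-03-01", "2024-03-05"]),
        "churned": [0, 1],
    })

    features = pd.DataFrame({
        "customer_id": [1, 1, 2, 2],
        "feature_time": pd.to_datetime(["2024-02-20", "2024-03-02", "2024-02-15", "2024-03-04"]),
        "spend_30d": [120.0, 300.0, 80.0, 95.0],
    }).sort_values("feature_time")

    # For customer 1 this picks the 2024-02-20 value (120.0), not the later 300.0
    # that would only have been known after the prediction was made.
    training_set = pd.merge_asof(
        labels.sort_values("prediction_time"),
        features,
        left_on="prediction_time",
        right_on="feature_time",
        by="customer_id",
    )
    print(training_set[["customer_id", "prediction_time", "spend_30d"]])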

13
Q

What are endpoints in Vertex AI, and what are their key features?

A

Endpoints are RESTful services that host trained models for online or batch predictions. Key features:

1) Multiple Models: Can deploy several models to a single endpoint for traffic splitting.

2) Deployment Flexibility: Allows testing new models alongside live systems.

3) Configuration: Managed via names, regions, and access levels.

Endpoints ensure efficient and scalable inference delivery.
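
A hedged sketch of creating an endpoint, deploying a registered model to it, and requesting an online prediction via the google-cloud-aiplatform SDK (the project, model resource name, and instance format are illustrative assumptions):

    from google.cloud import aiplatform

    aiplatform.init(project="my-project", location="us-central1")

    endpoint = aiplatform.Endpoint.create(display_name="demo-endpoint")

    # Deploy a model that already exists in the Model Registry.
    model = aiplatform.Model("projects/my-project/locations/us-central1/models/1234567890")
    model.deploy(endpoint=endpoint, machine_type="n1-standard-2")
    # A second model could later be deployed to the same endpoint with
    # traffic_percentage=10 to test it on a slice of live traffic.

    response = endpoint.predict(instances=[{"feature_a": 1.0, "feature_b": "x"}])
    print(response.predictions)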

14
Q

When should AutoML be preferred over custom training?

A

AutoML is preferred when:

Speed and Simplicity: Rapid prototyping or minimal ML expertise is available.

Dataset Exploration: Evaluating features or suitability before custom development.

Custom training is better for complex use cases requiring full control and optimization.

15
Q

How do you transition a trained model to production using Vertex AI?

A

1) Model Validation: Ensure quality via evaluation metrics.

2) Registry Registration: Store metadata and lineage in the model registry.

3) Endpoint Deployment: Assign the model to an endpoint for serving.

4) Monitoring: Configure performance tracking and alerts.

This systematic approach supports reliability and scalability in production environments.

16
Q

What are the four common dependencies in ML systems, and why are they prone to change?

A

1) Upstream Models: May be retrained or updated without notice, altering their output distributions.

2) External Data Sources: Often managed by other teams who may change schemas or formats.

3) Feature-Label Relationships: Can evolve over time as real-world dynamics change.

4) Input Distributions: Subject to shifts due to seasonality, policy changes, or user behaviour.

These dependencies change because they often rely on external factors or dynamic systems.

17
Q

Why is modular design important in machine learning systems, and how does it differ from monolithic approaches?

A

Modular design improves maintainability, testability, and reuse by isolating components such as data ingestion, preprocessing, and training.

Modular Systems: Allow engineers to focus on small, independent units.
Monolithic Systems: Are tightly coupled, making debugging and updates complex.
Containers, orchestrated with tools like Kubernetes, simplify modular designs by packaging applications together with their libraries and dependencies.

18
Q

Describe a scenario where upstream model changes negatively impact an ML system. How can this be mitigated?

A

Scenario: An umbrella demand model depends on a weather model trained on incorrect historical data. Fixing the weather model causes the umbrella model to underperform due to unexpected input distribution changes.
Mitigation:

Implement notifications for upstream changes.
Maintain a local version of the upstream model to track updates.
Monitor input distributions for deviations.

19
Q

How can indiscriminate feature inclusion degrade model performance?

A

Including features without understanding their relationships can lead to:

Correlated Features: Models may over-rely on non-causal features.
Decorrelation: When a correlated feature loses its relationship to the label, model accuracy drops.
Best Practices: Use leave-one-out evaluations to assess feature importance and include only causally significant features.

20
Q

What is the difference between interpolation and extrapolation in ML predictions? Why is interpolation more reliable?

A

Interpolation: Predictions within the range of training data; more reliable as the model has seen similar data.

Extrapolation: Predictions outside the training data range; less accurate as the model generalizes beyond its training.

Example: A model trained on house prices in urban areas interpolates well in cities but extrapolates poorly for rural properties.

21
Q

What techniques help mitigate the impact of changing data distributions?

A

Monitoring: Analyze input summaries (mean, variance) for deviations.

Residual Analysis: Track prediction errors across different input segments.

Temporal Weighting: Prioritize recent data using custom loss functions.

Retraining: Regularly update models with new data to adapt to distribution changes.
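
To make the temporal-weighting idea concrete, here is a tiny sketch with exponentially decaying sample weights (the half-life and example ages are arbitrary); the resulting weights can then be passed to training, e.g. Keras model.fit(..., sample_weight=...):

    import numpy as np

    days_old = np.array([400, 200, 30, 1], dtype=float)   # age of each training example in days
    half_life = 90.0                                       # weight halves every 90 days

    sample_weights = 0.5 ** (days_old / half_life)
    print(sample_weights.round(3))   # approx. [0.046 0.214 0.794 0.992]: recent data dominates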

22
Q

Explain the concept of data leakage and provide an example.

A

Definition: Data leakage occurs when information not available during inference influences model training, leading to inflated performance metrics.

Example: A hospital assignment model uses “hospital name” during training, which is unavailable during real-time predictions. This results in degraded performance when deployed.

23
Q

What are the types of drift in ML systems, and how do they affect models?

A

1) Data Drift: Change in input feature distributions (e.g., income levels rising).

2) Concept Drift: Shift in feature-label relationships (e.g., income thresholds for loans).

3) Prediction Drift: Change in output distributions, possibly due to business changes.

4) Label Drift: Shift in label distributions over time.

Each drift reduces model accuracy and necessitates monitoring and retraining.

24
Q

How can concept drift manifest in e-commerce recommendation systems, and how do you mitigate it?

A

In e-commerce:

Concept Drift: Customer preferences change over time due to trends or seasonality.

Impact: Static models recommend outdated products, reducing engagement.

Solution: Periodically retrain models on the latest user interactions and purchasing data.

25
Q

What is the role of TensorFlow Data Validation (TFDV) in mitigating training-serving skew?

A

TFDV helps:

1) Detect distribution differences between training and serving data.

2) Identify anomalies (e.g., missing or out-of-range values).

3) Generate statistics and schemas for feature validation.

This ensures data consistency across training and production environments.

26
Q

What are the main components / features of TensorFlow Data Validation, and what is their purpose?

A

StatisticsGen: Computes feature statistics for validation.

SchemaGen: Infers feature types, categories, and ranges.

ExampleValidator: Detects anomalies by comparing data against the schema.

These components ensure clean, consistent data for robust ML pipelines.
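
A short illustrative TFDV snippet tying these pieces together (the GCS paths are placeholders): it computes statistics, infers a schema, and validates serving data against that schema to surface anomalies such as missing features or out-of-range values:

    import tensorflow_data_validation as tfdv

    # Compute summary statistics for the training and serving datasets.
    train_stats = tfdv.generate_statistics_from_csv(data_location="gs://my-bucket/train.csv")
    serving_stats = tfdv.generate_statistics_from_csv(data_location="gs://my-bucket/serving.csv")

    # Infer a schema (feature types, categories, ranges) from the training statistics.
    schema = tfdv.infer_schema(statistics=train_stats)

    # Validate serving data against the schema and report any anomalies.
    anomalies = tfdv.validate_statistics(statistics=serving_stats, schema=schema)
    tfdv.display_anomalies(anomalies)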

27
Q

How does feedback loop degradation occur in ML systems? Provide an example.

A

Definition: Feedback loops occur when model predictions influence future data, potentially amplifying errors.

Example: A demand prediction model underpredicts inventory. Reduced orders reinforce low sales, causing further underpredictions.

Mitigation: Monitor performance metrics and manually intervene to correct feedback distortions.

28
Q

What is the “cold start” problem in static recommendation models, and how can it be addressed?

A

Problem: Static models fail to account for new users, products, or behaviors.
Solution:

Dynamically retrain models with recent data.
Use hybrid approaches (e.g., content-based and collaborative filtering).
This keeps recommendations relevant and engaging.

29
Q

What are the key differences between sudden, gradual, incremental, and recurring concept drift?

A

Sudden Drift: Abrupt shifts in relationships (e.g., policy changes).

Gradual Drift: Slow transitions (e.g., user preference changes).

Incremental Drift: Step-by-step changes in relationships.

Recurring Drift: Periodic return to previous states (e.g., seasonal trends).

30
Q

How can ML engineers diagnose and mitigate data drift in production?

A

Diagnosis:

Compare real-time data statistics to training data.
Use monitoring tools to track anomalies in feature distributions.

Mitigation:

Label new data for retraining.
Apply transfer learning or ensemble methods to adapt models.

31
Q

What are some schema validation checks done in TFDV and how do they prevent data pipeline issues?

A

Schema validation checks:

Feature Types: Ensures consistent input formats.

Presence Requirements: Validates mandatory fields.

Range Constraints: Detects outliers in numeric data.

This prevents errors during training and serving.

32
Q

What steps are involved in containerizing and deploying a custom ML model in Vertex AI?

A

Containerization: Package training code into a Docker container with dependencies.

Training: Submit the container to Vertex AI for cloud-based training.

Deployment: Use Vertex AI Endpoints to serve the trained model.

Testing: Validate predictions using the deployed endpoint.
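
A minimal sketch of these steps with the google-cloud-aiplatform SDK; the training image, staging bucket, and prebuilt serving container URI are placeholders for your own artifacts:

    from google.cloud import aiplatform

    aiplatform.init(
        project="my-project",
        location="us-central1",
        staging_bucket="gs://my-bucket",
    )

    # Containerization + Training: submit the Docker image holding the training code.
    job = aiplatform.CustomContainerTrainingJob(
        display_name="custom-train",
        container_uri="gcr.io/my-project/trainer:latest",
        model_serving_container_image_uri="us-docker.pkg.dev/vertex-ai/prediction/tf2-cpu.2-12:latest",
    )
    model = job.run(replica_count=1, machine_type="n1-standard-4")

    # Deployment + Testing: serve the trained model and validate a prediction.
    endpoint = model.deploy(machine_type="n1-standard-2")
    print(endpoint.predict(instances=[[1.0, 2.0, 3.0]]).predictions)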

33
Q

How can ML pipelines be designed to adapt to dynamic data environments?

A

Use automated monitoring for data drift and anomalies.

Regularly retrain models on new data.

Design pipelines with modular components for flexibility.

Implement feedback loops for continuous learning.

34
Q

Explain the difference between data shift and concept drift with examples.

A

Data Shift: Input distributions change, e.g., increased income levels in credit applications.

Concept Drift: Feature-label relationships evolve, e.g., stricter loan approval criteria for the same income.

35
Q

Why is it critical to test ML models for data leakage, and how can it be avoided?

A

Importance: Data leakage inflates performance during training but degrades accuracy in production.

Prevention:

Exclude features unavailable during inference.
Validate data partitions to ensure no overlap between training and testing.

36
Q

What are the three primary performance bottlenecks in ML training systems, and how do they impact performance?

A

1) Input/Output (IO): Data retrieval is too slow, often due to low throughput storage systems or large, complex input pipelines.

2) Compute (CPU/GPU): Heavy computational requirements overwhelm the processor, particularly with complex models.

3) Memory: Insufficient memory limits the ability to store weights or process large batches.

These bottlenecks affect training speed, model accuracy, and scalability.

37
Q

How can you mitigate IO-bound training performance issues?

A

Use a high-throughput storage system like Google Cloud Storage (GCS).

Optimize input pipelines with parallel reads and prefetching.

Reduce batch size to minimize data fetched per step.

38
Q

What are the key non-technical considerations when designing ML systems?

A

1) Business Use Case: Deadlines (e.g., training models overnight for daily recommendations).

2) Budget Constraints: Balancing cost with infrastructure speed.

3) Dataset Size: Larger datasets improve accuracy but increase training time.

4) Scalability: Choosing between single, multi-machine, or distributed systems.

39
Q

What strategies can address CPU-bound training limitations?

A

1) Use faster accelerators like GPUs or TPUs.

2) Simplify models by reducing layers or using less computationally expensive activation functions.

3) Train for fewer steps while maintaining acceptable accuracy.

40
Q

Explain memory-bound training issues and their solutions.

A

Issues: Training models with large datasets or high parameter counts may exceed memory limits.

Solutions:

Add more memory to workers.
Reduce batch sizes.
Optimize model architecture to use fewer layers.

41
Q

What is the role of batch size in distributed training? How does it affect performance?

A

Batch size determines how much data is processed per step:

Larger Batch Sizes: Improve throughput but increase memory usage.

Smaller Batch Sizes: Fit memory constraints but may slow convergence.

In distributed systems, global batch size = number of replicas × per-replica batch size.
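
A short TensorFlow illustration of that relationship (batch size chosen arbitrarily):

    import tensorflow as tf

    strategy = tf.distribute.MirroredStrategy()        # e.g. 4 GPUs -> 4 replicas in sync
    per_replica_batch_size = 64
    global_batch_size = per_replica_batch_size * strategy.num_replicas_in_sync
    # With 4 replicas, each training step processes 256 examples in total.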

42
Q

How does data parallelism work in distributed ML training? (from a technical perspective)

A

1) Each worker processes a different portion of the dataset.

2) Gradients are computed locally and averaged (Allreduce) across workers.

3) Parameters are synchronized after each step, ensuring consistency.

Data parallelism is model-agnostic and scales effectively for large datasets.

43
Q

What is the difference between synchronous and asynchronous distributed training?

A

Synchronous Training: Workers compute gradients in lockstep; gradients are averaged to update parameters.

Pro: Ensures consistency.
Con: Slower due to synchronization overhead.

Asynchronous Training: Workers independently compute and update parameters.

Pro: Faster and resilient to worker failures.
Con: May lead to stale updates and slower convergence.

44
Q

What is model parallelism, and when is it used?

A

Model parallelism divides a model across multiple devices, with each processing different layers or components.

Use Case: Models too large to fit on a single device (e.g., large transformers).

Challenge: Synchronizing intermediate outputs between devices.

45
Q

How does TensorFlow’s MirroredStrategy enable distributed training?

A

MirroredStrategy replicates models across GPUs on a single machine.

Data Distribution: Global batch split among GPUs.

Gradient Updates: Gradients averaged across replicas (Allreduce).

Use Case: Single-machine multi-GPU setups.
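
A minimal Keras sketch (synthetic data and arbitrary layer sizes) of how MirroredStrategy is typically used:

    import numpy as np
    import tensorflow as tf

    strategy = tf.distribute.MirroredStrategy()   # replicates the model across local GPUs
    print("Replicas in sync:", strategy.num_replicas_in_sync)

    # Variables created inside the scope are mirrored on every replica; gradients
    # are averaged with an all-reduce before each parameter update.
    with strategy.scope():
        model = tf.keras.Sequential([
            tf.keras.layers.Dense(64, activation="relu", input_shape=(10,)),
            tf.keras.layers.Dense(1),
        ])
        model.compile(optimizer="adam", loss="mse")

    # The global batch (256 here) is split automatically across the replicas.
    x = np.random.rand(1024, 10).astype("float32")
    y = np.random.rand(1024, 1).astype("float32")
    dataset = tf.data.Dataset.from_tensor_slices((x, y)).batch(256)
    model.fit(dataset, epochs=2)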

46
Q

What is the Multi-Worker MirroredStrategy, and how does it scale training?

A

Multi-Worker MirroredStrategy extends MirroredStrategy to multiple machines, each with GPUs.

Synchronization: Gradients are shared across workers.

Data Sharding: Workers process non-overlapping data.

Use Case: Large-scale synchronous distributed training.
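
A hedged sketch of the multi-worker setup; the TF_CONFIG cluster shown in the comment uses hypothetical hostnames, and in practice the launcher (e.g. Vertex AI training) sets it on every machine:

    import tensorflow as tf

    # A hypothetical two-worker TF_CONFIG, set per machine by the launcher:
    #   {"cluster": {"worker": ["host0:12345", "host1:12345"]},
    #    "task": {"type": "worker", "index": 0}}
    # Without TF_CONFIG this falls back to a single local worker, which keeps
    # the snippet runnable on a laptop.
    strategy = tf.distribute.MultiWorkerMirroredStrategy()

    with strategy.scope():
        model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(10,))])
        model.compile(optimizer="sgd", loss="mse")

    # Every worker runs the same script; tf.data auto-shards the input so workers
    # see non-overlapping data, and gradients are all-reduced across workers.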

47
Q

What are TPUs, and what are their advantages/disadvantages in high-performance ML training?

A

TPUs (Tensor Processing Units) are custom accelerators optimized for matrix computations.

Advantages: Faster training, especially for deep learning.

Challenges: Require highly optimized input pipelines so data loading keeps pace with TPU throughput.

Strategy: Use TPUStrategy for seamless TensorFlow integration.

48
Q

What is the difference between batch and online predictions in inference systems?

A

Batch Predictions: Precomputed for large datasets; optimized for throughput.

Online Predictions: Real-time predictions for individual queries; optimized for latency.

49
Q

From a technical perspective, what infrastructure factors affect inference performance? (speed and cost)

A

Throughput: Queries per second (QPS).

Latency: Response time per query.

Cost: Infrastructure and maintenance expenses.

50
Q

How does Cloud Dataflow process datasets and integrate ML models for batch pipelines?

A

Cloud Dataflow processes datasets by:

Reading data from GCS or BigQuery.

Enriching data with model predictions (TensorFlow SavedModel or TensorFlow Serving).

Writing enriched data back to storage.
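
A simplified Apache Beam sketch of this pattern (bucket paths, field names, and the JSON-lines record format are assumptions); each worker loads the SavedModel once in setup() and enriches every record with a prediction before writing the results back out:

    import json
    import apache_beam as beam
    import tensorflow as tf

    class EnrichWithPrediction(beam.DoFn):
        def setup(self):
            # Loaded once per worker, not once per element.
            self.model = tf.keras.models.load_model("gs://my-bucket/saved_model")

        def process(self, record):
            features = [[record["feature_a"], record["feature_b"]]]
            record["prediction"] = float(self.model.predict(features, verbose=0)[0][0])
            yield record

    # Passing DataflowRunner pipeline options would run this on Cloud Dataflow.
    with beam.Pipeline() as pipeline:
        (
            pipeline
            | "Read" >> beam.io.ReadFromText("gs://my-bucket/input.jsonl")
            | "Parse" >> beam.Map(json.loads)
            | "Predict" >> beam.ParDo(EnrichWithPrediction())
            | "Format" >> beam.Map(json.dumps)
            | "Write" >> beam.io.WriteToText("gs://my-bucket/enriched")
        )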

51
Q

How do SavedModel and TensorFlow Serving compare for inference?

A

SavedModel: Fastest for batch pipelines, reduces overhead.

TensorFlow Serving: Easier maintenance, supports real-time queries.

52
Q

Why is distributed training essential for large-scale ML?

A

Scale: Handles large datasets and complex models.

Speed: Reduces training time by leveraging multiple devices.

Flexibility: Adapts to diverse workloads with parallelism strategies.

53
Q

How does the tf.data API handle large datasets?

A

The tf.data API creates scalable input pipelines for training:

Supports sharded datasets for large files.

Handles transformations like normalization and batching.

Optimizes IO performance with prefetching.
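
An illustrative tf.data pipeline over sharded TFRecord files (the GCS pattern and feature spec are assumptions):

    import tensorflow as tf

    feature_spec = {
        "x": tf.io.FixedLenFeature([10], tf.float32),
        "y": tf.io.FixedLenFeature([1], tf.float32),
    }

    def parse_example(serialized):
        parsed = tf.io.parse_single_example(serialized, feature_spec)
        return parsed["x"], parsed["y"]

    files = tf.data.Dataset.list_files("gs://my-bucket/train-*.tfrecord")
    dataset = (
        files.interleave(tf.data.TFRecordDataset,
                         num_parallel_calls=tf.data.AUTOTUNE)          # parallel shard reads
             .map(parse_example, num_parallel_calls=tf.data.AUTOTUNE)  # decode + transform
             .shuffle(10_000)
             .batch(256)
             .prefetch(tf.data.AUTOTUNE)                               # overlap IO with training
    )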

54
Q

What is minibatching, and how does it improve performance?

A

Minibatching groups multiple data points for simultaneous processing.

Advantages: Improves computational efficiency and reduces parameter update overhead.

Challenge: Larger batches require more memory.

55
Q

What is the function of parameter servers in distributed training, and what are their advantages and disadvantages?

A

Parameter servers store model weights and handle updates during asynchronous training.

Advantages: Scales well for sparse models.

Challenges: Can create network bottlenecks for dense models.

56
Q

How would you categorise the three key requirements for building hybrid machine learning systems?

A

Composability, portability, and scalability.

Composability: Ability to combine microservices and choose components that make sense for the problem

Portability: Capability to move machine learning workflows across different environments (laptop, on-premises, cloud)

Scalability: Ability to scale across accelerators (GPUs, TPUs), storage, skillsets, teams, and experiments

57
Q

What is Kubeflow, and what makes it unique for machine learning workflows?

A

Kubeflow is an open-source machine learning platform built on Kubernetes that:

Enables machine learning pipeline orchestration

Allows deployment of ML workflows across different environments (phone, laptop, on-premises cluster, cloud)

Provides consistent code execution with minimal configuration changes

Extends Kubernetes’ capabilities with ML-specific frameworks and libraries

58
Q

Why might an organization need a hybrid cloud machine learning approach instead of using a single cloud provider?

A

Potential scenarios include:

Being tied to on-premises infrastructure

Data privacy or regulatory constraints preventing full cloud migration

Multi-cloud data production or consumption requirements

Edge computing needs (IoT devices, local inference)

Gradual cloud migration strategy

Avoiding vendor lock-in

59
Q

Explain the concept of edge machine learning and its significance.

A

Edge machine learning involves:

Performing model inference directly on local devices

Reducing network latency and bandwidth consumption

Enabling machine learning in environments with poor connectivity

Supporting privacy-preserving techniques like federated learning

Extracting meaningful insights from sensor data without constant cloud communication

60
Q

What are the key considerations for optimizing TensorFlow models for mobile devices?

A

Mobile TensorFlow optimization involves:

Reducing code footprint

Supporting quantization and lower-precision arithmetic

Embedding models directly on devices

Using thin wrappers for native implementation

Performing inference on worker threads to avoid blocking main thread

Potentially sacrificing model accuracy for performance

61
Q

What is federated learning, and how does it enhance mobile machine learning?

A

Federated learning is an approach where:

Model updates are aggregated from multiple devices

Models are continuously trained on individual user devices

Allows collective model improvement without centralized data collection

Individual user experiences are personalized

Privacy is maintained by only sharing model updates, not raw data

62
Q

What are the typical challenges in moving machine learning workflows between environments?

A

Challenges include:

Reconfiguring entire technology stack for each new environment

Replicating library dependencies

Recreating testing environments

Managing different infrastructure requirements

Ensuring consistent model performance across varied computational resources

63
Q

How does Kubernetes support hybrid cloud machine learning architectures?

A

Kubernetes supports hybrid cloud ML by:

Enabling container orchestration across different environments

Providing consistent deployment mechanisms

Allowing seamless migration between on-premises and cloud infrastructure

Supporting scalable and portable machine learning workflows

Reducing infrastructure management overhead

64
Q

What are the trade-offs of using TensorFlow Lite for mobile machine learning?

A

Trade-offs include:

Reduced model complexity and accuracy

Limited model maintainability

Inability to resume training from optimized model graphs

Potential performance improvements

Smaller model size and reduced computational requirements

65
Q

Describe the process of performing image recognition in a hybrid mobile ML scenario.

A

Hybrid image recognition typically involves:

Performing initial feature extraction locally

Running neural network on mobile device to extract object labels

Sending processed, reduced-complexity data to cloud

Reducing network bandwidth consumption

Enabling faster response times

66
Q

What makes Kubeflow particularly valuable for machine learning infrastructure?

A

Kubeflow’s value stems from:

Providing open-source, flexible ML pipeline management

Supporting multi-environment deployment

Reducing infrastructure lock-in

Enabling consistent workflow across different computational resources

Simplifying complex ML workflow orchestration

67
Q

Outline some use cases for machine learning in mobile-specific data analytics.

A

Mobile ML data analytics can:

Detect patterns in motion sensor data

Analyze GPS tracking information

Extract meaningful feature vectors from raw sensor data

Perform local preprocessing before cloud transmission

Enable intelligent, context-aware mobile applications

68
Q

What are the primary motivations for deploying machine learning models on edge devices?

A

Motivations include:

Reducing network latency

Minimizing bandwidth consumption

Enabling offline functionality

Supporting privacy-preserving computation

Personalizing user experiences

Operating in low-connectivity environments

69
Q

Explain some of the non-technical architectural considerations that go into designing a hybrid ML system.

A

Architectural considerations involve:

Managing diverse team skillsets

Coordinating across research, engineering, and monitoring teams

Balancing computational resources

Ensuring consistent model performance

Supporting flexible, scalable infrastructure

Maintaining interoperability between environments

70
Q

What strategies can be employed to optimize models (e.g., TensorFlow) for mobile deployment?

A

Optimization strategies include:

Quantizing neural network nodes

Converting variable nodes to constants

Using smaller, less complex model architectures

Implementing efficient inference libraries

Leveraging platform-specific build and integration tools (Bazel, CocoaPods)

Minimizing model size and computational requirements

71
Q

Discuss the role of microservices in mobile machine learning architectures.

A

In mobile ML architectures:

Microservices are often impractical due to added latency

Direct library integration is preferred over process delegation

Emphasis on lean, embedded model execution

Focus on efficient, localized computational approaches

72
Q

How do hybrid ML systems address the challenge of model training and inference across different environments?

A

Hybrid ML systems address this by:

Providing consistent workflow across environments

Enabling flexible model training locations

Supporting distributed model development

Allowing seamless transition between training and inference platforms

Maintaining model portability and reproducibility

73
Q

What are the potential privacy and security implications of edge and hybrid machine learning?

A

Implications include:

Localized data processing reducing central data exposure

Federated learning minimizing raw data transmission

Enabling compliance with strict data protection regulations

Providing granular control over data movement

Reducing centralized data collection risks

74
Q

Explain the fundamental differences between traditional language models and large language models (LLMs) in terms of their capabilities and architecture.

A

Traditional language models were primarily focused on predicting single words or short sequences based on immediate context, while LLMs represent a significant evolution in capability and scale. Key differences include:

Scale: LLMs contain billions of parameters (from BERT’s 110M to PaLM 2’s 340B+) compared to traditional models

Sequence Length: Modern LLMs can process and predict entire documents, not just individual words

Architecture: LLMs typically use Transformer architecture with self-attention mechanisms, allowing them to capture long-range dependencies

Emergent Abilities: LLMs demonstrate capabilities beyond their training objectives, such as reasoning, code generation, and mathematical problem-solving

Resource Requirements: LLMs require significant computational resources and specialized infrastructure for training and deployment

75
Q

How does self-attention in Transformer models work, and why is it crucial for modern LLMs?

A

Self-attention is a fundamental mechanism in Transformer architecture that enables tokens to dynamically focus on relevant parts of the input sequence. The mechanism allows for parallel processing of sequences and the process works by:

Each token computes attention scores with every other token in the sequence

Attention scores determine how much each token should “pay attention to” other tokens

This enables the model to capture both local and long-range dependencies

For example, in the sentence “The animal didn’t cross the street because it was too tired”:

The pronoun “it” needs to determine which noun it refers to

Self-attention helps the model understand “it” refers to “animal” rather than “street”

This is achieved by computing attention weights between “it” and all other tokens

The highest weights will be assigned to the most relevant context words
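
A toy NumPy implementation of single-head scaled dot-product self-attention, just to make the mechanics concrete (random weights, five tokens with 8-dimensional embeddings):

    import numpy as np

    def softmax(x, axis=-1):
        x = x - x.max(axis=axis, keepdims=True)   # numerical stability
        e = np.exp(x)
        return e / e.sum(axis=axis, keepdims=True)

    def self_attention(X, Wq, Wk, Wv):
        Q, K, V = X @ Wq, X @ Wk, X @ Wv
        scores = Q @ K.T / np.sqrt(K.shape[-1])   # every token scored against every other token
        weights = softmax(scores, axis=-1)        # how much each token "pays attention to" the rest
        return weights @ V, weights               # context-aware representations + attention map

    rng = np.random.default_rng(0)
    X = rng.normal(size=(5, 8))                              # 5 tokens, 8-dim embeddings
    Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
    out, attn = self_attention(X, Wq, Wk, Wv)
    print(attn.round(2))    # row i shows where token i attends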

76
Q

Describe the LoRA (Low-Rank Adaptation) technique and its advantages in fine-tuning LLMs.

A

LoRA is a parameter-efficient fine-tuning technique that optimizes model adaptation while minimizing computational overhead. Key aspects include:

Core Mechanism:

Freezes pretrained model weights
Injects trainable low-rank matrices into each Transformer layer
Exploits the rank-deficiency of weight changes during adaptation

Technical Implementation:

Introduces matrices A and B as low-rank decomposition
Updates only these smaller matrices during fine-tuning
Maintains model quality while reducing parameter count

Advantages:

Significantly reduced memory footprint
No additional inference latency
Enables efficient task-switching
Allows sharing of pretrained models across multiple tasks
Reduces storage requirements for fine-tuned models
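
A bare-bones NumPy sketch of the LoRA idea (dimensions and alpha are arbitrary; B starts at zero, following the usual convention, so the adapter is initially a no-op):

    import numpy as np

    d, r = 512, 8                          # hidden size and low-rank dimension (r << d)
    alpha = 16                             # scaling factor
    rng = np.random.default_rng(0)

    W = rng.normal(size=(d, d))            # frozen pretrained weight matrix
    A = rng.normal(size=(r, d)) * 0.01     # trainable low-rank factor
    B = np.zeros((d, r))                   # trainable low-rank factor, initialized to zero

    def lora_forward(x):
        # Frozen path plus the low-rank update B @ A, scaled by alpha / r.
        return x @ W.T + (x @ A.T) @ B.T * (alpha / r)

    x = rng.normal(size=(1, d))
    print(lora_forward(x).shape)           # (1, 512); ~2*d*r trainable params instead of d*d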

77
Q

What is Vertex AI Reasoning Engine, and how does it integrate with LangChain for building generative AI applications?

A

Vertex AI Reasoning Engine is a managed runtime service that enables deployment of LangChain-based applications on Google Cloud. Key components and features include:

System Components:

LLM integration (e.g., Gemini models)
Tool/Function calling capabilities
Orchestration framework using LangChain
Managed runtime environment

Integration Benefits:

Simplified deployment process
Built-in security and privacy controls
Automatic scaling
Integration with Google Cloud services
Support for various frameworks (LangChain, OneTwo, LangGraph)

Deployment Flow:

Development of LangChain application
Configuration of tools and external APIs
Deployment to managed runtime
Monitoring and management through Vertex AI

78
Q

Explain the concept of in-context learning in LLMs and its theoretical foundations based on recent research.

A

In-context learning is a phenomenon where LLMs can learn new tasks from just a few examples without parameter updates. Recent research from MIT, Google Research, and Stanford reveals:

Mechanism:

Large models contain implicit smaller, linear models within their hidden states
The larger model implements learning algorithms to train these internal models
No parameter updates required in the main model

Technical Implementation:

Occurs in early layers of the transformer
Utilizes hidden states to store task-specific information
Implements simple learning algorithms internally

Implications:

Enables few-shot learning capabilities
Reduces need for task-specific fine-tuning
Shows models are more sophisticated than simple pattern matching
Opens new possibilities for efficient model adaptation

79
Q

What are the key considerations and best practices for prompt engineering when working with LLMs?

A

Effective prompt engineering requires understanding several key principles and techniques:

Structural Elements:

Clear role definition
Contextual information
Specific instructions
Output format specification

Advanced Techniques:

Zero-shot prompting for simple tasks
Few-shot prompting for complex patterns
Chain-of-thought prompting for reasoning tasks
Role-based prompting for specialized behaviors

Optimization Strategies:

Iterate and refine prompts
Use specific keywords and constraints
Break complex tasks into smaller steps
Implement self-evaluation mechanisms
Leverage example libraries and templates

80
Q

Explain QLoRA (Quantized Low-Rank Adaptation) and how it improves upon standard LoRA.

A

QLoRA enhances LoRA by introducing quantization techniques to further reduce memory requirements while maintaining performance:

Technical Components:

4-bit NormalFloat (NF4) quantization
Double Quantization for constants
Low-rank adaptation matrices
Parameter-efficient fine-tuning

Key Improvements:

Reduced memory footprint through quantization
Maintained model quality
Enhanced efficiency for resource-constrained environments
Broader applicability across model architectures

Implementation Benefits:

Enables fine-tuning on consumer GPUs
Reduces storage requirements
Maintains performance parity with full fine-tuning
Supports various model architectures (RoBERTa, DeBERTa, GPT-2/3)

81
Q

What are the four main components of building and deploying a custom generative AI application using Vertex AI, and how do they interact?

A

The four main components are:

LLM Component:

Processes queries and generates responses
Integrates with function calling
Handles model versioning and lifecycle

Tool Component:

Communicates with external APIs
Implements Gemini Function Calling
Supports LangChain Tool/Function Calling
Handles database and service integrations

Orchestration Framework:

Manages application flow
Implements LangChain templates
Controls deterministic behavior
Structures system components

Managed Runtime:

Handles deployment and scaling
Provides security and monitoring
Manages API endpoints
Ensures system reliability

These components interact in a workflow where:

User queries are processed by the LLM
Tools are called as needed for external data
Orchestration framework manages the flow
Runtime environment handles operational aspects

82
Q

Describe the key challenges and considerations when implementing Parameter-Efficient Fine-Tuning (PEFT) methods.

A

PEFT implementation requires careful consideration of several factors:

Technical Challenges:

Balancing performance vs. efficiency
Maintaining model quality
Managing training time
Optimizing hyperparameters

Implementation Considerations:

Choice of PEFT method (LoRA, QLoRA, AdaMix, etc.)
Resource constraints
Task requirements
Model architecture compatibility

Trade-offs:

Memory usage vs. computational cost
Training time vs. parameter efficiency
Performance vs. resource usage
Flexibility vs. complexity

83
Q

How does the system flow work in a Vertex AI Reasoning Engine deployment, and what are the key stages of interaction?

A

The system flow in Vertex AI Reasoning Engine follows a specific sequence:

Query Processing:

User submits query
Agent formats prompt for LLM
LLM processes initial prompt

Tool Integration:

LLM determines tool necessity
Generates FunctionCall if needed
Tool executes and returns results

Response Generation:

LLM processes tool results
Generates final content
Agent formats response

Flow Control:

Handles multiple tool calls if needed
Manages conversation context
Ensures response quality
Maintains system stability

84
Q

What are the emergent abilities of LLMs, and how do they differ from trained capabilities?

A

Emergent abilities are capabilities that appear in larger language models without explicit training:

Types of Emergent Abilities:

Mathematical reasoning
Code generation
Logical deduction
Multi-step problem solving
Task decomposition

Characteristics:

Appear above certain model size thresholds
Not explicitly trained for
Often improve with scale
Demonstrate complex reasoning

Applications:

Zero-shot task handling
Complex problem solving
Creative generation
Analytical tasks

85
Q

What are the key components of successful LLM deployment on Vertex AI, and how should they be managed?

A

Successful LLM deployment requires attention to several critical areas:

Infrastructure Components:

Model selection and versioning
Resource allocation
Scaling configuration
Monitoring setup

Operational Considerations:

Security and access control
Performance monitoring
Cost optimization
Error handling

Management Aspects:

Version control
Deployment strategies
Update procedures
Backup and recovery
Performance optimization

86
Q

Describe the concept of self-attention in Transformer architecture and its impact on model performance.

A

Self-attention is a core mechanism that enables contextual understanding:

Technical Implementation:

Computes attention scores between all tokens
Uses Query, Key, and Value matrices
Implements parallel processing
Enables global context awareness

Performance Impact:

Improves long-range dependency capture
Enhances context understanding
Enables better feature extraction
Supports parallel processing

Architectural Benefits:

No fixed window size limitations
Dynamic context weighting
Position-aware processing
Flexible feature capturing

87
Q

What are the key considerations for prompt engineering when working with Vertex AI models?

A

Effective prompt engineering for Vertex AI requires understanding several aspects:

Structure:

Clear task definition
Contextual information
Specific instructions
Output format specification

Best Practices:

Use consistent formatting
Provide relevant examples
Include constraints
Implement validation

Optimization:

Iterate on prompts
Test different approaches
Monitor performance
Adjust based on feedback

88
Q

Explain the concept of parameter efficiency in LLM fine-tuning and its importance.

A

Parameter efficiency in fine-tuning focuses on optimizing model adaptation:

Core Concepts:

Minimize trainable parameters
Maintain model quality
Reduce resource requirements
Enable efficient deployment

Implementation Methods:

Low-rank adaptations
Quantization techniques
Selective fine-tuning
Efficient architecture modifications

Benefits:

Reduced memory usage
Lower computational costs
Faster training time
Improved deployment flexibility

89
Q

What fundamental distinction exists between generative and discriminative models, and how do they differ in their probabilistic approaches?

A

Generative and discriminative models differ in their fundamental mathematical approaches:

Generative models:

Capture the joint probability distribution p(X, Y) or p(X) for unlabeled data
Can generate new data instances that resemble the training distribution
Model the actual distribution of each class in the feature space
Learn the intrinsic patterns and structure of the input data
Example applications: GANs, language models, image synthesis

Discriminative models:

Capture the conditional probability p(Y|X)
Focus on learning boundaries between classes
Don’t model the underlying data distribution
More efficient for classification tasks
Example applications: Random Forests, SVMs, standard Neural Networks for classification

Key distinction: Generative models must learn the full data distribution, making them more complex but more versatile, while discriminative models only need to learn decision boundaries, making them more efficient for specific tasks.

90
Q

Explain how modern language models work, particularly focusing on their training methodology and core mechanisms.

A

Modern language models operate through several key mechanisms:
Core Training Approach:

Based on next-token prediction in a sequence
Trained on massive text corpora (often 45+ terabytes of text data)
Utilize self-supervised learning rather than traditional supervised approaches

Key Technical Components:

Transformer Architecture:

Self-attention mechanisms
Parallel processing capability
Direct modeling of long-range dependencies

Training Process:

Pre-training on broad internet-scale data
Fine-tuning for specific tasks
Token-based prediction and generation

Context Understanding:

Builds probabilistic understanding of word relationships
Captures semantic and syntactic patterns
Maintains context across long sequences

Performance Characteristics:

Can generate coherent, contextually appropriate text
Handles various tasks (completion, translation, summarization)
Improves with scale (both data and model size)

91
Q

What is temperature in NLP models, and how does it affect model outputs? Include the mathematical formulation.

A

Temperature (θ) is a hyperparameter that controls the randomness in the output distribution of language models:
Mathematical Definition:

Standard softmax: σ(z_i) = exp(z_i) / Σ_j exp(z_j)
Temperature-adjusted softmax: σ(z_i) = exp(z_i / θ) / Σ_j exp(z_j / θ)

Effects:

Lower temperature (θ < 1):

Makes distribution more peaked
Increases confidence in high-probability tokens
More deterministic outputs
Better for factual responses or specific tasks

Higher temperature (θ > 1):

Flattens the distribution
Increases diversity in outputs
More creative/random responses
Better for creative writing or exploration

Use cases:

Low temperature: Question answering, factual generation
High temperature: Creative writing, brainstorming
θ = 1.0: Standard softmax behavior
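
A small NumPy illustration of the temperature-adjusted softmax above (logits chosen arbitrarily):

    import numpy as np

    def softmax_with_temperature(logits, theta=1.0):
        z = np.asarray(logits, dtype=float) / theta
        z = z - z.max()                    # numerical stability
        e = np.exp(z)
        return e / e.sum()

    logits = [2.0, 1.0, 0.1]
    print(softmax_with_temperature(logits, theta=1.0))   # standard softmax
    print(softmax_with_temperature(logits, theta=0.5))   # more peaked -> more deterministic
    print(softmax_with_temperature(logits, theta=2.0))   # flatter -> more diverse sampling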

92
Q

Describe the Transformer architecture’s key innovations and advantages over RNNs for language processing tasks.

A

The Transformer architecture introduced several revolutionary concepts:
Key Innovations:

Self-Attention Mechanism:

Direct modeling of word relationships regardless of position
Parallel computation of attention scores
Multi-head attention for different relationship types

Positional Encoding:

Maintains sequence order without recurrence
Allows parallel processing of entire sequences

Advantages over RNNs:

Computational Efficiency:

Parallel processing vs. sequential processing
Better utilization of modern hardware (GPUs/TPUs)
Constant path length between any two positions, making long-range dependencies easier to learn

Learning Capability:

Better capture of long-range dependencies
No vanishing gradient problems
More stable training

Performance:

Superior results on translation tasks
Better scalability with model size
More efficient training

93
Q

What are the main challenges and limitations of generative AI models, and how can they be mitigated in production environments?

A

Key challenges and mitigation strategies:
Challenges:

Resource Requirements:

Massive computational needs
Large training datasets
Significant storage requirements

Quality Issues:

Hallucination and factual inaccuracies
Biased outputs
Inconsistent performance

Ethical Concerns:

Privacy implications
Potential misuse
Copyright issues

Mitigation Strategies:

Technical Solutions:

Implement robust monitoring systems
Use smaller, specialized models when possible
Apply fine-tuning for specific use cases
Implement content filtering and safety measures

Operational Controls:

Human-in-the-loop validation
Clear usage guidelines
Regular model evaluation and updating
Audit trails for model decisions

Risk Management:

Regular bias assessment
Legal compliance checks
Clear documentation of limitations
Incident response procedures

94
Q

How does the training process differ between discriminative and generative models in terms of computational requirements and complexity?

A

The training process differences are significant:
Discriminative Models:

Computational Requirements:

Generally lower computational needs
Faster training times
More efficient optimization
Focused on decision boundaries

Data Requirements:

Requires labeled data
Can work with smaller datasets
More efficient data utilization

Generative Models:

Computational Requirements:

Significantly higher computational needs
Longer training times
Complex optimization processes
Must model entire data distribution

Data Requirements:

Can use unlabeled data
Requires larger datasets
More sensitive to data quality
Needs diverse training examples

95
Q

Explain the concept of attention in neural networks and how it revolutionized NLP tasks.

A

Attention mechanisms transformed NLP by introducing:
Core Concepts:

Direct Relationships:

Models relationships between all tokens directly
Eliminates need for sequential processing
Enables parallel computation

Attention Computation:

Query, Key, Value paradigm
Soft alignment between elements
Weighted sum of values based on attention scores

Applications:

Translation:

Direct word alignment
Context-aware translation
Better handling of idioms

General NLP:

Document understanding
Question answering
Summarization

Advantages:

Better long-range dependency modeling
Interpretable attention weights
Scalable to large sequences

96
Q

What are the key considerations when deploying generative AI models in production environments?

A

Production deployment requires careful consideration of:
Technical Considerations:

Infrastructure:

Scaling requirements
Latency management
Resource optimization
Monitoring systems

Model Serving:

API design
Batch vs. real-time inference
Version control
A/B testing capability

Operational Considerations:

Quality Control:

Output validation
Performance monitoring
Error handling
Feedback loops

Safety Measures:

Content filtering
Rate limiting
User authentication
Audit logging

Business Considerations:

Cost Management:

Compute optimization
Resource allocation
ROI monitoring
Scaling strategies

Compliance:

Data privacy
Regulatory requirements
Model documentation
Usage policies

97
Q

How do language models handle context and disambiguation in text processing?

A

Language models employ several mechanisms for context handling:
Context Processing:

Attention Mechanisms:

Multi-head attention for different aspects
Self-attention for internal context
Cross-attention for external context

Token Representation:

Contextual embeddings
Position-aware processing
Subword tokenization

Disambiguation Strategies:

Statistical Learning:

Probability distribution over meanings
Context-dependent representation
Co-occurrence patterns

Architectural Features:

Bidirectional context
Layer-wise processing
Residual connections

Example Case:

Word “bank” disambiguation:

Context window analysis
Attention to relevant tokens
Probability distribution over meanings

98
Q

What are the main differences between traditional machine learning and modern generative AI approaches?

A

Key distinctions include:
Model Capabilities:

Traditional ML:

Focused on specific tasks
Rule-based or statistical
Limited generalization
Task-specific training

Generative AI:

Multi-task capability
Neural architecture based
Better generalization
Transfer learning enabled

Data Requirements:

Traditional ML:

Structured data focus
Smaller datasets
Task-specific data
Clear labels needed

Generative AI:

Handles unstructured data
Massive datasets
General knowledge learning
Self-supervised learning

Applications:

Traditional ML:

Classification
Regression
Clustering
Specific predictions

Generative AI:

Text generation
Image creation
Code synthesis
Creative tasks

99
Q

How do large language models handle and maintain consistency in long-form text generation?

A

Large language models maintain consistency through:
Technical Mechanisms:

Attention Span:

Context window management
Token position awareness
Memory mechanisms
Attention patterns

Coherence Strategies:

Topic tracking
Entity recognition
Narrative flow maintenance
Logical progression

Implementation Aspects:

Architecture Features:

Long-range dependencies
Cross-attention mechanisms
State maintenance
Context compression

Training Approaches:

Document-level training
Coherence objectives
Style consistency
Structure learning

100
Q

Explain the concept of self-supervised learning in the context of language models.

A

Self-supervised learning in language models involves:
Core Principles:

Training Approach:

No explicit labels needed
Uses internal structure of data
Creates own supervisory signals
Learns patterns automatically

Implementation:

Masked language modeling
Next token prediction
Sequence reconstruction
Contrastive learning

Advantages:

Data Efficiency:

Uses unlimited text data
No manual labeling
Natural language structure
Rich context learning

Model Capabilities:

General language understanding
Transfer learning potential
Robust representations
Flexible task adaptation

101
Q

What role does model size play in generative AI performance, and what are the associated trade-offs?

A

Model size impacts performance through:
Scale Effects:

Advantages:

Better pattern recognition
Improved generalization
More robust representations
Enhanced task performance

Challenges:

Increased compute needs
Higher memory requirements
Longer training times
Greater deployment costs

Trade-offs:

Technical:

Performance vs. efficiency
Accuracy vs. speed
Complexity vs. maintainability
Flexibility vs. specialization

Practical:

Cost vs. benefit
Latency vs. capability
Resource use vs. performance
Deployment options

102
Q

How do modern language models handle out-of-vocabulary words and rare tokens?

A

Modern language models address vocabulary challenges through:
Token Processing:

Subword Tokenization:

Byte-Pair Encoding (BPE)
WordPiece
SentencePiece
Character-level fallback

Handling Mechanisms:

Compositional representation
Context-aware processing
Unknown token handling
Rare word treatment

Implementation Strategies:

Technical Approaches:

Dynamic vocabulary
Hierarchical encoding
Attention mechanisms
Token merging

Performance Optimization:

Vocabulary size balance
Frequency-based decisions
Efficiency considerations
Coverage optimization

103
Q

What are the key differences between fine-tuning and few-shot learning in language models?

A

Fine-tuning and few-shot learning differ in:
Fine-tuning:

Process:

Updates model weights
Requires training data
Gradient-based learning
Task-specific adaptation

Characteristics:

Permanent changes
Better performance
Resource intensive
Task specialization

Few-shot Learning:

Process:

Uses examples in prompt
No weight updates
Pattern matching
In-context learning

Characteristics:

No permanent changes
More flexible
Less resource intensive
General capability

104
Q

What is MLOps, and how does it evolve to meet the requirements of generative AI?

A

MLOps is a set of practices, processes, and tools to operationalize machine learning systems effectively. For generative AI:

Traditional MLOps Principles: Standardize workflows for predictive AI tasks like regression and classification.

Generative AI Adaptations:
Pre-trained model discovery instead of building models from scratch.
Introduction of customization and fine-tuning phases.
Focus on unstructured outputs and unique metrics (e.g., fluency, factuality).

This evolution ensures that MLOps accommodates generative AI’s complexity and potential.

105
Q

Compare predictive AI and generative AI in terms of their core objectives and applications.

A

Predictive AI: Makes decisions or predictions using pre-existing data (e.g., classification, regression).

Applications: Fraud detection, demand forecasting.

Generative AI: Creates new content by learning patterns in data (e.g., text, images).

Applications: Text summarization, image generation, chatbots.

Generative AI extends the scope of AI by producing novel outputs, demanding additional infrastructure and operational considerations.

106
Q

How do the training and serving workflows differ between traditional ML and generative AI systems?

A

Traditional ML:
Training: Labeled data + model training.
Serving: Deploy model + inference pipeline.

Generative AI:
Training: Pre-trained models + customization (fine-tuning).
Serving: Generating outputs with prompts and embeddings.

Generative AI integrates phases like data curation and embedding management to handle its complexity.

107
Q

What is the role of curated data in generative AI, and how does it differ from traditional ML datasets?

A

Curated Data: Domain-specific, high-quality datasets tailored for fine-tuning generative models.

Traditional Datasets: Typically large, labeled datasets for training from scratch.

Curated data ensures generative models align with specific tasks, enhancing relevance and performance.

108
Q

What are the new artifacts introduced in generative AI, and how should they be governed?

A

New artifacts include:

Prompts: Instructions for guiding outputs.
Embeddings: Dense vector representations for unstructured data.
Adaptive Layers: Fine-tuned model components.
Governance involves managing versions, tracking lineage, and integrating tools like Vertex AI Model Registry and Feature Store.

109
Q

How does Vertex AI assist with the discovery and experimentation phase for generative AI?

A

Vertex AI provides:

Model Garden: Access to pre-trained models from Google, open-source, and third-party providers.

Generative AI Studio: A user-friendly interface for fine-tuning models and testing prompts.

These tools simplify the exploration of models, reducing data collection and preparation time.

110
Q

Explain how fine-tuning generative AI models differs from training traditional ML models.

A

Fine-tuning involves:

Adapting pre-trained models to domain-specific tasks.

Techniques like:
Supervised Tuning: Uses labeled data.
Reinforcement Learning with Human Feedback (RLHF): For tasks with subjective outputs (e.g., summarization).

Optimizing additional artifacts like embeddings and prompts.

Traditional training typically builds models from scratch, focusing on core algorithms and datasets.

111
Q

Describe prompt engineering and its significance in generative AI workflows.

A

Prompt engineering is crafting instructions for language models to produce desired outputs. Its significance:

Enhances model accuracy without retraining.
Simplifies application development for diverse tasks.
Tools like LangChain and Vertex AI assist in designing, testing, and refining prompts.

112
Q

What challenges do embeddings address in generative AI, and how are they managed?

A

Challenges:

Handling unstructured data (text, images, video).
Enabling applications like search, recommendations, and similarity matching.

Management: Vertex AI Feature Store and Vector Search store, retrieve, and serve embeddings efficiently.

113
Q

How does adaptive tuning improve generative AI models, and what tools support this?

A

Adaptive tuning updates only specific weights in a model, minimizing resource requirements.

Tools:

Vertex Model Registry: Tracks versions of adaptive layers.

Vertex AI Pipelines: Automates tuning workflows for reproducibility and lineage.

114
Q

What metrics are used to evaluate generative AI models, and how do they differ from traditional ML metrics?

A

Generative AI metrics:

Fluency: Naturalness of text outputs.
Factuality: Adherence to facts.
Brand Reputation: Alignment with brand guidelines.

Traditional metrics like accuracy and precision may not fully capture generative model performance.

115
Q

What role do safety scores and recitation checking play in monitoring generative AI systems?

A

Safety Scores: Assess risks across categories (e.g., bias, toxicity).
Recitation Checking: Detects unoriginal content by comparing outputs with existing data.
These measures ensure quality and trustworthiness of generated outputs.

116
Q

How do Vertex AI's evaluation services facilitate generative AI monitoring?

A

Creates evaluation datasets of prompts and expected responses.

Computes metrics for fluency, factuality, and relevance.

Provides tools for monitoring safety scores and content authenticity.

117
Q

What infrastructure challenges arise with generative AI models, and how can Vertex AI address them?

A

Challenges:

High computational requirements.
Complex distributed training.
Vertex AI Solutions:

Provides GPU/TPU support.
Simplifies distributed training with pre-configured pipelines.

118
Q

How can RLHF be applied to generative AI tasks, and what advantages does it offer?

A

RLHF involves:

Human feedback to fine-tune outputs.
Applicable for tasks like summarization and chatbot responses.
Advantages:

Aligns model outputs with user expectations.
Improves handling of ambiguous or subjective tasks.

119
Q

What is the importance of grounding capabilities in generative AI workflows?

A

Grounding capabilities align model outputs with external data sources, reducing hallucinations.

Tools like the Vertex AI PaLM API with grounding help ensure outputs reflect real-world context, improving reliability.

120
Q

How does Vertex AI enable scalable generative AI workflows with embeddings?

A

Vertex AI offers:

Embedding APIs: Generates vector representations for semantic analysis.
Vector Search: Facilitates efficient querying and retrieval.
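
A hedged sketch using the Vertex AI Python SDK's text-embedding models (the project ID is a placeholder and the published model name varies by release):

    import vertexai
    from vertexai.language_models import TextEmbeddingModel

    vertexai.init(project="my-project", location="us-central1")

    model = TextEmbeddingModel.from_pretrained("textembedding-gecko")
    embeddings = model.get_embeddings([
        "How do I reset my router?",
        "Steps to restart a home wifi router",
    ])
    vectors = [e.values for e in embeddings]    # dense vectors, ready for Vector Search indexing
    print(len(vectors[0]))                      # embedding dimensionality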

121
Q

What is the role of Vertex Extensions in generative AI integrations?

A

Vertex Extensions:

Enable real-time connections to enterprise systems.
Extend generative AI workflows to real-world data and actions.

122
Q

How do you integrate enterprise data with generative AI models using Vertex AI?

A

Use embeddings for semantic comparisons.

Leverage grounding capabilities to align outputs with enterprise data.

Employ Vertex AI’s managed tools for real-time integration.

123
Q

Summarize the key adaptations needed to integrate generative AI into traditional MLOps.

A

Incorporate phases like pre-trained model discovery and tuning.
Manage artifacts like prompts and embeddings.
Evaluate using fluency, factuality, and reputation metrics.
Address safety and recitation with advanced monitoring tools.