Scaling prototypes into ML models Flashcards

1
Q

What percentage of the codebase in production ML systems is typically devoted to ML model code, and why is it relatively small?

A

ML model code accounts for only about 5% of the overall codebase in production ML systems. This is because:

1) Production systems require extensive components beyond model inference, including data ingestion, preprocessing, serving, monitoring, and maintenance pipelines.

2) Ensuring scalability, fault tolerance, and deployment reliability often involves complex engineering tasks unrelated to the core model.

2
Q

Outline the steps in the ML workflow from data extraction to production deployment, and identify tools used for each step.

A

1) Data Extraction: Retrieve data from sources (e.g., CRM systems, streaming sensors).
Tools: BigQuery, Apache Beam.

2) Data Analysis: Perform EDA to identify trends, anomalies, and correlations.
Tools: Pandas, Data Studio, BigQuery ML.

3) Data Preparation: Transform raw data into structured formats and engineer features.
Tools: SQL, BigQuery ML.

4) Model Training: Train models using prepared datasets.
Tools: Vertex AI, TensorFlow, PyTorch.

5) Model Validation: Evaluate models against business metrics and test set performance.
Tools: Vertex AI Pipelines, ML.EVALUATE.

6) Deployment: Deploy the validated model to production for online or batch predictions.
Tools: Vertex AI Endpoints, AI Platform Prediction.

3
Q

What is the role of data distribution analysis in debugging ML models?

A

Data distribution analysis helps identify changes in input data that may affect model performance. For example:

1) Detecting Schema Changes: Identifies when categorical features are remapped or missing.

2) Identifying Skew: Flags mismatches between training and serving distributions.

3) Preventing Silent Failures: Recognizes when valid-looking inputs no longer align with model expectations.

Tools like Vertex AI Monitoring automate the detection of such anomalies in production systems.

4
Q

Explain the difference between static and dynamic training paradigms. Provide examples of suitable use cases for each.

A

Static Training: Models are trained once using historical data and remain fixed post-deployment.

Use Case: Predicting physical constants or static phenomena, e.g., physics simulations.

Dynamic Training: Models are retrained periodically or continuously with new data.

Use Case: Spam detection, where patterns evolve rapidly over time.

Static is simpler and cost-effective but less adaptive, whereas dynamic handles evolving data at higher operational complexity.

5
Q

What are the advantages of Vertex AI’s managed Notebooks, and how do they enhance the ML workflow?

A

Vertex AI’s managed Notebooks offer:

1) Pre-installed Frameworks: TensorFlow, PyTorch, and scikit-learn for immediate experimentation.

2) Customizability: CPU/GPU configurations for specific workloads.

3) Security: Google Cloud authentication ensures safe data and code access.

4) Integration: Seamlessly connects with datasets, training pipelines, and models within Vertex AI.

These features accelerate prototyping and simplify deployment for ML engineers.

6
Q

What is the purpose of hyperparameter tuning in Vertex AI, and how does it function?

A

Hyperparameter tuning searches for the optimal configuration of hyperparameters to improve model performance. In Vertex AI:

The system evaluates combinations of hyperparameters across multiple trials.
Optimization algorithms (e.g., Bayesian optimization) guide the search process.
Results are logged, enabling engineers to identify the best-performing configuration.
This helps models approach their best achievable accuracy and efficiency.
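
A minimal sketch of launching such a study with the Vertex AI Python SDK, assuming a training container that reports an "accuracy" metric (the project, bucket, image URI, and metric name here are hypothetical placeholders):

from google.cloud import aiplatform
from google.cloud.aiplatform import hyperparameter_tuning as hpt

aiplatform.init(project="my-project", location="us-central1",
                staging_bucket="gs://my-bucket")  # hypothetical project/region/bucket

# Base custom job: a container that trains the model and reports "accuracy".
custom_job = aiplatform.CustomJob(
    display_name="hp-tuning-base-job",
    worker_pool_specs=[{
        "machine_spec": {"machine_type": "n1-standard-4"},
        "replica_count": 1,
        "container_spec": {"image_uri": "gcr.io/my-project/trainer:latest"},  # hypothetical image
    }],
)

# Each trial samples a configuration from the search space; an optimization
# algorithm (e.g., Bayesian optimization, per the card above) guides the search.
tuning_job = aiplatform.HyperparameterTuningJob(
    display_name="hp-tuning-job",
    custom_job=custom_job,
    metric_spec={"accuracy": "maximize"},
    parameter_spec={
        "learning_rate": hpt.DoubleParameterSpec(min=1e-4, max=1e-1, scale="log"),
        "batch_size": hpt.DiscreteParameterSpec(values=[32, 64, 128], scale="linear"),
    },
    max_trial_count=20,
    parallel_trial_count=4,
)
tuning_job.run()  # per-trial results and the best configuration are logged for review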

7
Q

Describe the role of a model registry in production ML systems.

A

A model registry:

1) Tracks Versions: Stores different versions of models, including training metadata and hyperparameters.

2) Facilitates Governance: Logs who trained and deployed models and the datasets used.

3) Supports Audits: Enables traceability for compliance and debugging.

4) Simplifies Reuse: Provides a central repository for reusing validated models across teams.

Vertex AI Model Registry supports efficient management of ML artifacts in production.

8
Q

Compare static and dynamic serving architectures, including their trade-offs.

A

Static Serving: Precomputes predictions and stores them in a database.

Pros: Low latency, reduced compute costs.
Cons: High storage requirements, lacks adaptability.
Use Case: Predicting product recommendations for static catalogs.

Dynamic Serving: Computes predictions on demand.

Pros: Scales with dynamic data, no storage overhead.
Cons: Higher latency, compute-intensive.
Use Case: Real-time fraud detection.

9
Q

What are hybrid serving architectures, and when are they appropriate?

A

Hybrid architectures combine static caching for frequently requested predictions with dynamic serving for the long tail. They are suitable when:

Data distributions are peaked, with many repetitive queries.
Systems require a balance between storage, latency, and compute efficiency.

Example: A voice-to-text system caches common phrases while dynamically processing unique inputs.
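
A toy sketch of the hybrid pattern, assuming a hypothetical predict_fn standing in for the model and an in-memory LRU cache standing in for the precomputed prediction store:

from functools import lru_cache

def predict_fn(phrase: str) -> str:
    # Stand-in for an expensive model call or remote prediction service (hypothetical).
    return f"<transcript of {phrase!r}>"

@lru_cache(maxsize=10_000)
def serve(phrase: str) -> str:
    # Frequent "head" queries are answered from the cache after the first call;
    # rare "long tail" queries fall through to dynamic computation.
    return predict_fn(phrase)

print(serve("ok google"))                 # computed dynamically the first time
print(serve("ok google"))                 # cache hit: low latency, no compute
print(serve("a rare, unique utterance"))  # long-tail request, computed on demand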

10
Q

What errors does monitoring help detect and how does Vertex AI monitoring help maintain model performance in production?

A

Monitoring detects:

1) Model Drift: Changes in prediction accuracy over time.

2) Data Drift: Shifts in input data distribution.

3) Traffic Patterns: Abnormalities in requests or latency.

4) Resource Usage: Inefficient allocation of compute or storage resources.

Vertex AI Model Monitoring automatically raises alerts when thresholds are breached, and those alerts can be wired to retraining pipelines, keeping the system reliable.

11
Q

Discuss the design considerations for building an ML pipeline for traffic prediction.

A

For a traffic prediction system:

1) Training Architecture: Use dynamic training to adapt to changing traffic patterns and events.

2) Serving Architecture: A hybrid model—cache predictions for busy roads and compute dynamically for less-trafficked areas.

3) Data Sources: Combine sensor data with historical patterns for robust predictions.

Design must address temporal dynamics and scalability.

12
Q

What is the importance of timestamp alignment in training ML models?

A

Timestamp alignment ensures:

1) Temporal Consistency: Training data reflects the actual state at the time of observation.

2) Prevention of Data Leakage: Avoids incorporating future information into training.

3) Reproducibility: Enables point-in-time analysis.

Misalignment can lead to flawed models and reduced real-world accuracy.
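
A small pandas sketch of a point-in-time join (the events and feature_snapshots tables are hypothetical): merge_asof attaches only the latest feature value observed at or before each label timestamp, so training rows match what would have been known at serving time.

import pandas as pd

events = pd.DataFrame({
    "ts": pd.to_datetime(["2024-01-02", "2024-01-05", "2024-01-09"]),
    "label": [0, 1, 0],
})
feature_snapshots = pd.DataFrame({
    "ts": pd.to_datetime(["2024-01-01", "2024-01-04", "2024-01-08"]),
    "feature": [10.0, 12.5, 11.0],
})

# For each event, use the most recent feature value known *at or before* the
# event time, so no future information leaks into the training row.
training_rows = pd.merge_asof(
    events.sort_values("ts"),
    feature_snapshots.sort_values("ts"),
    on="ts",
    direction="backward",
)
print(training_rows)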

13
Q

What are endpoints in Vertex AI, and what are their key features?

A

Endpoints are RESTful services that host trained models for online or batch predictions. Key features:

1) Multiple Models: Can deploy several models to a single endpoint for traffic splitting.

2) Deployment Flexibility: Allows testing new models alongside live systems.

3) Configuration: Managed via names, regions, and access levels.

Endpoints ensure efficient and scalable inference delivery.
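
A brief Vertex AI SDK sketch of a canary-style traffic split (project, region, and resource IDs are placeholders), assuming an endpoint that is already serving an earlier model version:

from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")

# Existing endpoint already serving "churn-v1" (placeholder resource ID).
endpoint = aiplatform.Endpoint(
    "projects/my-project/locations/us-central1/endpoints/9876543210")

# Registered model to test alongside the live one (placeholder resource ID).
model = aiplatform.Model(
    "projects/my-project/locations/us-central1/models/1234567890")

endpoint.deploy(
    model=model,
    deployed_model_display_name="churn-v2",
    machine_type="n1-standard-4",
    traffic_percentage=10,  # canary: 10% to churn-v2, the remaining 90% stays on churn-v1
)

# Online prediction against the RESTful endpoint via the SDK.
response = endpoint.predict(instances=[{"tenure": 12, "plan": "basic"}])
print(response.predictions)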

14
Q

When should AutoML be preferred over custom training?

A

AutoML is preferred when:

Speed and Simplicity: Rapid prototyping is needed or ML expertise is limited.

Dataset Exploration: Evaluating features or dataset suitability before investing in custom development.

Custom training is better for complex use cases requiring full control and optimization.

15
Q

How do you transition a trained model to production using Vertex AI?

A

1) Model Validation: Ensure quality via evaluation metrics.

2) Registry Registration: Store metadata and lineage in the model registry.

3) Endpoint Deployment: Assign the model to an endpoint for serving.

4) Monitoring: Configure performance tracking and alerts.

This systematic approach guarantees reliability and scalability in production environments.

16
Q

What are the four common dependencies in ML systems, and why are they prone to change?

A

1) Upstream Models: May be retrained or updated without notice, altering their output distributions.

2) External Data Sources: Often managed by other teams who may change schemas or formats.

3) Feature-Label Relationships: Can evolve over time as real-world dynamics change.

4) Input Distributions: Subject to shifts due to seasonality, policy changes, or user behaviour.

These dependencies change because they often rely on external factors or dynamic systems.

17
Q

Why is modular design important in machine learning systems, and how does it differ from monolithic approaches?

A

Modular design improves maintainability, testability, and reuse by isolating components such as data ingestion, preprocessing, and training.

Modular Systems: Allow engineers to focus on small, independent units.
Monolithic Systems: Are tightly coupled, making debugging and updates complex.
Containers, orchestrated with platforms like Kubernetes, simplify modular designs by packaging applications together with their libraries.

18
Q

Describe a scenario where upstream model changes negatively impact an ML system. How can this be mitigated?

A

Scenario: An umbrella demand model depends on a weather model trained on incorrect historical data. Fixing the weather model causes the umbrella model to underperform due to unexpected input distribution changes.
Mitigation:

Implement notifications for upstream changes.
Maintain a local copy of the upstream model so that you control when its updates take effect.
Monitor input distributions for deviations.

19
Q

How can indiscriminate feature inclusion degrade model performance?

A

Including features without understanding their relationships can lead to:

Correlated Features: Models may over-rely on non-causal features.
Decorrelation: When a correlated feature loses its relationship to the label, model accuracy drops.
Best Practices: Use leave-one-out evaluations to assess feature importance and include only causally significant features.
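
A minimal scikit-learn sketch of a leave-one-out feature check on a synthetic dataset: each feature is dropped in turn, and the change in cross-validated score indicates how much the model depends on it.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, n_features=6, n_informative=3, random_state=0)

baseline = cross_val_score(RandomForestClassifier(random_state=0), X, y, cv=5).mean()
for i in range(X.shape[1]):
    X_without_i = np.delete(X, i, axis=1)  # leave feature i out
    score = cross_val_score(RandomForestClassifier(random_state=0), X_without_i, y, cv=5).mean()
    print(f"without feature {i}: {score:.3f} (baseline {baseline:.3f})")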

20
Q

What is the difference between interpolation and extrapolation in ML predictions? Why is interpolation more reliable?

A

Interpolation: Predictions within the range of training data; more reliable as the model has seen similar data.

Extrapolation: Predictions outside the training data range; less accurate as the model generalizes beyond its training.

Example: A model trained on house prices in urban areas interpolates well in cities but extrapolates poorly for rural properties.

21
Q

What techniques help mitigate the impact of changing data distributions?

A

Monitoring: Analyze input summaries (mean, variance) for deviations.

Residual Analysis: Track prediction errors across different input segments.

Temporal Weighting: Prioritize recent data using custom loss functions.

Retraining: Regularly update models with new data to adapt to distribution changes.
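
A toy monitoring check in the spirit of the first point, assuming training-time statistics were saved (the train_stats dict and the "income" feature are hypothetical): serving means are compared with training means in units of the training standard deviation.

import numpy as np
import pandas as pd

train_stats = {"income": {"mean": 52_000.0, "std": 18_000.0}}  # captured at training time

def drift_alerts(serving_df: pd.DataFrame, stats: dict, threshold: float = 0.5):
    alerts = []
    for col, s in stats.items():
        shift = abs(serving_df[col].mean() - s["mean"]) / s["std"]
        if shift > threshold:  # serving mean moved more than 0.5 training std devs
            alerts.append((col, round(shift, 2)))
    return alerts

serving = pd.DataFrame({"income": np.random.default_rng(0).normal(70_000, 20_000, 1_000)})
print(drift_alerts(serving, train_stats))  # e.g. [('income', 1.0)]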

22
Q

Explain the concept of data leakage and provide an example.

A

Definition: Data leakage occurs when information not available during inference influences model training, leading to inflated performance metrics.

Example: A hospital assignment model uses “hospital name” during training, which is unavailable during real-time predictions. This results in degraded performance when deployed.
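
A different, common leakage pattern sketched with scikit-learn: fitting a scaler on the full dataset before cross-validation lets test-fold statistics leak into training, whereas a pipeline refits preprocessing inside each fold.

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=300, n_features=20, random_state=0)

# Leaky: the scaler sees every row, including the rows later used as test folds.
X_leaky = StandardScaler().fit_transform(X)
leaky = cross_val_score(LogisticRegression(max_iter=1000), X_leaky, y, cv=5).mean()

# Leakage-free: scaling is re-fit on the training portion of each fold only.
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
clean = cross_val_score(pipe, X, y, cv=5).mean()

print(f"leaky CV accuracy: {leaky:.3f}  vs  leakage-free: {clean:.3f}")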

23
Q

What are the types of drift in ML systems, and how do they affect models?

A

1) Data Drift: Change in input feature distributions (e.g., income levels rising).

2) Concept Drift: Shift in feature-label relationships (e.g., income thresholds for loans).

3) Prediction Drift: Change in output distributions, possibly due to business changes.

4) Label Drift: Shift in label distributions over time.

Each drift reduces model accuracy and necessitates monitoring and retraining.

24
Q

How can concept drift manifest in e-commerce recommendation systems? and how do you mitigate against this?

A

In e-commerce:

Concept Drift: Customer preferences change over time due to trends or seasonality.

Impact: Static models recommend outdated products, reducing engagement.

Solution: Periodically retrain models on the latest user interactions and purchasing data.

25
What is the role of TensorFlow Data Validation (TFDV) in mitigating training-serving skew?
TFDV helps: 1) Detect distribution differences between training and serving data. 2) Identify anomalies (e.g., missing or out-of-range values). 3) Generate statistics and schemas for feature validation. This ensures data consistency across training and production environments.
26
What are the main components / features of TensorFlow Data Validation, and what is their purpose?
StatisticsGen: Computes feature statistics for validation. SchemaGen: Infers feature types, categories, and ranges. ExampleValidator: Detects anomalies by comparing data against the schema. These components ensure clean, consistent data for robust ML pipelines.
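A minimal TFDV sketch along these lines (train.csv and serving.csv are hypothetical files):
import tensorflow_data_validation as tfdv
# StatisticsGen: compute per-feature summary statistics.
train_stats = tfdv.generate_statistics_from_csv("train.csv")
serving_stats = tfdv.generate_statistics_from_csv("serving.csv")
# SchemaGen: infer feature types, domains, and presence requirements from training data.
schema = tfdv.infer_schema(train_stats)
# ExampleValidator: flag serving data that violates the schema
# (missing features, unexpected categories, out-of-range values).
anomalies = tfdv.validate_statistics(serving_stats, schema)
tfdv.display_anomalies(anomalies)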
27
How does feedback loop degradation occur in ML systems? Provide an example.
Definition: Feedback loops occur when model predictions influence future data, potentially amplifying errors. Example: A demand prediction model underpredicts inventory. Reduced orders reinforce low sales, causing further underpredictions. Mitigation: Monitor performance metrics and manually intervene to correct feedback distortions.
28
What is the "cold start" problem in static recommendation models, and how can it be addressed?
Problem: Static models fail to account for new users, products, or behaviors. Solution: Dynamically retrain models with recent data. Use hybrid approaches (e.g., content-based and collaborative filtering). This keeps recommendations relevant and engaging.
29
What are the key differences between sudden, gradual, incremental, and recurring concept drift?
Sudden Drift: Abrupt shifts in relationships (e.g., policy changes). Gradual Drift: Slow transitions (e.g., user preference changes). Incremental Drift: Step-by-step changes in relationships. Recurring Drift: Periodic return to previous states (e.g., seasonal trends).
30
How can ML engineers diagnose and mitigate data drift in production?
Diagnosis: Compare real-time data statistics to training data. Use monitoring tools to track anomalies in feature distributions. Mitigation: Label new data for retraining. Apply transfer learning or ensemble methods to adapt models.
31
What are some schema validation checks done in TFDV and how do they prevent data pipeline issues?
Schema validation checks: Feature Types: Ensures consistent input formats. Presence Requirements: Validates mandatory fields. Range Constraints: Detects outliers in numeric data. This prevents errors during training and serving.
32
What steps are involved in containerizing and deploying a custom ML model in Vertex AI?
Containerization: Package training code into a Docker container with dependencies. Training: Submit the container to Vertex AI for cloud-based training. Deployment: Use Vertex AI Endpoints to serve the trained model. Testing: Validate predictions using the deployed endpoint.
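A rough Vertex AI SDK sketch of those steps (project, bucket, and image URIs are placeholders), assuming the training code writes its model to the output directory Vertex AI provides:
from google.cloud import aiplatform
aiplatform.init(project="my-project", location="us-central1", staging_bucket="gs://my-bucket")
# Containerization + training: submit the pushed Docker image as a custom training job.
job = aiplatform.CustomContainerTrainingJob(
    display_name="custom-train",
    container_uri="us-central1-docker.pkg.dev/my-project/repo/trainer:latest",
    model_serving_container_image_uri="us-docker.pkg.dev/vertex-ai/prediction/tf2-cpu.2-12:latest",
)
model = job.run(replica_count=1, machine_type="n1-standard-4")
# Deployment: serve the trained model from a Vertex AI endpoint.
endpoint = model.deploy(machine_type="n1-standard-4")
# Testing: validate predictions against the live endpoint.
print(endpoint.predict(instances=[[1.0, 2.0, 3.0]]).predictions)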
33
How can ML pipelines be designed to adapt to dynamic data environments?
Use automated monitoring for data drift and anomalies. Regularly retrain models on new data. Design pipelines with modular components for flexibility. Implement feedback loops for continuous learning.
34
Explain the difference between data shift and concept drift with examples.
Data Shift: Input distributions change, e.g., increased income levels in credit applications. Concept Drift: Feature-label relationships evolve, e.g., stricter loan approval criteria for the same income.
35
Why is it critical to test ML models for data leakage, and how can it be avoided?
Importance: Data leakage inflates performance during training but degrades accuracy in production. Prevention: Exclude features unavailable during inference. Validate data partitions to ensure no overlap between training and testing.
36
What are the three primary performance bottlenecks in ML training systems, and how do they impact performance?
1) Input/Output (IO): Data retrieval is too slow, often due to low throughput storage systems or large, complex input pipelines. 2) Compute (CPU/GPU): Heavy computational requirements overwhelm the processor, particularly with complex models. 3) Memory: Insufficient memory limits the ability to store weights or process large batches. These bottlenecks affect training speed, model accuracy, and scalability.
37
How can you mitigate IO-bound training performance issues?
Use a high-throughput storage system like Google Cloud Storage (GCS). Optimize input pipelines with parallel reads and prefetching. Reduce batch size to minimize data fetched per step.
38
What are the key non technical considerations when designing ML systems?
1) Business Use Case: Deadlines (e.g., training models overnight for daily recommendations). 2) Budget Constraints: Balancing cost with infrastructure speed. 3) Dataset Size: Larger datasets improve accuracy but increase training time. 4) Scalability: Choosing between single, multi-machine, or distributed systems.
39
What strategies can address CPU-bound training limitations?
1) Use faster accelerators like GPUs or TPUs. 2) Simplify models by reducing layers or using less computationally expensive activation functions. 3) Train for fewer steps while maintaining acceptable accuracy.
40
Explain memory-bound training issues and their solutions.
Issues: Training models with large datasets or high parameter counts may exceed memory limits. Solutions: Add more memory to workers. Reduce batch sizes. Optimize model architecture to use fewer layers.
41
What is the role of batch size in distributed training? How does it affect performance?
Batch size determines how much data is processed per step: Larger Batch Sizes: Improve throughput but increase memory usage. Smaller Batch Sizes: Fit memory constraints but may slow convergence. In distributed systems, global batch size = number of replicas × per-replica batch size.
42
How does data parallelism work in distributed ML training? (from a technical perspective)
1) Each worker processes a different portion of the dataset. 2) Gradients are computed locally and averaged (Allreduce) across workers. 3) Parameters are synchronized after each step, ensuring consistency. Data parallelism is model-agnostic and scales effectively for large datasets.
43
What is the difference between synchronous and asynchronous distributed training?
Synchronous Training: Workers compute gradients in lockstep; gradients are averaged to update parameters. Pro: Ensures consistency. Con: Slower due to synchronization overhead. Asynchronous Training: Workers independently compute and update parameters. Pro: Faster and resilient to worker failures. Con: May lead to stale updates and slower convergence.
44
What is model parallelism, and when is it used?
Model parallelism divides a model across multiple devices, with each processing different layers or components. Use Case: Models too large to fit on a single device (e.g., large transformers). Challenge: Synchronizing intermediate outputs between devices.
45
How does TensorFlow's MirroredStrategy enable distributed training?
MirroredStrategy replicates models across GPUs on a single machine. Data Distribution: Global batch split among GPUs. Gradient Updates: Gradients averaged across replicas (Allreduce). Use Case: Single-machine multi-GPU setups.
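A minimal single-machine, multi-GPU sketch; the global batch is scaled by num_replicas_in_sync so each replica processes a fixed per-replica batch:
import tensorflow as tf
strategy = tf.distribute.MirroredStrategy()  # one replica per visible GPU
per_replica_batch = 64
global_batch = per_replica_batch * strategy.num_replicas_in_sync
# Toy in-memory dataset; in practice this would come from tf.data readers.
features = tf.random.normal([1024, 10])
labels = tf.random.uniform([1024], maxval=2, dtype=tf.int32)
dataset = (tf.data.Dataset.from_tensor_slices((features, labels))
           .shuffle(1024).batch(global_batch).prefetch(tf.data.AUTOTUNE))
with strategy.scope():
    # Variables created here are mirrored on every GPU; gradients are
    # averaged across replicas (all-reduce) at each training step.
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(32, activation="relu"),
        tf.keras.layers.Dense(2),
    ])
    model.compile(optimizer="adam",
                  loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
                  metrics=["accuracy"])
model.fit(dataset, epochs=2)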
46
What is the Multi-Worker MirroredStrategy, and how does it scale training?
Multi-Worker MirroredStrategy extends MirroredStrategy to multiple machines, each with GPUs. Synchronization: Gradients are shared across workers. Data Sharding: Workers process non-overlapping data. Use Case: Large-scale synchronous distributed training.
47
What are TPUs and what are their advantages/disadvantages in high-performance ML training.
TPUs (Tensor Processing Units) are custom accelerators optimized for matrix computations. Advantages: Faster training, especially for deep learning. Challenges: Requires optimized input pipelines due to TPU speed. Strategy: Use TPUStrategy for seamless TensorFlow integration.
48
What is the difference between batch and online predictions in inference systems?
Batch Predictions: Precomputed for large datasets; optimized for throughput. Online Predictions: Real-time predictions for individual queries; optimized for latency.
49
From a technical perspective, what infrastructure factors affect inference performance? (speed and cost)
Throughput: Queries per second (QPS). Latency: Response time per query. Cost: Infrastructure and maintenance expenses.
50
How does Cloud Dataflow process datasets and integrate ML models for batch pipelines?
Cloud Dataflow processes datasets by: Reading data from GCS or BigQuery. Enriching data with model predictions (TensorFlow SavedModel or TensorFlow Serving). Writing enriched data back to storage.
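A rough Apache Beam sketch of this enrichment pattern (bucket paths are placeholders), assuming newline-delimited JSON inputs and a TensorFlow SavedModel whose default callable signature accepts a float feature tensor:
import json
import apache_beam as beam
class EnrichWithPrediction(beam.DoFn):
    def setup(self):
        # Load the SavedModel once per worker rather than once per element.
        import tensorflow as tf
        self._tf = tf
        self._model = tf.saved_model.load("gs://my-bucket/model")
    def process(self, line):
        record = json.loads(line)
        features = self._tf.constant([[record["feature_a"], record["feature_b"]]],
                                     dtype=self._tf.float32)
        record["prediction"] = float(self._model(features).numpy()[0][0])
        yield json.dumps(record)
with beam.Pipeline() as pipeline:  # runs on Dataflow when given DataflowRunner options
    (pipeline
     | "Read" >> beam.io.ReadFromText("gs://my-bucket/input/*.json")
     | "Predict" >> beam.ParDo(EnrichWithPrediction())
     | "Write" >> beam.io.WriteToText("gs://my-bucket/output/enriched"))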
51
How do SavedModel and TensorFlow Serving compare for inference?
SavedModel: Fastest for batch pipelines, reduces overhead. TensorFlow Serving: Easier maintenance, supports real-time queries.
52
Why is distributed training essential for large-scale ML?
Scale: Handles large datasets and complex models. Speed: Reduces training time by leveraging multiple devices. Flexibility: Adapts to diverse workloads with parallelism strategies.
53
How does the tf.data API handle large datasets?
The tf.data API creates scalable input pipelines for training: Supports sharded datasets for large files. Handles transformations like normalization and batching. Optimizes IO performance with prefetching.
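A typical sharded input-pipeline sketch (the GCS path and feature schema are hypothetical):
import tensorflow as tf
def parse_example(serialized):
    spec = {
        "features": tf.io.FixedLenFeature([10], tf.float32),
        "label": tf.io.FixedLenFeature([], tf.int64),
    }
    parsed = tf.io.parse_single_example(serialized, spec)
    return parsed["features"], parsed["label"]
files = tf.data.Dataset.list_files("gs://my-bucket/data/train-*.tfrecord")  # sharded files
dataset = (files.interleave(tf.data.TFRecordDataset,
                            num_parallel_calls=tf.data.AUTOTUNE)  # parallel reads across shards
                .map(parse_example, num_parallel_calls=tf.data.AUTOTUNE)
                .shuffle(10_000)
                .batch(256)
                .prefetch(tf.data.AUTOTUNE))  # overlap IO with training to avoid input-bound steps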
54
What is minibatching, and how does it improve performance?
Minibatching groups multiple data points for simultaneous processing. Advantages: Improves computational efficiency and reduces parameter update overhead. Challenge: Larger batches require more memory.
55
What is the function of parameter servers in distributed training? and what are their advantages and disadvantages?
Parameter servers store model weights and handle updates during asynchronous training. Advantages: Scales well for sparse models. Challenges: Can create network bottlenecks for dense models.
56
How would you categorise the three key requirements for building hybrid machine learning systems?
Composability, portability, and scalability. Composability: Ability to combine microservices and choose components that make sense for the problem. Portability: Capability to move machine learning workflows across different environments (laptop, on-premises, cloud). Scalability: Ability to scale across accelerators (GPUs, TPUs), storage, skillsets, teams, and experiments.
57
What is Kubeflow, and what makes it unique for machine learning workflows?
Kubeflow is an open-source machine learning platform built on Kubernetes that: Enables machine learning pipeline orchestration Allows deployment of ML workflows across different environments (phone, laptop, on-premises cluster, cloud) Provides consistent code execution with minimal configuration changes Extends Kubernetes' capabilities with ML-specific frameworks and libraries
58
Why might an organization need a hybrid cloud machine learning approach instead of using a single cloud provider?
Potential scenarios include: Being tied to on-premises infrastructure Data privacy or regulatory constraints preventing full cloud migration Multi-cloud data production or consumption requirements Edge computing needs (IoT devices, local inference) Gradual cloud migration strategy Avoiding vendor lock-in
59
Explain the concept of edge machine learning and its significance.
Edge machine learning involves: Performing model inference directly on local devices Reducing network latency and bandwidth consumption Enabling machine learning in environments with poor connectivity Supporting privacy-preserving techniques like federated learning Extracting meaningful insights from sensor data without constant cloud communication
60
What are the key considerations for optimizing TensorFlow models for mobile devices?
Mobile TensorFlow optimization involves: Reducing code footprint Supporting quantization and lower-precision arithmetic Embedding models directly on devices Using thin wrappers for native implementation Performing inference on worker threads to avoid blocking main thread Potentially sacrificing model accuracy for performance
61
What is federated learning, and how does it enhance mobile machine learning?
Federated learning is an approach where: Model updates are aggregated from multiple devices Models are continuously trained on individual user devices Allows collective model improvement without centralized data collection Individual user experiences are personalized Privacy is maintained by only sharing model updates, not raw data
62
What are the typical challenges in moving machine learning workflows between environments?
Challenges include: Reconfiguring entire technology stack for each new environment Replicating library dependencies Recreating testing environments Managing different infrastructure requirements Ensuring consistent model performance across varied computational resources
63
How does Kubernetes support hybrid cloud machine learning architectures?
Kubernetes supports hybrid cloud ML by: Enabling container orchestration across different environments Providing consistent deployment mechanisms Allowing seamless migration between on-premises and cloud infrastructure Supporting scalable and portable machine learning workflows Reducing infrastructure management overhead
64
What are the trade-offs of using TensorFlow Lite for mobile machine learning?
Trade-offs include: Reduced model complexity and accuracy Limited model maintainability Inability to resume training from optimized model graphs Potential performance improvements Smaller model size and reduced computational requirements
65
Describe the process of performing image recognition in a hybrid mobile ML scenario.
Hybrid image recognition typically involves: Performing initial feature extraction locally Running neural network on mobile device to extract object labels Sending processed, reduced-complexity data to cloud Reducing network bandwidth consumption Enabling faster response times
66
What makes Kubeflow particularly valuable for machine learning infrastructure?
Kubeflow's value stems from: Providing open-source, flexible ML pipeline management Supporting multi-environment deployment Reducing infrastructure lock-in Enabling consistent workflow across different computational resources Simplifying complex ML workflow orchestration
67
Outline some use cases for machine learning in mobile-specific data analytics?
Mobile ML data analytics can: Detect patterns in motion sensor data Analyze GPS tracking information Extract meaningful feature vectors from raw sensor data Perform local preprocessing before cloud transmission Enable intelligent, context-aware mobile applications
68
What are the primary motivations for deploying machine learning models on edge devices?
Motivations include: Reducing network latency Minimizing bandwidth consumption Enabling offline functionality Supporting privacy-preserving computation Personalizing user experiences Operating in low-connectivity environments
69
Explain some of the non technical architectural considerations that go into designing a hybrid ML system.
Architectural considerations involve: Managing diverse team skillsets Coordinating across research, engineering, and monitoring teams Balancing computational resources Ensuring consistent model performance Supporting flexible, scalable infrastructure Maintaining interoperability between environments
70
What strategies can be employed to optimize models e.g (TensorFlow) for mobile deployment?
Optimization strategies include: Quantizing neural network nodes Converting variable nodes to constants Using smaller, less complex model architectures Implementing efficient inference libraries Leveraging platform-specific optimization tools (Bazel, CocoaPods) Minimizing model size and computational requirements
71
Discuss the role of microservices in mobile machine learning architectures.
In mobile ML architectures: Microservices are often impractical due to added latency Direct library integration is preferred over process delegation Emphasis on lean, embedded model execution Focus on efficient, localized computational approaches
72
How do hybrid ML systems address the challenge of model training and inference across different environments?
Hybrid ML systems address this by: Providing consistent workflow across environments Enabling flexible model training locations Supporting distributed model development Allowing seamless transition between training and inference platforms Maintaining model portability and reproducibility
73
What are the potential privacy and security implications of edge and hybrid machine learning?
Implications include: Localized data processing reducing central data exposure Federated learning minimizing raw data transmission Enabling compliance with strict data protection regulations Providing granular control over data movement Reducing centralized data collection risks
74
Explain the fundamental differences between traditional language models and large language models (LLMs) in terms of their capabilities and architecture.
Traditional language models were primarily focused on predicting single words or short sequences based on immediate context, while LLMs represent a significant evolution in capability and scale. Key differences include: Scale: LLMs contain billions of parameters (from BERT's 110M to PaLM 2's 340B+) compared to traditional models Sequence Length: Modern LLMs can process and predict entire documents, not just individual words Architecture: LLMs typically use Transformer architecture with self-attention mechanisms, allowing them to capture long-range dependencies Emergent Abilities: LLMs demonstrate capabilities beyond their training objectives, such as reasoning, code generation, and mathematical problem-solving Resource Requirements: LLMs require significant computational resources and specialized infrastructure for training and deployment
75
How does self-attention in Transformer models work, and why is it crucial for modern LLMs?
Self-attention is a fundamental mechanism in Transformer architecture that enables tokens to dynamically focus on relevant parts of the input sequence. The mechanism allows for parallel processing of sequences and the process works by: Each token computes attention scores with every other token in the sequence Attention scores determine how much each token should "pay attention to" other tokens which enables the model to capture both local and long-range dependencies For example, in the sentence "The animal didn't cross the street because it was too tired": The pronoun "it" needs to determine which noun it refers to Self-attention helps the model understand "it" refers to "animal" rather than "street" This is achieved by computing attention weights between "it" and all other tokens The highest weights will be assigned to the most relevant context words
76
Describe the LoRA (Low-Rank Adaptation) technique and its advantages in fine-tuning LLMs.
LoRA is a parameter-efficient fine-tuning technique that optimizes model adaptation while minimizing computational overhead. Key aspects include: Core Mechanism: Freezes pretrained model weights Injects trainable low-rank matrices into each Transformer layer Exploits the rank-deficiency of weight changes during adaptation Technical Implementation: Introduces matrices A and B as low-rank decomposition Updates only these smaller matrices during fine-tuning Maintains model quality while reducing parameter count Advantages: Significantly reduced memory footprint No additional inference latency Enables efficient task-switching Allows sharing of pretrained models across multiple tasks Reduces storage requirements for fine-tuned models
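A toy NumPy illustration of the core idea only (not the full method): the pretrained weight W stays frozen, and only the low-rank factors A and B would receive gradient updates, so the adapted layer computes x(W + AB) with a small fraction of the trainable parameters.
import numpy as np
rng = np.random.default_rng(0)
d_in, d_out, r = 512, 512, 8                 # r << d is the low-rank bottleneck
W = rng.normal(size=(d_in, d_out))           # pretrained weight: frozen during fine-tuning
A = rng.normal(scale=0.01, size=(d_in, r))   # trainable low-rank factor
B = np.zeros((r, d_out))                     # zero-initialized so the adapter starts as a no-op
alpha = 16                                   # LoRA scaling hyperparameter
def lora_linear(x):
    # Base path uses frozen weights; the adapter adds a rank-r correction.
    return x @ W + (alpha / r) * (x @ A @ B)
x = rng.normal(size=(4, d_in))
print(lora_linear(x).shape)                  # (4, 512)
print(f"trainable: {A.size + B.size} of {W.size} ({(A.size + B.size) / W.size:.2%})")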
77
What is Vertex AI Reasoning Engine, and how does it integrate with LangChain for building generative AI applications?
Vertex AI Reasoning Engine is a managed runtime service that enables deployment of LangChain-based applications on Google Cloud. Key components and features include: System Components: LLM integration (e.g., Gemini models) Tool/Function calling capabilities Orchestration framework using LangChain Managed runtime environment Integration Benefits: Simplified deployment process Built-in security and privacy controls Automatic scaling Integration with Google Cloud services Support for various frameworks (LangChain, OneTwo, LangGraph) Deployment Flow: Development of LangChain application Configuration of tools and external APIs Deployment to managed runtime Monitoring and management through Vertex AI
78
Explain the concept of in-context learning in LLMs and its theoretical foundations based on recent research.
In-context learning is a phenomenon where LLMs can learn new tasks from just a few examples without parameter updates. Recent research from MIT, Google Research, and Stanford reveals: Mechanism: Large models contain implicit smaller, linear models within their hidden states The larger model implements learning algorithms to train these internal models No parameter updates required in the main model Technical Implementation: Occurs in early layers of the transformer Utilizes hidden states to store task-specific information Implements simple learning algorithms internally Implications: Enables few-shot learning capabilities Reduces need for task-specific fine-tuning Shows models are more sophisticated than simple pattern matching Opens new possibilities for efficient model adaptation
79
What are the key considerations and best practices for prompt engineering when working with LLMs?
Effective prompt engineering requires understanding several key principles and techniques: Structural Elements: Clear role definition Contextual information Specific instructions Output format specification Advanced Techniques: Zero-shot prompting for simple tasks Few-shot prompting for complex patterns Chain-of-thought prompting for reasoning tasks Role-based prompting for specialized behaviors Optimization Strategies: Iterate and refine prompts Use specific keywords and constraints Break complex tasks into smaller steps Implement self-evaluation mechanisms Leverage example libraries and templates
80
Explain QLoRA (Quantized Low-Rank Adaptation) and how it improves upon standard LoRA.
QLoRA enhances LoRA by introducing quantization techniques to further reduce memory requirements while maintaining performance: Technical Components: 4-bit NormalFloat (NF4) quantization Double Quantization for constants Low-rank adaptation matrices Parameter-efficient fine-tuning Key Improvements: Reduced memory footprint through quantization Maintained model quality Enhanced efficiency for resource-constrained environments Broader applicability across model architectures Implementation Benefits: Enables fine-tuning on consumer GPUs Reduces storage requirements Maintains performance parity with full fine-tuning Supports various model architectures (RoBERTa, DeBERTa, GPT-2/3)
81
What are the four main components of building and deploying a custom generative AI application using Vertex AI, and how do they interact?
The four main components are: LLM Component: Processes queries and generates responses Integrates with function calling Handles model versioning and lifecycle Tool Component: Communicates with external APIs Implements Gemini Function Calling Supports LangChain Tool/Function Calling Handles database and service integrations Orchestration Framework: Manages application flow Implements LangChain templates Controls deterministic behavior Structures system components Managed Runtime: Handles deployment and scaling Provides security and monitoring Manages API endpoints Ensures system reliability These components interact in a workflow where: User queries are processed by the LLM Tools are called as needed for external data Orchestration framework manages the flow Runtime environment handles operational aspects
82
Describe the key challenges and considerations when implementing Parameter-Efficient Fine-Tuning (PEFT) methods.
PEFT implementation requires careful consideration of several factors: Technical Challenges: Balancing performance vs. efficiency Maintaining model quality Managing training time Optimizing hyperparameters Implementation Considerations: Choice of PEFT method (LoRA, QLoRA, AdaMix, etc.) Resource constraints Task requirements Model architecture compatibility Trade-offs: Memory usage vs. computational cost Training time vs. parameter efficiency Performance vs. resource usage Flexibility vs. complexity
83
How does the system flow work in a Vertex AI Reasoning Engine deployment, and what are the key stages of interaction?
The system flow in Vertex AI Reasoning Engine follows a specific sequence: Query Processing: User submits query Agent formats prompt for LLM LLM processes initial prompt Tool Integration: LLM determines tool necessity Generates FunctionCall if needed Tool executes and returns results Response Generation: LLM processes tool results Generates final content Agent formats response Flow Control: Handles multiple tool calls if needed Manages conversation context Ensures response quality Maintains system stability
84
What are the emergent abilities of LLMs, and how do they differ from trained capabilities?
Emergent abilities are capabilities that appear in larger language models without explicit training: Types of Emergent Abilities: Mathematical reasoning Code generation Logical deduction Multi-step problem solving Task decomposition Characteristics: Appear above certain model size thresholds Not explicitly trained for Often improve with scale Demonstrate complex reasoning Applications: Zero-shot task handling Complex problem solving Creative generation Analytical tasks
85
What are the key components of successful LLM deployment on Vertex AI, and how should they be managed?
Successful LLM deployment requires attention to several critical areas: Infrastructure Components: Model selection and versioning Resource allocation Scaling configuration Monitoring setup Operational Considerations: Security and access control Performance monitoring Cost optimization Error handling Management Aspects: Version control Deployment strategies Update procedures Backup and recovery Performance optimization
86
Describe the concept of self-attention in Transformer architecture and its impact on model performance.
Self-attention is a core mechanism that enables contextual understanding: Technical Implementation: Computes attention scores between all tokens Uses Query, Key, and Value matrices Implements parallel processing Enables global context awareness Performance Impact: Improves long-range dependency capture Enhances context understanding Enables better feature extraction Supports parallel processing Architectural Benefits: No fixed window size limitations Dynamic context weighting Position-aware processing Flexible feature capturing
87
What are the key considerations for prompt engineering when working with Vertex AI models?
Effective prompt engineering for Vertex AI requires understanding several aspects: Structure: Clear task definition Contextual information Specific instructions Output format specification Best Practices: Use consistent formatting Provide relevant examples Include constraints Implement validation Optimization: Iterate on prompts Test different approaches Monitor performance Adjust based on feedback
88
Explain the concept of parameter efficiency in LLM fine-tuning and its importance.
Parameter efficiency in fine-tuning focuses on optimizing model adaptation: Core Concepts: Minimize trainable parameters Maintain model quality Reduce resource requirements Enable efficient deployment Implementation Methods: Low-rank adaptations Quantization techniques Selective fine-tuning Efficient architecture modifications Benefits: Reduced memory usage Lower computational costs Faster training time Improved deployment flexibility
89
What fundamental distinction exists between generative and discriminative models, and how do they differ in their probabilistic approaches?
Generative and discriminative models differ in their fundamental mathematical approaches: Generative models: Capture the joint probability distribution p(X, Y) or p(X) for unlabeled data Can generate new data instances that resemble the training distribution Model the actual distribution of each class in the feature space Learn the intrinsic patterns and structure of the input data Example applications: GANs, language models, image synthesis Discriminative models: Capture the conditional probability p(Y|X) Focus on learning boundaries between classes Don't model the underlying data distribution More efficient for classification tasks Example applications: Random Forests, SVMs, standard Neural Networks for classification Key distinction: Generative models must learn the full data distribution, making them more complex but more versatile, while discriminative models only need to learn decision boundaries, making them more efficient for specific tasks.
90
Explain how modern language models work, particularly focusing on their training methodology and core mechanisms.
Modern language models operate through several key mechanisms: Core Training Approach: Based on next-token prediction in a sequence Trained on massive text corpora (often 45+ terabytes of text data) Utilize self-supervised learning rather than traditional supervised approaches Key Technical Components: Transformer Architecture: Self-attention mechanisms Parallel processing capability Direct modeling of long-range dependencies Training Process: Pre-training on broad internet-scale data Fine-tuning for specific tasks Token-based prediction and generation Context Understanding: Builds probabilistic understanding of word relationships Captures semantic and syntactic patterns Maintains context across long sequences Performance Characteristics: Can generate coherent, contextually appropriate text Handles various tasks (completion, translation, summarization) Improves with scale (both data and model size)
91
What is temperature in NLP models, and how does it affect model outputs? Include the mathematical formulation.
Temperature (θ) is a hyperparameter that controls the randomness in the output distribution of language models: Mathematical Definition: Standard softmax: σ(z_i) = exp(z_i) / Σ_j exp(z_j) Temperature-adjusted softmax: σ(z_i) = exp(z_i/θ) / Σ_j exp(z_j/θ) Effects: Lower temperature (θ < 1): Makes distribution more peaked Increases confidence in high-probability tokens More deterministic outputs Better for factual responses or specific tasks Higher temperature (θ > 1): Flattens the distribution Increases diversity in outputs More creative/random responses Better for creative writing or exploration Use cases: Low temperature: Question answering, factual generation High temperature: Creative writing, brainstorming θ = 1.0: Standard softmax behavior
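A quick NumPy check of the formula with example logits; lowering θ sharpens the distribution and raising it flattens it.
import numpy as np
def softmax_with_temperature(logits, theta=1.0):
    z = np.asarray(logits, dtype=float) / theta
    z -= z.max()                  # subtract max for numerical stability
    exp_z = np.exp(z)
    return exp_z / exp_z.sum()
logits = [2.0, 1.0, 0.1]
for theta in (0.5, 1.0, 2.0):
    print(theta, np.round(softmax_with_temperature(logits, theta), 3))
# theta = 0.5 concentrates probability on the top token; theta = 2.0 spreads it out.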
92
Describe the Transformer architecture's key innovations and advantages over RNNs for language processing tasks.
The Transformer architecture introduced several revolutionary concepts: Key Innovations: Self-Attention Mechanism: Direct modeling of word relationships regardless of position Parallel computation of attention scores Multi-head attention for different relationship types Positional Encoding: Maintains sequence order without recurrence Allows parallel processing of entire sequences Advantages over RNNs: Computational Efficiency: Parallel processing vs. sequential processing Better utilization of modern hardware (GPUs/TPUs) Constant time complexity for long-range dependencies Learning Capability: Better capture of long-range dependencies No vanishing gradient problems More stable training Performance: Superior results on translation tasks Better scalability with model size More efficient training
93
What are the main challenges and limitations of generative AI models, and how can they be mitigated in production environments?
Key challenges and mitigation strategies: Challenges: Resource Requirements: Massive computational needs Large training datasets Significant storage requirements Quality Issues: Hallucination and factual inaccuracies Biased outputs Inconsistent performance Ethical Concerns: Privacy implications Potential misuse Copyright issues Mitigation Strategies: Technical Solutions: Implement robust monitoring systems Use smaller, specialized models when possible Apply fine-tuning for specific use cases Implement content filtering and safety measures Operational Controls: Human-in-the-loop validation Clear usage guidelines Regular model evaluation and updating Audit trails for model decisions Risk Management: Regular bias assessment Legal compliance checks Clear documentation of limitations Incident response procedures
94
How does the training process differ between discriminative and generative models in terms of computational requirements and complexity?
The training process differences are significant: Discriminative Models: Computational Requirements: Generally lower computational needs Faster training times More efficient optimization Focused on decision boundaries Data Requirements: Requires labeled data Can work with smaller datasets More efficient data utilization Generative Models: Computational Requirements: Significantly higher computational needs Longer training times Complex optimization processes Must model entire data distribution Data Requirements: Can use unlabeled data Requires larger datasets More sensitive to data quality Needs diverse training examples
95
Explain the concept of attention in neural networks and how it revolutionized NLP tasks.
Attention mechanisms transformed NLP by introducing: Core Concepts: Direct Relationships: Models relationships between all tokens directly Eliminates need for sequential processing Enables parallel computation Attention Computation: Query, Key, Value paradigm Soft alignment between elements Weighted sum of values based on attention scores Applications: Translation: Direct word alignment Context-aware translation Better handling of idioms General NLP: Document understanding Question answering Summarization Advantages: Better long-range dependency modeling Interpretable attention weights Scalable to large sequences
96
What are the key considerations when deploying generative AI models in production environments?
Production deployment requires careful consideration of: Technical Considerations: Infrastructure: Scaling requirements Latency management Resource optimization Monitoring systems Model Serving: API design Batch vs. real-time inference Version control A/B testing capability Operational Considerations: Quality Control: Output validation Performance monitoring Error handling Feedback loops Safety Measures: Content filtering Rate limiting User authentication Audit logging Business Considerations: Cost Management: Compute optimization Resource allocation ROI monitoring Scaling strategies Compliance: Data privacy Regulatory requirements Model documentation Usage policies
97
How do language models handle context and disambiguation in text processing?
Language models employ several mechanisms for context handling: Context Processing: Attention Mechanisms: Multi-head attention for different aspects Self-attention for internal context Cross-attention for external context Token Representation: Contextual embeddings Position-aware processing Subword tokenization Disambiguation Strategies: Statistical Learning: Probability distribution over meanings Context-dependent representation Co-occurrence patterns Architectural Features: Bidirectional context Layer-wise processing Residual connections Example Case: Word "bank" disambiguation: Context window analysis Attention to relevant tokens Probability distribution over meanings
98
What are the main differences between traditional machine learning and modern generative AI approaches?
Key distinctions include: Model Capabilities: Traditional ML: Focused on specific tasks Rule-based or statistical Limited generalization Task-specific training Generative AI: Multi-task capability Neural architecture based Better generalization Transfer learning enabled Data Requirements: Traditional ML: Structured data focus Smaller datasets Task-specific data Clear labels needed Generative AI: Handles unstructured data Massive datasets General knowledge learning Self-supervised learning Applications: Traditional ML: Classification Regression Clustering Specific predictions Generative AI: Text generation Image creation Code synthesis Creative tasks
99
How do large language models handle and maintain consistency in long-form text generation?
Large language models maintain consistency through: Technical Mechanisms: Attention Span: Context window management Token position awareness Memory mechanisms Attention patterns Coherence Strategies: Topic tracking Entity recognition Narrative flow maintenance Logical progression Implementation Aspects: Architecture Features: Long-range dependencies Cross-attention mechanisms State maintenance Context compression Training Approaches: Document-level training Coherence objectives Style consistency Structure learning
100
Explain the concept of self-supervised learning in the context of language models.
Self-supervised learning in language models involves: Core Principles: Training Approach: No explicit labels needed Uses internal structure of data Creates own supervisory signals Learns patterns automatically Implementation: Masked language modeling Next token prediction Sequence reconstruction Contrastive learning Advantages: Data Efficiency: Uses unlimited text data No manual labeling Natural language structure Rich context learning Model Capabilities: General language understanding Transfer learning potential Robust representations Flexible task adaptation
101
What role does model size play in generative AI performance, and what are the associated trade-offs?
Model size impacts performance through: Scale Effects: Advantages: Better pattern recognition Improved generalization More robust representations Enhanced task performance Challenges: Increased compute needs Higher memory requirements Longer training times Greater deployment costs Trade-offs: Technical: Performance vs. efficiency Accuracy vs. speed Complexity vs. maintainability Flexibility vs. specialization Practical: Cost vs. benefit Latency vs. capability Resource use vs. performance Deployment options
102
How do modern language models handle out-of-vocabulary words and rare tokens?
Modern language models address vocabulary challenges through: Token Processing: Subword Tokenization: Byte-Pair Encoding (BPE) WordPiece SentencePiece Character-level fallback Handling Mechanisms: Compositional representation Context-aware processing Unknown token handling Rare word treatment Implementation Strategies: Technical Approaches: Dynamic vocabulary Hierarchical encoding Attention mechanisms Token merging Performance Optimization: Vocabulary size balance Frequency-based decisions Efficiency considerations Coverage optimization
103
What are the key differences between fine-tuning and few-shot learning in language models?
Fine-tuning and few-shot learning differ in: Fine-tuning: Process: Updates model weights Requires training data Gradient-based learning Task-specific adaptation Characteristics: Permanent changes Better performance Resource intensive Task specialization Few-shot Learning: Process: Uses examples in prompt No weight updates Pattern matching In-context learning Characteristics: No permanent changes More flexible Less resource intensive General capability
104
What is MLOps, and how does it evolve to meet the requirements of generative AI?
MLOps is a set of practices, processes, and tools to operationalize machine learning systems effectively. For generative AI: Traditional MLOps Principles: Standardize workflows for predictive AI tasks like regression and classification. Generative AI Adaptations: Pre-trained model discovery instead of building models from scratch. Introduction of customization and fine-tuning phases. Focus on unstructured outputs and unique metrics (e.g., fluency, factuality). This evolution ensures that MLOps accommodates generative AI’s complexity and potential.
105
Compare predictive AI and generative AI in terms of their core objectives and applications.
Predictive AI: Makes decisions or predictions using pre-existing data (e.g., classification, regression). Applications: Fraud detection, demand forecasting. Generative AI: Creates new content by learning patterns in data (e.g., text, images). Applications: Text summarization, image generation, chatbots. Generative AI extends the scope of AI by producing novel outputs, demanding additional infrastructure and operational considerations.
106
How do the training and serving workflows differ between traditional ML and generative AI systems?
Traditional ML: Training: Labeled data + model training. Serving: Deploy model + inference pipeline. Generative AI: Training: Pre-trained models + customization (fine-tuning). Serving: Generating outputs with prompts and embeddings. Generative AI integrates phases like data curation and embedding management to handle its complexity.
107
What is the role of curated data in generative AI, and how does it differ from traditional ML datasets?
Curated Data: Domain-specific, high-quality datasets tailored for fine-tuning generative models. Traditional Datasets: Typically large, labeled datasets for training from scratch. Curated data ensures generative models align with specific tasks, enhancing relevance and performance.
108
What are the new artifacts introduced in generative AI, and how should they be governed?
New artifacts include: Prompts: Instructions for guiding outputs. Embeddings: Dense vector representations for unstructured data. Adaptive Layers: Fine-tuned model components. Governance involves managing versions, tracking lineage, and integrating tools like Vertex AI Model Registry and Feature Store.
109
How does Vertex AI assist with the discovery and experimentation phase for generative AI?
Vertex AI provides: Model Garden: Access to pre-trained models from Google, open-source, and third-party providers. Generative Studio: A user-friendly interface for fine-tuning models and testing prompts. These tools simplify the exploration of models, reducing data collection and preparation time.
110
Explain how fine-tuning generative AI models differs from training traditional ML models.
Fine-tuning involves: Adapting pre-trained models to domain-specific tasks. Techniques like: Supervised Tuning: Uses labeled data. Reinforcement Learning with Human Feedback (RLHF): For tasks with subjective outputs (e.g., summarization). Optimizing additional artifacts like embeddings and prompts. Traditional training typically builds models from scratch, focusing on core algorithms and datasets.
111
Describe prompt engineering and its significance in generative AI workflows.
Prompt engineering is crafting instructions for language models to produce desired outputs. Its significance: Enhances model accuracy without retraining. Simplifies application development for diverse tasks. Tools like LangChain and Vertex AI assist in designing, testing, and refining prompts.
112
What challenges do embeddings address in generative AI, and how are they managed?
Challenges: Handling unstructured data (text, images, video). Enabling applications like search, recommendations, and similarity matching. Management: Vertex AI Feature Store and Vector Search store, retrieve, and serve embeddings efficiently.
113
How does adaptive tuning improve generative AI models, and what tools support this?
Adaptive tuning updates only specific weights in a model, minimizing resource requirements. Tools: Vertex Model Registry: Tracks versions of adaptive layers. Vertex AI Pipelines: Automates tuning workflows for reproducibility and lineage.
114
What metrics are used to evaluate generative AI models, and how do they differ from traditional ML metrics?
Generative AI metrics: Fluency: Naturalness of text outputs. Factuality: Adherence to facts. Brand Reputation: Alignment with brand guidelines. Traditional metrics like accuracy and precision may not fully capture generative model performance.
115
What role do safety scores and recitation checking play in monitoring generative AI systems?
Safety Scores: Assess risks across categories (e.g., bias, toxicity). Recitation Checking: Detects unoriginal content by comparing outputs with existing data. These measures ensure quality and trustworthiness of generated outputs.
116
How does Vertex AI Evaluation Services facilitate generative AI monitoring?
Creates evaluation datasets of prompts and expected responses. Computes metrics for fluency, factuality, and relevance. Provides tools for monitoring safety scores and content authenticity.
117
What infrastructure challenges arise with generative AI models, and how can Vertex AI address them?
Challenges: High computational requirements. Complex distributed training. Vertex AI Solutions: Provides GPU/TPU support. Simplifies distributed training with pre-configured pipelines.
118
How can RLHF be applied to generative AI tasks, and what advantages does it offer?
RLHF involves: Human feedback to fine-tune outputs. Applicable for tasks like summarization and chatbot responses. Advantages: Aligns model outputs with user expectations. Improves handling of ambiguous or subjective tasks.
119
What is the importance of grounding capabilities in generative AI workflows?
Grounding capabilities align model outputs with external data sources, reducing hallucinations. Tools like the Vertex AI PaLM API's grounding support help outputs reflect real-world context, improving reliability.
120
How does Vertex AI enable scalable generative AI workflows with embeddings?
Vertex AI offers: Embedding APIs: Generates vector representations for semantic analysis. Vector Search: Facilitates efficient querying and retrieval.
121
What is the role of Vertex Extensions in generative AI integrations?
Vertex Extensions: Enable real-time connections to enterprise systems. Extend generative AI workflows to real-world data and actions.
122
How do you integrate enterprise data with generative AI models using Vertex AI?
Use embeddings for semantic comparisons. Leverage grounding capabilities to align outputs with enterprise data. Employ Vertex AI’s managed tools for real-time integration.
123
Summarize the key adaptations needed to integrate generative AI into traditional MLOps.
Incorporate phases like pre-trained model discovery and tuning. Manage artifacts like prompts and embeddings. Evaluate using fluency, factuality, and reputation metrics. Address safety and recitation with advanced monitoring tools.