fine tune - hyperparams - transfer Flashcards
What is the purpose of Grid Search in hyperparameter tuning?
Grid Search exhaustively searches over a specified parameter grid by evaluating all possible combinations of hyperparameter values to find the best configuration.
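A minimal pure-Python sketch of the idea, using a made-up stand-in scoring function (a real run would train and validate a model per configuration):

```python
import itertools

# Hypothetical stand-in: pretend this trains a model with the given
# hyperparameters and returns its validation score.
def validation_score(lr, batch_size):
    return -((lr - 0.01) ** 2) - ((batch_size - 32) ** 2) / 1000

grid = {
    "lr": [0.001, 0.01, 0.1],
    "batch_size": [16, 32, 64],
}

# Exhaustively evaluate every combination (the Cartesian product of the grid).
best_params, best_score = None, float("-inf")
for lr, batch_size in itertools.product(grid["lr"], grid["batch_size"]):
    score = validation_score(lr, batch_size)
    if score > best_score:
        best_params, best_score = {"lr": lr, "batch_size": batch_size}, score

print(best_params)  # best of the 3 x 3 = 9 evaluated combinations
```

Libraries such as scikit-learn wrap this pattern (e.g. `GridSearchCV`) with cross-validation built in.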
How does Random Search differ from Grid Search in hyperparameter tuning?
Random Search randomly samples hyperparameter combinations within defined ranges, which can be more efficient than Grid Search when dealing with large parameter spaces.
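A sketch of the same tuning problem with random sampling instead of enumeration; the scoring function and ranges here are illustrative assumptions:

```python
import random

random.seed(0)

# Same hypothetical stand-in for a real train-and-validate run.
def validation_score(lr, batch_size):
    return -((lr - 0.01) ** 2) - ((batch_size - 32) ** 2) / 1000

# Sample a fixed budget of configurations from the ranges,
# rather than evaluating every grid point.
n_trials = 20
best_params, best_score = None, float("-inf")
for _ in range(n_trials):
    params = {
        "lr": 10 ** random.uniform(-4, -1),       # log-uniform over [1e-4, 1e-1]
        "batch_size": random.choice([16, 32, 64, 128]),
    }
    score = validation_score(**params)
    if score > best_score:
        best_params, best_score = params, score

print(best_params)
```

The budget (`n_trials`) is fixed up front, so cost does not explode as more hyperparameters or values are added.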
What is Bayesian Optimization in the context of hyperparameter tuning?
Bayesian Optimization uses probabilistic models to guide the search for optimal hyperparameters, balancing exploration and exploitation.
What is the objective of Learning Rate Adjustment in fine-tuning methods?
Learning Rate Adjustment involves modifying the learning rate during training to improve convergence, commonly using techniques like step decay, exponential decay, or learning rate schedulers.
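The two decay schedules named above can be written as one-liners; the base rate and decay factors below are arbitrary example values:

```python
# Step decay: multiply the learning rate by `gamma` every `step_size` epochs.
def step_decay(base_lr, epoch, step_size=10, gamma=0.1):
    return base_lr * (gamma ** (epoch // step_size))

# Exponential decay: smooth multiplicative decay every epoch.
def exp_decay(base_lr, epoch, gamma=0.95):
    return base_lr * (gamma ** epoch)

print(step_decay(0.1, epoch=0))    # still the base rate
print(step_decay(0.1, epoch=10))   # dropped by a factor of 10
print(exp_decay(0.1, epoch=5))     # decayed smoothly over 5 epochs
```

Frameworks provide these as ready-made schedulers (e.g. PyTorch's `torch.optim.lr_scheduler.StepLR` and `ExponentialLR`).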
What does Feature Extraction mean in fine-tuning?
Feature Extraction involves freezing the pretrained model layers and using them to extract features from the data, typically followed by training a new classifier on top.
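A minimal PyTorch sketch; the tiny `backbone` below is a hypothetical stand-in for a real pretrained model (e.g. from torchvision or Hugging Face):

```python
import torch
import torch.nn as nn

# Stand-in for a pretrained backbone (hypothetical, for illustration only).
backbone = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 32))

# Freeze the pretrained layers so they act purely as a feature extractor.
for param in backbone.parameters():
    param.requires_grad = False

# New classifier trained on top of the frozen features.
classifier = nn.Linear(32, 10)
optimizer = torch.optim.SGD(classifier.parameters(), lr=1e-3)

x = torch.randn(4, 128)
with torch.no_grad():
    features = backbone(x)      # fixed features, no gradients
logits = classifier(features)   # only this layer will be updated
```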
What does Full Fine-Tuning entail in neural network training?
Full Fine-Tuning involves retraining the entire pretrained model on new data, allowing all layers to adapt to the specific task.
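In contrast to feature extraction, nothing is frozen here; a PyTorch sketch with a hypothetical stand-in model:

```python
import torch
import torch.nn as nn

# Stand-in for a pretrained model; every parameter stays trainable.
model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))

# All parameters go to the optimizer, typically with a smaller learning
# rate than pretraining used, to avoid destroying the pretrained features.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

x, y = torch.randn(8, 128), torch.randint(0, 10, (8,))
loss = nn.functional.cross_entropy(model(x), y)
loss.backward()    # gradients flow through every layer
optimizer.step()
```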
What is the role of Adding Task-Specific Layers in fine-tuning?
Adding Task-Specific Layers involves appending custom layers to a pretrained model for specialized tasks, such as classification or regression, while optionally freezing other layers.
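A PyTorch sketch of appending a task head; the small `backbone` is again a hypothetical placeholder for a real pretrained model:

```python
import torch
import torch.nn as nn

# Hypothetical pretrained backbone producing 32-dim features.
backbone = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 32))

# Optionally freeze the backbone while the new head trains.
for param in backbone.parameters():
    param.requires_grad = False

# Task-specific head appended for, say, 5-class classification.
model = nn.Sequential(
    backbone,
    nn.Dropout(0.1),
    nn.Linear(32, 5),
)

out = model(torch.randn(2, 128))  # shape: (2, 5)
```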
What is Pretrained Model Utilization in transfer learning?
Pretrained Model Utilization leverages models trained on large datasets to solve related tasks by adapting their learned features to new data.
How can a pretrained model be used for classification tasks in transfer learning?
A pretrained model can be used by replacing its final classification layer with a new layer tailored to the target classes, and then fine-tuning or training only that layer.
What does utilizing a pretrained model for predictions involve?
It involves using the pretrained model as is to make predictions on new data without further training, relying on its existing learned features.
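The inference-only pattern in PyTorch, using a hypothetical stand-in for a pretrained classifier:

```python
import torch
import torch.nn as nn

# Hypothetical pretrained classifier, used as-is with no further training.
model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))
model.eval()                 # inference mode: disables dropout/batch-norm updates

x = torch.randn(3, 128)
with torch.no_grad():        # no gradients tracked, no weights changed
    probs = model(x).softmax(dim=-1)
preds = probs.argmax(dim=-1)
```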
What is the purpose of logits in transfer learning?
Logits, the raw output of a model before applying an activation function like softmax, can be used as intermediate representations for downstream tasks or ensembling models.
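A pure-Python illustration of turning logits into probabilities with softmax (the logit values are arbitrary examples):

```python
import math

# Logits: raw, unnormalized model outputs.
logits = [2.0, 1.0, 0.1]

# Softmax turns logits into a probability distribution.
# Shifting by the max is the standard numerical-stability trick.
m = max(logits)
exps = [math.exp(z - m) for z in logits]
total = sum(exps)
probs = [e / total for e in exps]

print(probs)  # sums to 1.0; the largest logit gets the highest probability
```

Keeping the logits themselves (rather than the probabilities) preserves more information for ensembling or distillation.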
How is an encoder from a pretrained model used in transfer learning?
The encoder, typically the feature extraction part of the model, is used to generate embeddings from input data, which can be fed into custom classifiers or task-specific layers.
Why is Grid Search computationally expensive?
Grid Search evaluates all possible combinations of hyperparameters, which can be infeasible for large search spaces or computationally intensive models. The number of combinations is the product of the number of values tried for each hyperparameter, so it grows exponentially with the number of hyperparameters.
What is the primary benefit of Random Search over Grid Search?
Random Search can cover a wide range of hyperparameter values without evaluating all combinations, often finding good configurations faster.
What is an advantage of Bayesian Optimization for tuning?
Bayesian Optimization focuses on promising regions of the hyperparameter space, reducing the number of evaluations required compared to exhaustive methods.