Section 5 Model Tuning Flashcards
(28 cards)
Define a model/classifier
We generally define a model (or classifier) as a function used to relate the target variable y to the input variables X.
What are hyperparameters (λ) in general in a model?
Any parameter that cannot be estimated from the data but that has an impact on the predictive performance of your model.
Can be used to control the complexity of a model or the optimization algorithm.
They are employed in the training process to estimate the parameters.
Can be set manually for a specific predictive problem.
Their selection can be related to a model selection procedure (tuning).
Cannot be learned during the training process.
What form can hyperparameters take
Characteristics of the loss function used for training and learning
Variables inherent to the algorithm and/or optimization method implemented for training
Variables relating to the complexity of the model.
What do the model parameters w do (so we can differentiate them from hyperparameters)?
Characterise the specifics of a certain model and are required for predictions.
Are learned (estimated) during the training process by minimising the loss function.
Cannot be set manually.
Explain model training
Process of using the training data to estimate/learn the parameters w by minimising the training loss for fixed values of the hyperparameters.
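A minimal sketch of this idea in Python, assuming a ridge-style penalised squared loss (the data and the penalty value lam are illustrative assumptions, not from the course):

import numpy as np

# Illustrative data (assumed for the sketch)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + rng.normal(scale=0.1, size=100)

lam = 0.1  # hyperparameter: fixed before training, not learned from the data

# Training: estimate the parameters w by minimising the penalised training loss;
# for this loss the minimiser has the closed form w = (X'X + lam*I)^{-1} X'y
w = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)
training_loss = np.sum((y - X @ w) ** 2) + lam * np.sum(w ** 2)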
Explain model testing
Process of evaluating the predictive performance of the model on out-of-sample data.
How do the hyperparameters relate to model complexity?
Hyperparameters (λ) control the complexity of the model and its ability to fit the training data by means of the optimisation procedure.
Define the complexity of a model and what are the consequences with high and low complexity
Complexity refers to the flexibility of a model to fit a variety of functions and a model can be made arbitrarily complex.
Models with low complexity may be too simple and unable to fit the training set well.
Models with high complexity will fit the training data perfectly, but will generalise poorly.
Why is training performance unreliable?
The training performance is an optimistic estimate of a model’s performance
Comparing training and out-of-sample data, where is a model optimised?
There is a gap between training and out-of-sample loss and predictive performance. The gap depends on the hyperparameters and is what matters most.
Underfitting or overfitting the data will increase the gap; an optimally tuned model will minimise this gap.
What does underfitting and overfitting mean
Underfitting: the model does not obtain a sufficiently low loss value on the training data, i.e. its predictive performance on the training data is not sufficiently good. This leads to large bias and poor predictive performance on the test data.
Overfitting: the model learns patterns which are specific to the training data and not general to the underlying data-generating process. This corresponds to the gap between training and test error being too large, and to poor generalisation ability of the model: zero bias, but large variance in predictions.
Explain tuning
Tuning is the act of trying to minimise the gap between the training and out-of-sample loss functions and accuracy. It is the process of using validation data to select the optimal hyperparameter values, i.e. those mapping to maximum validation predictive performance.
Explain bias
The error introduced by approximating the data generating process by a simpler model is denoted bias.
Explain variance
The variance of a model is proportional to its flexibility: the more flexible the model, the higher the variance. It reflects the model's stability.
What is the expected generalisation error
Expected generalisation error = variance of the model + (bias of the model)² + variance of the error terms
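In standard notation this is often written as follows (a sketch; the symbols \hat{f} for the fitted model and \sigma^2 for the irreducible error variance are assumptions, not taken from the card):

\mathbb{E}\big[(y - \hat{f}(x))^2\big] \;=\; \mathrm{Var}\big(\hat{f}(x)\big) \;+\; \big[\mathrm{Bias}\big(\hat{f}(x)\big)\big]^2 \;+\; \sigma^2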
What is the optimal complexity in a model
The optimal complexity is the complexity level which balances bias and variance.
How can we consider different hyperparameter values in a model
Since hyperparameters λ specify/affect the characteristics of a model, we can consider two (or more) versions of a model with the same structural form f(·) but different hyperparameter values. Tuning means using validation data to select the optimal values.
How is cross-validation applied in model tuning?
We can use cross-validation to compare models with different hyperparameters, i.e. we can use cross-validation to select the optimal λ and hence the best support vector machine classifier to use.
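A hedged sketch of this with scikit-learn (the dataset and the grid of candidate values are illustrative assumptions; gamma plays the role of the kernel scaling coefficient):

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

# Candidate hyperparameter values lambda = (C, gamma) to compare (illustrative)
param_grid = {"C": [0.1, 1, 10], "gamma": [0.001, 0.01, 0.1]}

# 5-fold cross-validation: each (C, gamma) pair is scored by its average
# validation accuracy across the folds, and the best pair is selected
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5, scoring="accuracy")
search.fit(X, y)
print(search.best_params_, search.best_score_)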
If the training and validation procedure is implemented in a resampling framework, how are the optimal hyperparameters determined?
They are the values maximising the average predictive performance over the replicates.
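A minimal sketch of this selection rule, assuming repeated random train/validation splits and a ridge-style classifier (the candidate values and number of replicates are illustrative assumptions):

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import RidgeClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
candidates = [0.01, 0.1, 1.0, 10.0]  # candidate hyperparameter values (illustrative)

mean_scores = {}
for lam in candidates:
    scores = []
    for rep in range(20):  # replicates of the train/validation split
        X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=rep)
        model = RidgeClassifier(alpha=lam).fit(X_tr, y_tr)
        scores.append(model.score(X_val, y_val))  # validation accuracy for this replicate
    mean_scores[lam] = np.mean(scores)

best_lam = max(mean_scores, key=mean_scores.get)  # maximises the average validation performance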
How do we assess generalised predictive performance after tuning
Once the model has been tuned, one needs to assess its generalised predictive performance using separate test data.
What comes first, model tuning or model selection?
Model tuning can be implemented in conjunction with model selection, whereby different types of models are compared also across different instances: comparing models and tuning happen at the same time.
Give examples of hyperparameters in logistic regression
The step size η of the gradient descent algorithm is a hyperparameter that controls the optimization process (no need to worry about that).
The classification threshold τ is a hyperparameter that controls the classification of the data points; its tuning is done with respect to predictive performance and the purpose of the model.
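A minimal sketch of tuning τ on validation data (the threshold grid and the F1 criterion are illustrative assumptions; any purpose-driven metric could be used instead):

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

model = LogisticRegression(max_iter=5000).fit(X_tr, y_tr)
probs = model.predict_proba(X_val)[:, 1]  # estimated P(y = 1 | x) on the validation set

# Select the classification threshold tau maximising validation F1 (illustrative criterion)
taus = np.linspace(0.1, 0.9, 81)
scores = [f1_score(y_val, (probs >= t).astype(int)) for t in taus]
best_tau = taus[int(np.argmax(scores))]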
Give examples of hyperparameters in SVM classifier
In a support vector machine classifier, the hyperparameters of the kernel function control the complexity of the model.
Gaussian Radial Basis Function kernel (GRBF) – σ scaling coefficient
The cost is also a hyperparameter which controls complexity.
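One common parameterisation of the GRBF kernel, for reference (a sketch; the course notes may use an equivalent form such as γ = 1/(2σ²)):

K(x, x') = \exp\!\left( -\frac{\lVert x - x' \rVert^2}{2\sigma^2} \right)

A smaller σ gives a more flexible decision boundary, and a larger cost penalises margin violations more heavily; both increase the model's ability to fit the training data.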
Give an example of a hyperparameter for classification trees
For classification trees, the problem of defining a tree T can be formulated as a loss-minimisation problem in which a hyperparameter λ controls the tree's complexity.
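A sketch of the standard cost-complexity form of this loss (the notation |T| for the number of terminal nodes is an assumption, not from the card):

L_\lambda(T) = \text{training misclassification loss of } T + \lambda\,|T|

A larger λ penalises larger trees more heavily, yielding a smaller, less complex tree.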