Evaluating Model Fit Flashcards
Model Fitting
Ubiquitous in modern psychology
All models require some measure of ‘goodness of fit’, to say how well they have fit the data
Yerkes-Dodson law
A quadratic term can capture the data better (the inverted-U relationship between arousal and performance)
Adding lots of different features to describe the variation does not necessarily describe the data any better
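Illustrative sketch (not from the lecture): fitting a linear and a quadratic model to simulated inverted-U data in Python; the `arousal` and `performance` variables, the noise level, and the use of np.polyfit are assumptions made for this example.

```python
# Minimal sketch: comparing a linear and a quadratic fit to simulated
# inverted-U (Yerkes-Dodson-like) data.
import numpy as np

rng = np.random.default_rng(0)
arousal = np.linspace(0, 1, 50)                                    # hypothetical predictor
performance = 1 - (arousal - 0.5) ** 2 + rng.normal(0, 0.05, 50)   # inverted U + noise

# np.polyfit fits a polynomial of the given degree by least squares
linear_pred = np.polyval(np.polyfit(arousal, performance, deg=1), arousal)
quad_pred = np.polyval(np.polyfit(arousal, performance, deg=2), arousal)

sse_linear = np.sum((performance - linear_pred) ** 2)
sse_quad = np.sum((performance - quad_pred) ** 2)
print(f"SSE linear: {sse_linear:.3f}, SSE quadratic: {sse_quad:.3f}")
# The quadratic model leaves a much smaller residual error for inverted-U data.
```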
Cost Function
Describes the goodness of fit of a model - how big the discrepancy is between the model's predictions and the actual values observed
Allows researchers to objectively choose the best parameters
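A minimal sketch of a cost function, assuming a simple linear model and a sum-of-squared-errors cost; the data and parameter names are illustrative, not from the source.

```python
# Minimal sketch of a cost function: sum of squared errors for a linear model.
import numpy as np

def sse_cost(params, x, y):
    """Sum of squared discrepancies between model predictions and observed y."""
    slope, intercept = params
    predictions = slope * x + intercept
    return np.sum((y - predictions) ** 2)

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 8.1])
print(sse_cost([2.0, 0.0], x, y))   # cost for one candidate parameter set
print(sse_cost([1.0, 0.0], x, y))   # a worse parameter set gives a larger cost
```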
Maximum Likelihood Approach
Many models use a maximum likelihood approach of fitting parameters
Relies on different models yielding different probabilities of observing outcomes
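A minimal sketch of maximum likelihood fitting, assuming normally distributed errors around a linear model; the data, the starting values, and the use of scipy.optimize.minimize are assumptions made for illustration.

```python
# Minimal sketch of maximum likelihood fitting: minimise the negative
# log-likelihood of the data under a linear model with Gaussian noise.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.2, 3.8, 6.1, 7.9, 10.2])

def negative_log_likelihood(params):
    slope, intercept, sigma = params
    predictions = slope * x + intercept
    # Probability of each observation given the parameters, summed on the log scale
    return -np.sum(norm.logpdf(y, loc=predictions, scale=sigma))

# Minimising the negative log-likelihood is equivalent to maximising the likelihood
result = minimize(negative_log_likelihood, x0=[1.0, 0.0, 1.0],
                  bounds=[(None, None), (None, None), (1e-6, None)])
print(result.x)  # best-fitting slope, intercept, and error SD
```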
Summary
Principles of model fitting are not unique to data analysis and statistics - apply broadly to computational models of cognition
When fitting a model, we need some measure of how well that model fits the data - the cost function is a mathematical description of model fit
Want to minimise the cost function in order to find the best-fitting model parameters - for maximum likelihood, one would minimise the negative of the likelihood
R2 as a Measure of Model Fit
% of variance in the data explained by the model
Total sum of squares measures how far the observed values are from the mean
As more parameters are added, R2 will increase, so the model appears to explain more and more variance
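A minimal sketch of computing R2 as 1 minus the residual sum of squares over the total sum of squares; the observed and predicted values here are made up for illustration.

```python
# Minimal sketch of R^2: proportion of variance explained by the model.
import numpy as np

def r_squared(y_observed, y_predicted):
    ss_residual = np.sum((y_observed - y_predicted) ** 2)
    ss_total = np.sum((y_observed - np.mean(y_observed)) ** 2)  # distance from the mean
    return 1 - ss_residual / ss_total

y = np.array([2.0, 4.1, 5.9, 8.2])
y_hat = np.array([2.1, 4.0, 6.0, 8.0])
print(r_squared(y, y_hat))  # close to 1 when the model captures most of the variance
```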
Summary
R2 is a commonly used description of model fit, which captures the percentage of variance explained by a model
Adding more parameters to a model will invariably lead to a smaller residual error, and so lead to more ‘variance explained’
Asking which model is best involves more than just the variance explained in the data used for fitting the model - a simple option is to penalise the model based on its number of parameters
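One common way of penalising by the number of parameters is an information criterion such as AIC; the source does not name a specific criterion, so the sketch below is an assumption, with made-up likelihood values.

```python
# Minimal sketch of penalising model complexity with AIC (lower is better).
def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2*log-likelihood."""
    return 2 * n_params - 2 * log_likelihood

# Hypothetical fitted models: the richer model fits slightly better but pays a penalty
print(aic(log_likelihood=-100.0, n_params=2))   # 204.0
print(aic(log_likelihood=-99.5, n_params=5))    # 209.0 -> simpler model preferred
```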
Generalisability of Model
The more complex the model, the higher its goodness of fit on the data it was fitted to
Overfitting
Gets worse with fewer datapoints
Major problem when training very big models
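A minimal sketch of overfitting with few datapoints: a high-degree polynomial fitted to a handful of simulated points typically shows a lower training error but a higher error on new data than a simple linear fit. The data, noise level, and polynomial degrees are illustrative assumptions.

```python
# Minimal sketch of overfitting: with few datapoints, a flexible model fits
# the training data almost perfectly but predicts new data poorly.
import numpy as np

rng = np.random.default_rng(1)
x_train = np.linspace(0, 1, 8)
y_train = 2 * x_train + rng.normal(0, 0.25, 8)
x_test = np.linspace(0, 1, 100)
y_test = 2 * x_test + rng.normal(0, 0.25, 100)

for degree in [1, 6]:
    coefs = np.polyfit(x_train, y_train, deg=degree)
    train_err = np.mean((y_train - np.polyval(coefs, x_train)) ** 2)
    test_err = np.mean((y_test - np.polyval(coefs, x_test)) ** 2)
    print(f"degree {degree}: train MSE {train_err:.4f}, test MSE {test_err:.4f}")
# The degree-6 fit typically has near-zero training error but a larger test error.
```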
Conclusion
Key feature of a good model fit is that it makes reliable predictions of new, unseen data that have not been used to train it - generalisation to new observations
Cross-validation is a method where the model is trained on some of the data and tested on held-out data - this improves the reliability of model fit/model selection
Can help in situations where the model is underconstrained by the data - in particular where the number of parameters is greater than the number of datapoints
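A minimal sketch of k-fold cross-validation, assuming a simple linear model and simulated data; the choice of 5 folds is arbitrary and made for illustration.

```python
# Minimal sketch of k-fold cross-validation: fit on k-1 folds, test on the held-out fold.
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(0, 1, 30)
y = 2 * x + rng.normal(0, 0.2, 30)

k = 5
indices = rng.permutation(len(x))
folds = np.array_split(indices, k)

test_errors = []
for i in range(k):
    test_idx = folds[i]
    train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
    coefs = np.polyfit(x[train_idx], y[train_idx], deg=1)   # fit on training folds
    preds = np.polyval(coefs, x[test_idx])                  # predict the held-out fold
    test_errors.append(np.mean((y[test_idx] - preds) ** 2))

print(f"mean held-out MSE across {k} folds: {np.mean(test_errors):.4f}")
```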