Models Flashcards
What do we model with supervised learning?
The relationship between the independent variables (IVs) and the dependent variables (DVs).
When do we use regression to learn?
When the response (or DVs) is numerical in supervised learning
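A minimal regression sketch on synthetic data (the coefficients and data below are made up for illustration): fit a numerical response to one IV with least squares.

```python
import numpy as np

# Synthetic data: numerical response y depends linearly on the IV x plus noise.
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(0, 0.5, size=x.shape)  # true slope 2, intercept 1

# Least-squares line fit: learn the IV -> DV relationship from (x, y) pairs.
slope, intercept = np.polyfit(x, y, deg=1)
```

The fitted slope and intercept should land close to the true values used to generate the data.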
When do we use classification to learn?
When the response (or DVs) is categorical in supervised learning
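A minimal classification sketch, assuming hypothetical class labels "a"/"b" and synthetic data: a nearest-mean classifier assigns a new point to the class whose training mean is closest.

```python
import numpy as np

# Synthetic training data: categorical response ("a" or "b") for a single IV.
x_train = np.array([1.0, 1.2, 0.8, 5.0, 5.3, 4.7])
y_train = np.array(["a", "a", "a", "b", "b", "b"])

def predict(x_new):
    # Assign the class whose mean feature value is closest to x_new.
    means = {c: x_train[y_train == c].mean() for c in np.unique(y_train)}
    return min(means, key=lambda c: abs(means[c] - x_new))
```

Points near 1 are labeled "a", points near 5 are labeled "b".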
What do we model with unsupervised learning?
The relationship between the observed IVs and some unobserved (latent) variables, such that we can estimate the IVs from the latent variables.
When do we use clustering to learn?
When we want to find latent structure in unsupervised learning
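A minimal clustering sketch, assuming two latent groups in one-dimensional synthetic data (a hand-rolled 2-means loop, not a library call):

```python
import numpy as np

# Synthetic 1-D data drawn from two latent groups (around 0 and around 5).
rng = np.random.default_rng(1)
data = np.concatenate([rng.normal(0.0, 0.3, 20), rng.normal(5.0, 0.3, 20)])

# Hand-rolled 2-means: alternate assignment and mean-update steps.
centers = np.array([data.min(), data.max()])  # crude but sufficient init
for _ in range(10):
    labels = np.abs(data[:, None] - centers[None, :]).argmin(axis=1)
    centers = np.array([data[labels == k].mean() for k in (0, 1)])
```

No labels were given, yet the recovered centers sit near the two latent group means.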
When do we use dimensionality reduction to learn?
To simplify datasets while retaining information in unsupervised learning.
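A minimal dimensionality-reduction sketch (PCA via SVD) on synthetic 2-D data that has nearly 1-D latent structure, so one component retains almost all the variance:

```python
import numpy as np

# Synthetic 2-D data with (nearly) 1-D latent structure: x2 is about 2 * x1.
rng = np.random.default_rng(2)
t = rng.normal(size=100)
X = np.column_stack([t, 2.0 * t + rng.normal(0, 0.1, 100)])

# PCA via SVD of the centered data: keep only the first principal component.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
Z = Xc @ Vt[0]                          # 1-D representation of each row
retained = S[0] ** 2 / (S ** 2).sum()   # fraction of variance kept
```

Each observation is now a single number, yet most of the information (variance) survives.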
True or false: when assuming a model for our data, we disregard the error.
False: we assume the response (DVs) depends on an error term, Y = f(X) + ε, where the error ε is independent of the input variables (IVs).
What are reducible errors?
The part of a model's prediction error that can be reduced by choosing a better model or better predictors.
What are irreducible errors?
This error is always present and puts an upper bound on model quality (a lower bound on prediction error). It comes from the irreducible noise ε in the data, which no model, however good, can capture.
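A sketch of the irreducible error on synthetic data where the true f is known (here f(x) = sin x, noise variance 1): even predicting with the true f itself, the mean squared error cannot drop below the noise variance.

```python
import numpy as np

# Synthetic data where the true relationship f(x) = sin(x) is known exactly.
rng = np.random.default_rng(3)
x = rng.uniform(0, 10, 10_000)
eps = rng.normal(0.0, 1.0, x.shape)   # irreducible noise, Var(eps) = 1
y = np.sin(x) + eps

# Even the *true* model cannot beat the noise floor: MSE is about Var(eps).
mse_true_model = np.mean((y - np.sin(x)) ** 2)
```

The reducible error is zero here (we used the true f), yet the MSE stays near 1.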
How do we estimate the correct model for our data?
We use training data, together with an assumed class of models, to estimate the model in that class that best fits the data.
What are parametric models?
Models where we assume a model class described by a fixed set of parameters, and select the parameter values that best fit the training data.
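A parametric sketch on synthetic data: the model class is fixed in advance (a line, exactly two parameters), and training only chooses the parameter values.

```python
import numpy as np

# Synthetic data; the assumed model class is a line: y = theta[0]*x + theta[1].
rng = np.random.default_rng(4)
x = np.linspace(0, 5, 30)
y = 3.0 * x - 2.0 + rng.normal(0, 0.3, x.shape)

# Training selects the values of the fixed parameter vector theta (length 2).
A = np.column_stack([x, np.ones_like(x)])      # design matrix
theta, *_ = np.linalg.lstsq(A, y, rcond=None)
```

However large the training set, the model is still summarized by just two numbers.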
What are non-parametric models?
We assume a class of models but do not assume a fixed functional form. The training data itself can be seen as the parameters of the model.
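A non-parametric sketch with hypothetical training data: k-nearest-neighbours regression assumes no functional form; predictions are computed directly from the stored training points.

```python
import numpy as np

# Hypothetical training set; the stored points themselves act as the "parameters".
x_train = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y_train = np.array([0.0, 1.0, 4.0, 9.0, 16.0])

def knn_predict(x_new, k=2):
    # Predict by averaging the responses of the k nearest training points.
    idx = np.argsort(np.abs(x_train - x_new))[:k]
    return y_train[idx].mean()
```

Adding more training data changes the model itself, with no fixed parameter vector to re-estimate.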
What is meant by simple models?
Typically a parametric model: easy to interpret, but often not flexible enough.
What is meant by complex models?
Typically a non-parametric model: hard to interpret, but fits the training data very well.
What is overfitting in modelling?
When the model is too flexible (e.g. has too many parameters) relative to the training data, so it fits the training data (almost) perfectly but produces very bad estimates on new data.
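An overfitting sketch on synthetic data (all values made up): a degree-7 polynomial with 8 parameters interpolates 8 noisy training points exactly, while a 2-parameter line cannot; the flexible fit typically generalizes far worse on held-out points, though the exact gap depends on the noise draw.

```python
import numpy as np

# Synthetic data; the true relation is the line y = x plus noise.
rng = np.random.default_rng(5)
x_train = np.linspace(0, 1, 8)
y_train = x_train + rng.normal(0, 0.1, x_train.shape)
x_test = np.linspace(0.05, 0.95, 50)
y_test = x_test + rng.normal(0, 0.1, x_test.shape)

def mse(deg, x, y):
    # Fit a degree-`deg` polynomial on the training data, evaluate MSE on (x, y).
    coefs = np.polyfit(x_train, y_train, deg)
    return np.mean((np.polyval(coefs, x) - y) ** 2)

# Degree 7 (8 parameters, 8 points) drives the training error to zero;
# the line (2 parameters) leaves a residual but is far less able to chase noise.
train_line, train_poly = mse(1, x_train, y_train), mse(7, x_train, y_train)
test_line, test_poly = mse(1, x_test, y_test), mse(7, x_test, y_test)
```

Zero training error is the warning sign here, not the goal: the interpolating polynomial has memorized the noise.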