Models Flashcards

1
Q

What do we model with supervised learning?

A

The relationship between the independent variables and the dependent variable.

2
Q

When do we use regression to learn?

A

When the response (DV) is numerical, in supervised learning.

3
Q

When do we use classification to learn?

A

When the response (DV) is categorical, in supervised learning.

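To make the regression/classification split on the two cards above concrete, here is a minimal sketch, assuming scikit-learn is available; the model choices (linear and logistic regression) are illustrative, not prescribed by the cards.

```python
# Minimal sketch: regression vs. classification (assumes scikit-learn).
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))            # independent variables (IVs)

# Regression: numerical response.
y_num = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.5, size=100)
reg = LinearRegression().fit(X, y_num)

# Classification: categorical response.
y_cat = (X[:, 0] + X[:, 1] > 0).astype(int)
clf = LogisticRegression().fit(X, y_cat)

print(reg.predict(X[:2]))   # numerical predictions
print(clf.predict(X[:2]))   # class labels (0 or 1)
```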
4
Q

What do we model with unsupervised learning?

A

The relationship between the observed IVs and some unobserved (latent) variables, such that we can estimate the IVs from the latent variables.

5
Q

When do we use clustering to learn?

A

When we want to find latent structure in the data, in unsupervised learning.

6
Q

When do we use dimensionality reduction to learn?

A

To simplify datasets while retaining information in unsupervised learning.

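As an illustration of the two unsupervised tasks above, this sketch clusters data with k-means and reduces its dimensionality with PCA; scikit-learn and the parameter choices (3 clusters, 2 components) are assumptions made for the example.

```python
# Minimal sketch of the two unsupervised tasks (assumes scikit-learn).
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))                    # observed IVs only, no labels

labels = KMeans(n_clusters=3, n_init=10).fit_predict(X)  # clustering: latent groups
X_low = PCA(n_components=2).fit_transform(X)              # dimensionality reduction

print(labels[:10])     # cluster assignment per observation
print(X_low.shape)     # (200, 2): simplified dataset, most variance retained
```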
7
Q

When assuming a model for our data, we disregard the error.

A

False: we assume that the response (DV) depends on an error term, while the error is independent of the input variables (IVs).

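A standard way to write this assumption (the notation is mine, not from the card):

$$Y = f(X) + \varepsilon, \qquad \varepsilon \perp X, \qquad \mathbb{E}[\varepsilon] = 0$$

The DV $Y$ is the true function $f$ of the IVs $X$ plus an error term $\varepsilon$ that is independent of $X$.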
8
Q

What are reducible errors?

A

The part of a model's error that can be reduced by choosing a different model or better predictors.

9
Q

What are irreducible errors?

A

The error that remains no matter which model we choose; it puts an upper bound on model quality. It stems from noise in the data itself (the error term), not from our choice of model.

10
Q

How do we estimate the correct model for our data?

A

We use training data, together with an assumed class of models, to estimate the model that fits best.

11
Q

What are parametric models?

A

Models where we assume a form described by a fixed set of parameters, and select the parameter values that best fit the training data.

12
Q

What are non-parametric models?

A

We assume a class of models, but not their exact form. The training data itself can therefore be seen as the parameters of the model.

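To contrast the two card types above, a minimal sketch, assuming scikit-learn: linear regression is parametric (a few numbers summarize the whole model), while k-nearest neighbours is non-parametric (prediction consults the stored training data directly).

```python
# Parametric vs. non-parametric, minimal sketch (assumes scikit-learn).
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(50, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=50)

param = LinearRegression().fit(X, y)        # parametric: fixed set of parameters
print(param.coef_, param.intercept_)        # the whole model is these numbers

nonparam = KNeighborsRegressor(n_neighbors=5).fit(X, y)  # non-parametric:
print(nonparam.predict([[5.0]]))  # prediction reads the stored training points
```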
13
Q

What is meant by a simple model?

A

A parametric model: easy to interpret, often not flexible enough.

14
Q

What is meant by a complex model?

A

A non-parametric model: harder to interpret, but fits the training data very well.

15
Q

What is overfitting in modelling?

A

When a model has too many parameters (typically a flexible non-parametric model), so it fits the training data almost perfectly but produces very poor estimates on new data.

16
Q

Why does a model overfit?

A

Given enough parameters, the model will fit in a way that also absorbs the error present in the training data. It has no knowledge of the underlying process; it only knows how to reproduce the training data.
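A numpy-only sketch of this effect; degrees 1, 3, and 15 are arbitrary choices for under-, well-, and over-parameterized fits.

```python
# Overfitting sketch: a high-degree polynomial chases the noise (numpy only).
import numpy as np

rng = np.random.default_rng(0)
x_train = np.sort(rng.uniform(0, 1, 20))
y_train = np.sin(2 * np.pi * x_train) + rng.normal(scale=0.2, size=20)
x_test = np.sort(rng.uniform(0, 1, 20))
y_test = np.sin(2 * np.pi * x_test) + rng.normal(scale=0.2, size=20)

for degree in (1, 3, 15):
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(degree, round(train_mse, 3), round(test_mse, 3))
# Degree 15 drives the training MSE toward 0 while the test MSE blows up.
```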

17
Q

How do we determine the accuracy of our model?

A

By testing our model on new data: data that is different from the training data, but comes from the same source as the training data.

18
Q

What is the purpose of a test data set?

A

To check whether the model estimated from the training set is still accurate when evaluated on a new data set.

19
Q

In modeling, do we want our training error as low as possible?

A

No: driving the training error to its minimum means the model is overfitting the training data.

20
Q

In modeling, do we want our test error as low as possible?

A

Yes: the model with the lowest test error best reflects the underlying data.

21
Q

What is model bias?

A

Bias refers to the inability of the estimated model to represent the underlying process.

22
Q

How can you reduce model bias?

A

By increasing the model complexity.

23
Q

What is model variance?

A

Variance refers to how much the estimated model varies with the training data (a different training set would result in a very different model of equal complexity).

24
Q

How can we reduce model variance?

A

It is reduced by decreasing model complexity or increasing data size and diversity.

25
Q

Why do we speak of a bias-variance trade-off?

A

Because the optimal complexity for the estimated model lies close to the point where the bias and variance curves intersect: decreasing one increases the other.
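The trade-off follows from the standard decomposition of the expected test error (a standard result, not stated on the cards), which also ties in the irreducible error from cards 8-9:

$$\mathbb{E}\big[(Y - \hat f(x))^2\big] = \underbrace{\mathrm{Bias}\big(\hat f(x)\big)^2}_{\text{falls with complexity}} + \underbrace{\mathrm{Var}\big(\hat f(x)\big)}_{\text{rises with complexity}} + \underbrace{\sigma^2}_{\text{irreducible}}$$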

26
Q

Which colour represents the model’s bias (in the bias-variance trade-off plot)?

A

Purple: Bias decreases with complexity.

27
Q

Which colour represents the model’s variance (in the bias-variance trade-off plot)?

A

Green: variance increases with complexity.

28
Q

In terms of bias and variance, when is the data underfitted?

A

High bias and low variance.

29
Q

In terms of bias and variance, when is the data overfitted?

A

Low bias and high variance.

30
Q

What is the main difference between Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC)?

A

AIC penalizes only model complexity (the number of parameters), whereas BIC also accounts for the size of the data set.
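Concretely, with $k$ parameters, $n$ observations, and maximized likelihood $\hat{L}$ (standard definitions, not spelled out on the card):

$$\mathrm{AIC} = 2k - 2\ln\hat{L}, \qquad \mathrm{BIC} = k\ln n - 2\ln\hat{L}$$

Since $\ln n > 2$ once $n > 7$, BIC penalizes extra parameters more heavily as the data set grows.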

31
Q

Which is better AIC or BIC?

A

Neither. The term ‘better’ is always relative to the dataset and the goal of the study, and depends heavily on domain knowledge.

32
Q

What is cross validation?

A

A model-evaluation technique used when no separate test set is available for a trained model. It allows us to estimate whether a model would still be accurate on new data.

33
Q

How does k-fold cross validation work?

A

We randomly partition the training data into k folds of the same size. In each of k iterations, one fold serves as the validation set and the remaining folds are used for training. When all k iterations have been done, the cross-validation accuracy is computed as the mean accuracy over the folds.

34
Q

What is the accuracy of the (k fold) cross validation?

A

The cross validation accuracy is the mean accuracy over all k folds.
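A numpy-only sketch of the procedure on cards 33-34; the function name, the callables `fit` and `accuracy`, and k = 5 are placeholders of my choosing.

```python
# k-fold cross-validation, minimal sketch (numpy only).
import numpy as np

def k_fold_cv_accuracy(X, y, fit, accuracy, k=5, seed=0):
    """Mean validation accuracy over k folds; X and y are numpy arrays."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))          # random partition of the training data
    folds = np.array_split(idx, k)         # k folds of (nearly) equal size
    scores = []
    for i in range(k):
        val = folds[i]                     # fold i validates...
        train = np.concatenate([folds[j] for j in range(k) if j != i])  # ...rest trains
        model = fit(X[train], y[train])
        scores.append(accuracy(model, X[val], y[val]))
    return np.mean(scores)                 # cross-validation accuracy (card 34)
```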

35
Q

How can we use cross validation to find an optimal model?

A

By plotting the k-fold cross-validation error (for a fixed k) against model complexity. Locate the minimum, then apply the one-standard-error rule: from the top of the error bar at the minimum, move back toward the simplest model whose error is still within one standard error, decreasing complexity without compromising performance.
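A minimal sketch of that one-standard-error rule, assuming we already have per-complexity CV errors and their standard errors; all names and numbers below are hypothetical.

```python
# One-standard-error rule, minimal sketch (numpy only).
import numpy as np

def one_se_rule(complexities, cv_errors, std_errors):
    """Pick the simplest model within one SE of the minimum CV error."""
    cv_errors = np.asarray(cv_errors)
    best = np.argmin(cv_errors)                     # minimum of the error curve
    threshold = cv_errors[best] + std_errors[best]  # "top of the standard error"
    # Walk back toward lower complexity while the error stays under the threshold.
    eligible = [i for i in range(len(complexities)) if cv_errors[i] <= threshold]
    return complexities[min(eligible)]              # simplest eligible model

# Hypothetical CV results for polynomial degrees 1..6:
degrees = [1, 2, 3, 4, 5, 6]
errors  = [0.90, 0.40, 0.31, 0.30, 0.33, 0.45]
ses     = [0.05, 0.04, 0.04, 0.04, 0.05, 0.06]
print(one_se_rule(degrees, errors, ses))  # 3: simpler than the minimum at degree 4
```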