Model Selection Flashcards

1
Q

What is a fixed factor?

A

An explanatory variable whose levels are themselves meaningful.

We can draw inferences about the effects of each particular level of the explanatory variable on the response variable.

The factor is exactly repeatable at all of its levels.

2
Q

What is a random factor?

A

An explanatory variable whose levels are not themselves meaningful.

E.g. individual fish sampled from a population.

The levels are not exactly repeatable.

3
Q

What do linear models assume?

A

Main factors impact the outcome in a predictable way, and all other variation is due to error.

This assumes independence of errors (errors are distributed independently throughout the data set).

5
Q

When is independence of errors violated?

A

1) If you have repeated measurements on the same biological subjects, the effect of random differences between those subjects will not be distributed independently throughout the data set.
2) If the experimental design is nested, random differences at higher levels of nesting will not be distributed independently throughout the data set.

6
Q

What is independence of errors?

A

Errors are distributed independently throughout the data set

7
Q

Why is taking into account of nesting important?

A

If a biological individual is nested within a group, random variation specific to that individual may skew the observations from that group unless that randomness is accounted for in the model.

8
Q

What are mixed models?

A

Models that allow us to include both random and fixed explanatory variables.

9
Q

Why are mixed models useful?

A

They allow us to fit models which accurately account for the different sources of variation in the data set.

10
Q

What is used to determine the importance of different factors in mixed models?

A

Likelihood ratio test

11
Q

What is the likelihood of a model?

A

The probability of observing our data given the model.

On its own a likelihood is not very informative; it becomes useful when you compare likelihoods between models, to tell whether one model fits the data better than another.

12
Q

Which is the better likelihood score: 15 or 20?

A

20

Higher likelihoods indicate a better fit of the model to the data.

13
Q

How would you know whether removing an explanatory variable from the mixed model has an effect, i.e. whether that explanatory variable is important?

A

If the p-value of the likelihood ratio test (comparing the original model with the model with the variable removed) is <= 0.05, then the explanatory variable you removed is important.
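The comparison can be sketched in a few lines of Python. This is a minimal illustration, not tied to any modelling library: the log-likelihood values are hypothetical, and the closed-form p-value only covers the common case of removing a single variable (one degree of freedom).

```python
import math

def lrt_pvalue(loglik_full, loglik_reduced, df=1):
    """Likelihood ratio test: 2 * (logL_full - logL_reduced) ~ chi-square(df).

    For df = 1 the chi-square survival function reduces to
    erfc(sqrt(stat / 2)); other df would need e.g. scipy.stats.chi2.sf.
    """
    if df != 1:
        raise NotImplementedError("this sketch only handles df = 1")
    stat = 2.0 * (loglik_full - loglik_reduced)
    return math.erfc(math.sqrt(stat / 2.0))

# Hypothetical log-likelihoods: full model vs. model with one variable removed.
p = lrt_pvalue(loglik_full=-120.3, loglik_reduced=-123.9)
print(p)           # well below 0.05, so the removed variable is important
print(p <= 0.05)
```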

14
Q

What is a random intercepts model?

A

A model that captures random differences between groups (e.g. individuals) with group-specific intercepts, while assuming the slope is constant across groups.

15
Q

What is a random slopes and intercepts model?

A

A model where random effects from person to person are captured by gradients (slopes) as well as intercepts.

16
Q

What is feature selection?

A

Selection of the most relevant explanatory factors (attributes)

17
Q

What are some advantages of feature selection?

A

1) Shorter training times for algorithms
2) Over-fitting is less likely
3) Simpler models are easier to implement
4) Can be used to sift through data sets with 1000s of attributes, e.g. microarray data

18
Q

What are the methods of feature selection?

A

1) Filtering methods
2) Wrapper methods
3) Embedded methods

19
Q

What are filtering methods of feature selection?

A

This is selecting the most interesting features by conducting a hypothesis test on each feature individually.

For each feature, the null hypothesis "this feature does not explain a significant portion of the variation in the response variable" is tested.
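A minimal sketch of this per-feature testing in Python, using a permutation test on the absolute correlation as the hypothesis test (one common choice among many; the feature and response values below are made up for illustration):

```python
import random

def perm_pvalue(x, y, n_perm=2000, seed=0):
    """Permutation test of H0: feature x explains none of the variation in y.

    Test statistic: absolute Pearson correlation between x and y.
    The p-value is the fraction of shuffled data sets that match or beat
    the observed statistic.
    """
    def abs_corr(a, b):
        n = len(a)
        ma, mb = sum(a) / n, sum(b) / n
        cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
        va = sum((ai - ma) ** 2 for ai in a)
        vb = sum((bi - mb) ** 2 for bi in b)
        return abs(cov / (va * vb) ** 0.5)

    rng = random.Random(seed)
    observed = abs_corr(x, y)
    y_shuffled = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_shuffled)  # break any real x-y association
        if abs_corr(x, y_shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# Hypothetical features: x1 tracks the response y closely, x2 is noise.
y  = [1.0, 2.1, 2.9, 4.2, 5.1, 5.9, 7.2, 8.0]
x1 = [1.1, 1.9, 3.2, 3.9, 5.0, 6.1, 6.8, 8.1]
x2 = [2.0, -1.0, 3.5, 0.2, -2.2, 1.4, 0.3, -0.7]
print(perm_pvalue(x1, y))  # small p-value: keep this feature
print(perm_pvalue(x2, y))  # large p-value: filter it out
```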

20
Q

Do you need to use a correction for filtering by hypothesis testing?

What corrections can you apply?

A

Yes, because filtering carries out many hypothesis tests at once (a multiple testing problem).

Bonferroni correction

False discovery rate

21
Q

What does the Bonferroni correction control?

A

Controls the probability of making at least one Type 1 error.

It makes sure the probability of making any Type 1 error across all the tests is <= alpha.
(Stringent)
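A minimal Python sketch of the correction (the p-values are hypothetical): each of the m p-values is simply tested against alpha / m instead of alpha.

```python
def bonferroni(p_values, alpha=0.05):
    """Bonferroni correction: test each of the m p-values against alpha / m,
    which keeps the probability of at least one Type 1 error <= alpha."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]

# Hypothetical p-values from filtering 5 features one at a time.
pvals = [0.001, 0.011, 0.02, 0.4, 0.9]
print(bonferroni(pvals))  # [True, False, False, False, False]
```

Note that 0.011 and 0.02 would pass an uncorrected alpha = 0.05 test but fail the corrected threshold of 0.05 / 5 = 0.01, which is what "stringent" means here.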

22
Q

What does the false discovery rate control?

A

Controls the overall proportion of Type 1 errors made, i.e. the proportion of "discoveries" that are false.

You decide a pre-defined acceptable false discovery rate, q.

P-values falling below the resulting thresholds indicate significance.
(Less stringent than Bonferroni)
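The standard procedure for controlling the false discovery rate is the Benjamini-Hochberg step-up. A minimal Python sketch, using the same hypothetical p-values as above so the less stringent behaviour is visible:

```python
def benjamini_hochberg(p_values, q=0.05):
    """Benjamini-Hochberg step-up procedure: controls the false discovery
    rate (expected proportion of Type 1 errors among discoveries) at q.

    Sort the m p-values; find the largest rank k with p_(k) <= (k / m) * q;
    every p-value up to that rank is declared significant."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * q:
            k_max = rank
    significant = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= k_max:
            significant[i] = True
    return significant

# Hypothetical p-values from filtering 5 features one at a time.
pvals = [0.001, 0.011, 0.02, 0.4, 0.9]
print(benjamini_hochberg(pvals))  # [True, True, True, False, False]
```

With these numbers, three features pass the false discovery rate threshold, whereas the stricter Bonferroni cut-off of 0.05 / 5 = 0.01 would keep only one.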

23
Q

What are the strengths and weaknesses of filtering methods?

A

Strengths:
1) Computationally easy
2) Fast to run

Weaknesses:
1) Takes no account of interactions between explanatory variables
2) Will select correlated features (which explain the same thing, so you only need one of them)
3) Confounding variation that isn't explained away may cause important features to be overlooked

24
Q

What are wrapper methods for feature selection?

A

Methods that consider more than one feature at once.

Different subsets of features are tested to determine the collection of features which produce the best model of the data.

25
Q

What are the different wrapper selection methods?

A

1) Stepwise regression

2) Recursive feature elimination

26
Q

What is stepwise regression?

A

Stepwise increasing (forward selection) or decreasing (backward elimination) the number of parameters in the model, comparing BIC/AIC at each step.

27
Q

What are BIC and AIC?

A

These are information criteria used to determine whether the model is being improved or not, by seeking the model which strikes the right balance between fitting and over-fitting the data.

Useful for wrapper methods, as you can see whether the model improves when adding or removing features.
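The two criteria have simple closed forms: AIC = 2k - 2 log L and BIC = k log(n) - 2 log L, where k is the number of parameters, n the number of observations, and log L the model's log-likelihood. A minimal Python sketch with hypothetical numbers for two nested candidate models:

```python
import math

def aic(log_likelihood, k):
    """Akaike Information Criterion: fit (log-likelihood) penalised by
    the number of parameters k. Lower is better."""
    return 2 * k - 2 * log_likelihood

def bic(log_likelihood, k, n):
    """Bayesian Information Criterion: like AIC, but the penalty also
    grows with sample size n. Lower is better."""
    return k * math.log(n) - 2 * log_likelihood

# Hypothetical nested models fit to n = 100 observations: the extra
# parameter in the larger model buys only a tiny gain in log-likelihood,
# so the penalty dominates and the smaller model wins.
print(bic(log_likelihood=-200.0, k=3, n=100))
print(bic(log_likelihood=-199.5, k=4, n=100))  # larger, so keep the smaller model
```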

28
Q

What are the strengths and weaknesses of wrapper methods?

A

Strengths:
1) The explanatory power of some features might only be revealed once other features are accounted for
2) Possible to allow for interactions

Weaknesses:
1) Higher computational cost
2) More risk of over-fitting

29
Q

Is a BIC of 20 better than 10?

A

No

Lower BIC/AIC values indicate better models

30
Q

What are Embedded methods of feature selection?

A

Feature selection is embedded in the model construction process.

The algorithm itself decides which features are more important than others.
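The lasso is a standard example of an embedded method: its penalty shrinks some coefficients exactly to zero during fitting, so unimportant features are dropped by the model itself. A minimal Python sketch using the soft-thresholding rule, which gives the lasso solution in the special case of an orthonormal design (the coefficient values below are hypothetical):

```python
def soft_threshold(beta, lam):
    """Lasso shrinkage for one coefficient (orthonormal design):
    pull the coefficient towards zero by lam, snapping anything within
    lam of zero to exactly 0. Zeroed features are deselected."""
    if beta > lam:
        return beta - lam
    if beta < -lam:
        return beta + lam
    return 0.0

# Hypothetical OLS coefficients for four features; lam is the lasso penalty.
ols = [2.3, -0.08, 0.9, 0.05]
lasso = [soft_threshold(b, lam=0.1) for b in ols]
print(lasso)  # the weak features (indices 1 and 3) are zeroed out

selected = [i for i, b in enumerate(lasso) if b != 0.0]
print(selected)  # [0, 2]
```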