Model Selection Flashcards
What is a fixed factor?
An explanatory variable whose levels are themselves meaningful.
We wish to draw inferences about the effect of each particular level of the explanatory variable on the response variable.
The factor is exactly repeatable at all of its levels.
What is a random factor?
An explanatory variable whose levels are not themselves meaningful, e.g. individual fish in a population.
The levels are not exactly repeatable.
What do linear models assume?
Main factors impact the outcome in a predictable way and all other variation is due to error.
This assumes independence of errors (errors are distributed independently throughout the data set).
When is independence of errors violated?
1) If you have repeated measurements from different biological subjects, the effect of random differences between those subjects will not be distributed independently throughout the data set.
2) If the experimental design is nested, random differences at higher levels of nesting will not be distributed independently throughout the data set.
What is independence of errors?
Errors are distributed independently throughout the data set
Why is taking into account of nesting important?
If a biological individual is nested within a group, random variation specific to that individual may skew the observations from that group unless that randomness is accounted for in the model.
What are mixed models?
Models that allow us to include both random and fixed explanatory variables.
Why are mixed models useful?
They allow us to fit models that accurately account for the different sources of variation in the data set.
What is used to determine the importance of different factors in mixed models?
Likelihood ratio test
What is the likelihood of a model?
The probability of observing our data given the model.
Comparing likelihoods between models tells us whether the models differ in how well they fit the data.
What is the better likelihood score, 15 or 20?
20; higher likelihood scores indicate a better fit.
How would you know whether a removed explanatory variable has an effect, i.e. whether it is important?
If the p-value of the likelihood ratio test (comparing the original model with the model with the variable removed) is <= 0.05, then the explanatory variable you removed is important.
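The likelihood ratio test above can be sketched in a few lines of Python. This is a minimal illustration, assuming you already have the maximised log-likelihoods of the full and reduced (nested) models; the test statistic is compared against a chi-squared distribution whose degrees of freedom equal the number of parameters removed (here one). The log-likelihood values are hypothetical.

```python
import math

def lrt_pvalue(llf_full, llf_reduced, df_diff=1):
    """Likelihood ratio test between a full model and a nested model
    with an explanatory variable removed.
    llf_* are the maximised log-likelihoods of the two models."""
    stat = 2.0 * (llf_full - llf_reduced)  # test statistic, ~ chi-squared(df_diff)
    if df_diff == 1:
        # survival function of chi-squared(1), via the complementary error function
        return math.erfc(math.sqrt(stat / 2.0))
    raise NotImplementedError("general df would need an incomplete gamma function")

# Hypothetical log-likelihoods: removing the variable barely worsens the fit
p = lrt_pvalue(llf_full=-120.3, llf_reduced=-121.1)
print(round(p, 3))  # p > 0.05: the removed variable is not important
```

In practice a mixed-model library computes this comparison for you; the sketch only shows where the p-value comes from.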
What is a random intercepts model?
A model that assumes random differences between subjects (or groups) are captured by subject-specific intercepts, while the slope is constant for all subjects.
What is a random slopes and intercepts model?
A model where random effects from person to person are captured by person-specific gradients (slopes) as well as intercepts.
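A minimal simulation sketch of the two model structures, with made-up fixed effects (b0, b1) and variance values: in the random intercepts model only the intercept shifts between subjects, while in the random slopes and intercepts model the gradient shifts too.

```python
import random

random.seed(42)
b0, b1 = 2.0, 0.5  # hypothetical fixed effects shared by every subject

def simulate(sd_intercept, sd_slope, n_subjects=3, n_obs=4):
    """Generate (subject, x, y) rows. Each subject gets its own random
    intercept shift u and slope shift v, drawn from the stated spreads."""
    rows = []
    for subject in range(n_subjects):
        u = random.gauss(0, sd_intercept)  # random difference in intercept
        v = random.gauss(0, sd_slope)      # random difference in slope
        for x in range(n_obs):
            y = (b0 + u) + (b1 + v) * x + random.gauss(0, 0.1)  # residual error
            rows.append((subject, x, y))
    return rows

# Random intercepts model: slopes do not vary between subjects (sd_slope = 0)
ri = simulate(sd_intercept=1.0, sd_slope=0.0)
# Random slopes and intercepts model: both gradients and intercepts vary
rsi = simulate(sd_intercept=1.0, sd_slope=0.3)
```

Fitting either model to real data would be done with a mixed-model library; the sketch only shows what each model assumes about the data.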
What is feature selection?
Selection of the most relevant explanatory factors (attributes)
What are some advantages of feature selection?
1) shorter training times for algorithms
2) over fitting is less likely
3) simpler models are easier to implement
4) can be used to sift through data sets with 1000s of attributes, e.g. microarray data
What are the methods of feature selection?
1) Filtering methods
2) wrapper methods
3) embedded methods
What are filtering methods of feature selection?
Selecting the most interesting features by conducting a hypothesis test on each feature.
For each feature the null hypothesis “this feature does not explain a significant portion of the variation in the response variable” is tested.
Do you need to use a correction when filtering by hypothesis testing? Which corrections can you apply?
Yes:
Bonferroni correction
False discovery rate
What does the Bonferroni correction control?
Controls the probability of making at least one Type 1 error.
Makes sure the probability of making any Type 1 error across all tests is <= alpha.
(Stringent.)
What does the false discovery rate control?
Controls the overall proportion of Type 1 errors made.
You decide a predefined threshold of acceptable Type 1 errors (q); p-values below the rank-adjusted thresholds derived from q indicate significance.
(Less stringent.)
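Both corrections can be sketched in a few lines of Python. The p-values below are made up to show that the false discovery rate procedure (here the Benjamini-Hochberg version) is less stringent than Bonferroni.

```python
def bonferroni(pvals, alpha=0.05):
    """Bonferroni: each p-value must beat alpha divided by the number of tests,
    so the chance of even one Type 1 error stays <= alpha."""
    m = len(pvals)
    return {i for i, p in enumerate(pvals) if p <= alpha / m}

def benjamini_hochberg(pvals, q=0.05):
    """False discovery rate (Benjamini-Hochberg): sort the p-values, find the
    largest rank k with p_(k) <= (k / m) * q, and accept the first k features."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # feature indices by p-value
    k = 0
    for rank, idx in enumerate(order, start=1):
        if pvals[idx] <= rank / m * q:
            k = rank
    return set(order[:k])

pvals = [0.001, 0.012, 0.014, 0.2, 0.3]  # one made-up p-value per feature
print(sorted(bonferroni(pvals)))          # -> [0]        (stringent)
print(sorted(benjamini_hochberg(pvals)))  # -> [0, 1, 2]  (less stringent)
```

With five tests the Bonferroni threshold is 0.05 / 5 = 0.01, so only the first feature passes, while Benjamini-Hochberg accepts the first three.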
What are the strengths and weaknesses of filtering method?
Strengths:
1) computationally easy
2) fast to run
Weaknesses:
1) takes no account of interactions between explanatory variables
2) will select correlated features (which explain the same thing, so only one is needed)
3) confounding variation that isn't explained away may cause important features to be overlooked
What are wrapper methods for feature selection?
Methods that consider more than one feature at once.
Different subsets of features are tested to determine the collection of features which produce the best model of the data.
What are the different wrapper selection methods?
1) Stepwise regression
2) Recursive feature elimination
What is stepwise regression?
Stepwise increasing (forward) or decreasing (backward) the number of parameters in the model, comparing BIC/AIC at each step.
What are BIC and AIC?
Information criteria used to determine whether a model is being improved, by seeking the model that strikes the right balance between fitting and over-fitting the data.
Useful for wrapper methods, as you can see whether the model improves when features are added or removed.
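A minimal sketch of such a comparison, using the standard formulas AIC = 2k - 2 ln(L) and BIC = k ln(n) - 2 ln(L), where k is the number of parameters, n the number of observations, and ln(L) the maximised log-likelihood; the values plugged in below are hypothetical.

```python
import math

def aic(llf, k):
    """Akaike information criterion: 2k - 2 ln(L). Lower is better."""
    return 2 * k - 2 * llf

def bic(llf, k, n):
    """Bayesian information criterion: k ln(n) - 2 ln(L). Lower is better;
    it penalises extra parameters more heavily than AIC once n > 7."""
    return k * math.log(n) - 2 * llf

# Hypothetical forward step: adding a feature raises the log-likelihood
# from -100.0 to -99.2 at the cost of one extra parameter (n = 50 observations).
before = bic(llf=-100.0, k=3, n=50)
after = bic(llf=-99.2, k=4, n=50)
print(after < before)  # False: the small gain in fit does not justify the feature
```

Here the extra parameter costs ln(50), about 3.9 BIC points, but the improved fit only buys back 1.6, so the stepwise procedure would reject the added feature.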
What are the strengths and weaknesses of wrapper methods?
Strengths: 1) The explanatory power of some features might only be revealed once other features are accounted for.
2) Possible to allow for interactions
Weaknesses: 1) Higher computational time
2) More risk of over fitting
Is a BIC of 20 better than 10?
No
Lower BIC/AIC values indicate better models
What are Embedded methods of feature selection?
Feature selection is embedded in model construction process.
The algorithm itself decides which features are more important than others.