Regression Flashcards

Question 1

Q

What defines the prediction interval?

Answer

A

The interval in which we expect a new observation to fall, with a given confidence (e.g. 95%).

Question 2

Q

What defines the confidence interval?

Answer

A

It gives the margin of uncertainty around the mean predicted by our model: the range that we are (e.g.) 95% confident contains the true mean response.
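The two interval cards above can be contrasted with a minimal sketch for simple linear regression. The data, the function name `interval_half_widths`, and the hard-coded critical value `t_crit=2.306` (the 97.5% t quantile for 8 degrees of freedom) are assumptions made for illustration only.

```python
import math

def interval_half_widths(x, y, x0, t_crit):
    """Fit y = b0 + b1*x by least squares and return the fitted value at x0
    plus the half-widths of the confidence and prediction intervals.

    t_crit is the t critical value for n - 2 degrees of freedom and is
    supplied by the caller (an assumption of this sketch).
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    rss = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(rss / (n - 2))  # residual standard error
    leverage = 1 / n + (x0 - x_bar) ** 2 / sxx
    ci = t_crit * s * math.sqrt(leverage)      # uncertainty of the mean only
    pi = t_crit * s * math.sqrt(1 + leverage)  # adds the noise of a new observation
    return b0 + b1 * x0, ci, pi

# Made-up data, purely illustrative
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 18.0, 19.9]
fit, ci, pi = interval_half_widths(x, y, x0=5.5, t_crit=2.306)
print(f"fit={fit:.2f}  CI: +/-{ci:.2f}  PI: +/-{pi:.2f}")
```

The prediction interval is always the wider of the two, because it covers the scatter of a single new observation on top of the uncertainty in the estimated mean.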

Question 3

Q

What is the difference between paired and unpaired group means?

Answer

A

With paired groups, each observation in one group is matched to an observation in the other, typically because the same subjects are measured twice (e.g. you test the same students on their scores at the beginning and at the end of the year). With unpaired groups, the two groups may seem similar but consist of different, independent subjects (e.g. when researching recovery time, different patients are tested on different treatments, in different countries).
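A small sketch of why the distinction matters: the same numbers give very different test statistics depending on whether they are treated as paired or unpaired. The scores are made up, and `paired_t` and `welch_t` are illustrative helper names, not functions from any library.

```python
import math
from statistics import mean, stdev, variance

def paired_t(before, after):
    """t statistic on the per-subject differences (paired design)."""
    diffs = [a - b for a, b in zip(after, before)]
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))

def welch_t(group_a, group_b):
    """Welch's t statistic for two independent groups (unpaired design)."""
    se = math.sqrt(variance(group_a) / len(group_a)
                   + variance(group_b) / len(group_b))
    return (mean(group_b) - mean(group_a)) / se

# Hypothetical scores of the same six students at the start and end of the year
start = [55, 62, 70, 48, 81, 66]
end = [58, 66, 73, 52, 84, 70]

t_paired = paired_t(start, end)    # uses the matching between the two lists
t_unpaired = welch_t(start, end)   # ignores the matching
print(t_paired, t_unpaired)
```

Pairing removes the large between-student variation, so the consistent improvement per student stands out; treating the same data as unpaired buries it in that variation.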

Question 4

Q

What is Analysis of Variance? (ANOVA)

Answer

A

When comparing multiple groups in a study, ANOVA is a tool that tells us how much of the variance in the response is explained by each factor.

Question 5

Q

What value of ANOVA indicates a high influence of a factor on the response?

Answer

A

In the ANOVA table, a high Sum Sq value and a high F-value (and, correspondingly, a small p-value) indicate this.
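As a rough illustration of where the Sum Sq and F values come from, here is a one-way ANOVA computed by hand. The three groups are invented and `one_way_anova` is a hypothetical helper, not a library call.

```python
def one_way_anova(groups):
    """Return (between-group sum of squares, within-group sum of squares, F)."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(g) / len(g) for g in groups]

    # Variation explained by the factor (group membership)
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Variation left unexplained inside each group
    ss_within = sum((v - m) ** 2
                    for g, m in zip(groups, group_means) for v in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, f_value

# Made-up response values for three levels of one factor
groups = [[4, 5, 6], [7, 8, 9], [10, 11, 12]]
ss_between, ss_within, f_value = one_way_anova(groups)
print(ss_between, ss_within, f_value)  # 54.0 6.0 27.0
```

Here the factor accounts for 54 of the 60 units of total variation, which is what the large F-value reflects.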

Question 6

Q

When is ANOVA useful?

Answer

A

When investigating how strongly each factor influences the response in a multi-variable prediction model.

Question 7

Q

In model selection, what is forward selection?

Answer

A

First, fit the null model containing only the intercept.

Then fit p separate models, each adding one of the predictors individually to the current model.

Keep the model with the lowest RSS (or highest R²).

Repeat with the remaining predictors until some stopping requirement is met.
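The steps above can be sketched in plain Python. Everything here is an assumption for illustration: the helper names, the toy data, and the stopping requirement (a fixed maximum number of predictors, `max_predictors`).

```python
def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def rss(predictors, y):
    """RSS of the least-squares fit of y on an intercept plus the given columns."""
    design = [[1.0] + [col[i] for col in predictors] for i in range(len(y))]
    k = len(design[0])
    xtx = [[sum(row[a] * row[b] for row in design) for b in range(k)]
           for a in range(k)]
    xty = [sum(row[a] * yi for row, yi in zip(design, y)) for a in range(k)]
    beta = solve(xtx, xty)  # normal equations
    return sum((yi - sum(bj * xj for bj, xj in zip(beta, row))) ** 2
               for row, yi in zip(design, y))

def forward_selection(candidates, y, max_predictors):
    """Greedily add the predictor that lowers the RSS the most."""
    selected = []              # start from the null (intercept-only) model
    remaining = dict(candidates)
    while remaining and len(selected) < max_predictors:
        scores = {name: rss([candidates[s] for s in selected] + [col], y)
                  for name, col in remaining.items()}
        best = min(scores, key=scores.get)  # lowest RSS wins this round
        selected.append(best)
        del remaining[best]
    return selected

# Toy data: y depends on x1 and x2 only; x3 is unrelated
candidates = {
    "x1": [1, 2, 3, 4, 5, 6, 7, 8],
    "x2": [3, 1, 4, 1, 5, 9, 2, 6],
    "x3": [2, 7, 1, 8, 2, 8, 1, 8],
}
y = [3 * a + b for a, b in zip(candidates["x1"], candidates["x2"])]
selected = forward_selection(candidates, y, max_predictors=2)
print(selected)  # ['x1', 'x2']
```

Because RSS can only decrease when a predictor is added, a real procedure would usually stop on a criterion such as a p-value threshold, AIC, or cross-validated error rather than a fixed count.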

Question 8

Q

In model selection, what is backward selection?

Answer

A

Fit the maximal model with all p predictors.

Remove the predictors that meet a certain requirement (e.g. those with a p-value higher than the significance level).

Fit the new (reduced) model and continue until some model condition is met.

Question 9

Q

What is the generalized linear model (GLM)?

Answer

A

It is a model type that does not assume the response to be Gaussian. Categorical responses (e.g. binary or ordinal ones) are, for instance, not Gaussian.

Question 10

Q

What two components does the GLM introduce and how do they appear in ‘formula form’?

Answer

A

The link function g and the response distribution D. In formula form: g(E[Y]) = β0 + β1·X1 + ... + βp·Xp, where Y follows distribution D.

Question 11

Q

What does the link function look like, and between which domains does it map?

Answer

A

For the logit link used in logistic regression, logit(p) = log(p / (1 - p)). It maps the response domain [0, 1] to the domain [-inf, inf].
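A minimal sketch of this mapping, assuming the logit link of logistic regression. The function names `logit` and `inverse_logit` are chosen here for illustration.

```python
import math

def logit(p):
    """Map a probability in (0, 1) to the whole real line (link space)."""
    return math.log(p / (1 - p))

def inverse_logit(z):
    """Map any real number back to a probability in (0, 1) (response space)."""
    return 1 / (1 + math.exp(-z))

for p in (0.001, 0.5, 0.8, 0.999):
    z = logit(p)
    print(f"p={p}  ->  logit={z:.3f}  ->  back to p={inverse_logit(z):.3f}")
```

Probabilities near 0 map to large negative values and probabilities near 1 to large positive values; the endpoints 0 and 1 themselves correspond to -inf and +inf and cannot be passed to `logit` directly.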

Question 12

Q

What is the link space?

Answer

A

In logistic regression (a GLM), it is the [-inf, inf] domain, in which the linear predictor lives.

Question 13

Q

What is the response space?

Answer

A

In logistic regression (a GLM), it is the [0, 1] domain, in which the predicted probabilities live.

Question 14

Q

What is the Logistic Model?

Answer

A

It is the GLM in which the link function is the logit: g(Y) = logit(Y). So basically a logit (log-odds) transformation, rather than a plain log transformation, connects the response to a linear model.

Question 15

Q

In logistic regression, in what space do we estimate our parameters?

Answer

A

We estimate the parameters in the link space: we build a linear model there and then transform its predictions back into the response space with the inverse of the logit function.
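A minimal sketch of this idea with a single predictor, fitted by plain gradient ascent on the log-likelihood. The data, the learning rate, the number of steps, and the function names are all assumptions for illustration; statistical software typically uses iteratively reweighted least squares instead.

```python
import math

def predict_probability(b0, b1, x):
    """Linear predictor in link space, mapped back to response space."""
    z = b0 + b1 * x                 # link space: any real number
    return 1 / (1 + math.exp(-z))   # response space: a probability

def fit_logistic(x, y, learning_rate=0.1, steps=5000):
    """Estimate b0 and b1 by gradient ascent on the mean log-likelihood."""
    b0, b1 = 0.0, 0.0
    n = len(x)
    for _ in range(steps):
        probs = [predict_probability(b0, b1, xi) for xi in x]
        grad0 = sum(yi - pi for yi, pi in zip(y, probs)) / n
        grad1 = sum((yi - pi) * xi for xi, yi, pi in zip(x, y, probs)) / n
        b0 += learning_rate * grad0
        b1 += learning_rate * grad1
    return b0, b1

# Made-up data: hours studied and whether the exam was passed (1) or not (0)
hours = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]
passed = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]

b0, b1 = fit_logistic(hours, passed)
p_low = predict_probability(b0, b1, 0.5)
p_high = predict_probability(b0, b1, 5.0)
print(b0, b1, p_low, p_high)
```

The coefficients `b0` and `b1` are on the log-odds scale; only after the inverse logit do the predictions become probabilities.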

Question 16

Q

What are Generalized Additive Models?

Answer

A

They are similar to GLMs, except that each predictor can now enter the model nonlinearly: the contribution of each predictor is modeled as a (smooth) function of that predictor, and these functions are added together.

(16 cards)