Regression Flashcards
What defines the prediction interval?
The interval within which a new, individual observation is expected to fall with a given confidence level (e.g. 95%).
What defines the confidence interval?
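The interval around the model's predicted mean response within which, with a given confidence level (e.g. 95%), the true mean response lies. It only reflects uncertainty in the estimated mean, so it is narrower than the prediction interval, which also includes the scatter of individual observations around that mean.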
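For simple linear regression (one predictor), both intervals can be computed by hand, which makes the difference visible: the prediction interval adds a "1 +" term for the variance of a single new observation. This is a minimal sketch assuming numpy and scipy are available; the function name `intervals` and the data are hypothetical.

```python
import numpy as np
from scipy import stats

def intervals(x, y, x0, level=0.95):
    """Confidence and prediction intervals of a simple linear regression at x0."""
    n = len(x)
    x_bar = x.mean()
    sxx = np.sum((x - x_bar) ** 2)
    b1 = np.sum((x - x_bar) * (y - y.mean())) / sxx
    b0 = y.mean() - b1 * x_bar
    resid = y - (b0 + b1 * x)
    s = np.sqrt(resid @ resid / (n - 2))  # residual standard error
    t_crit = stats.t.ppf(1 - (1 - level) / 2, df=n - 2)
    fit = b0 + b1 * x0
    # Uncertainty of the estimated mean response only
    se_mean = s * np.sqrt(1 / n + (x0 - x_bar) ** 2 / sxx)
    # Additionally includes the variance of one new observation around that mean
    se_pred = s * np.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / sxx)
    ci = (fit - t_crit * se_mean, fit + t_crit * se_mean)
    pi = (fit - t_crit * se_pred, fit + t_crit * se_pred)
    return fit, ci, pi

# Synthetic (hypothetical) data
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=30)
y = 1.5 + 0.8 * x + rng.normal(scale=1.0, size=30)

fit, ci, pi = intervals(x, y, x0=5.0)
print(f"fit={fit:.2f}  95% CI={ci[0]:.2f}..{ci[1]:.2f}  95% PI={pi[0]:.2f}..{pi[1]:.2f}")
```

Both intervals are centred on the same fitted value; only their widths differ.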
What is the difference between paired and unpaired group means?
With paired data, each measurement in one group is matched to a measurement in the other, typically because the same subjects are measured twice (e.g. the same students' scores at the beginning and at the end of the year), so the per-subject differences can be compared directly. With unpaired (independent) data, the groups may look similar but contain different subjects with no one-to-one correspondence (e.g. when researching recovery time, different patients receive different treatments, possibly in different countries).
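The practical consequence is the choice of test: a paired t-test analyses the per-subject differences, while an unpaired test compares two independent samples. A minimal sketch using scipy's `ttest_rel` (paired) and `ttest_ind` (unpaired, here Welch's version); the student scores are hypothetical.

```python
import numpy as np
from scipy import stats

# Hypothetical scores of the same 6 students at the start and end of the year
start = np.array([62.0, 70.0, 55.0, 80.0, 67.0, 74.0])
end = np.array([66.0, 73.0, 60.0, 82.0, 71.0, 79.0])

# Paired: test whether the mean per-student difference (end - start) is zero
paired = stats.ttest_rel(end, start)

# Unpaired: (wrongly, for this data) treat the two samples as independent groups
unpaired = stats.ttest_ind(end, start, equal_var=False)

print(f"paired   t={paired.statistic:.2f}, p={paired.pvalue:.4f}")
print(f"unpaired t={unpaired.statistic:.2f}, p={unpaired.pvalue:.4f}")
```

Because every student improves a little, the paired test detects the effect clearly, while the unpaired test is swamped by the large differences between students.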
What is Analysis of Variance (ANOVA)?
When comparing multiple groups or factors in a study, ANOVA splits the total variance of the response into parts and tells us how much of it is explained by each factor (and how much is left as residual variance).
Which values in an ANOVA table indicate that a factor strongly influences the response?
A large sum of squares (Sum Sq) and a large F value for that factor, with a correspondingly small p-value.
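For a one-way ANOVA the Sum Sq and F value can be computed directly, which shows why a large factor sum of squares relative to the residual sum of squares gives a large F. A minimal sketch assuming numpy and scipy; the group data are hypothetical, and `scipy.stats.f_oneway` is used only as a cross-check.

```python
import numpy as np
from scipy import stats

# Hypothetical response values for three levels of one factor
groups = [
    np.array([4.1, 4.5, 3.9, 4.3, 4.0]),
    np.array([5.2, 5.6, 5.0, 5.4, 5.1]),
    np.array([4.2, 4.4, 4.0, 4.6, 4.1]),
]

all_values = np.concatenate(groups)
grand_mean = all_values.mean()
k, n_total = len(groups), len(all_values)

# Between-group ("factor") and within-group ("residual") sums of squares
ss_factor = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ss_resid = sum(np.sum((g - g.mean()) ** 2) for g in groups)

# F = mean square of the factor / mean square of the residuals
df_factor, df_resid = k - 1, n_total - k
f_value = (ss_factor / df_factor) / (ss_resid / df_resid)
p_value = stats.f.sf(f_value, df_factor, df_resid)

print(f"Sum Sq factor={ss_factor:.3f}, residual={ss_resid:.3f}")
print(f"F={f_value:.2f}, p={p_value:.2g}")
```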
When is ANOVA useful?
When investigating how strongly each factor influences the response in a model with multiple predictors.
In model selection, what is forward selection?
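First, fit the null model containing only the intercept.
Then fit p separate models, each adding one of the p predictors to the null model.
Keep the model with the lowest RSS (or highest R²).
Repeat from that model, each time trying every remaining predictor, until a stopping rule is met (e.g. a criterion such as AIC or cross-validated error stops improving).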
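The forward selection steps above can be sketched with ordinary least squares in numpy. This is an illustration, not a reference implementation: the function names and the stopping rule (stop once the best candidate lowers the RSS by less than a hypothetical `min_rel_improvement` fraction) are assumptions; in practice AIC, BIC, adjusted R² or cross-validation are common choices.

```python
import numpy as np

def rss(X, y):
    """Residual sum of squares of an OLS fit of y on X (X includes the intercept)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def forward_selection(X, y, min_rel_improvement=0.05):
    n, p = X.shape
    intercept = np.ones((n, 1))
    selected = []
    current_rss = rss(intercept, y)  # null model: intercept only
    while len(selected) < p:
        remaining = [j for j in range(p) if j not in selected]
        # One candidate model per remaining predictor, each adding that predictor
        scores = {
            j: rss(np.column_stack([intercept, X[:, selected + [j]]]), y)
            for j in remaining
        }
        best = min(scores, key=scores.get)  # lowest RSS among candidates
        # Hypothetical stopping rule: the relative RSS improvement is too small
        if (current_rss - scores[best]) / current_rss < min_rel_improvement:
            break
        selected.append(best)
        current_rss = scores[best]
    return selected

# Synthetic data: only predictors 0 and 2 actually influence y
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))
y = 3 * X[:, 0] - 2 * X[:, 2] + rng.normal(scale=0.5, size=200)
selected = forward_selection(X, y)
print("selected predictors:", selected)
```

Comparing candidates by RSS is fair here because, within one step, all candidate models have the same number of parameters.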
In model selection, what is backward selection?
Fit the maximal model with all p predictors.
Remove the least useful predictor according to some criterion (e.g. the predictor with the highest p-value, provided that p-value exceeds the significance level).
Refit the new (reduced) model and repeat until a stopping condition is met (e.g. all remaining predictors are significant).
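Backward selection with the p-value criterion mentioned above can be sketched by computing OLS p-values by hand. This minimal sketch assumes numpy and scipy; the function names, the significance level `alpha=0.05` and the data are hypothetical, and the p-values are the usual t-test p-values for each coefficient.

```python
import numpy as np
from scipy import stats

def ols_pvalues(X, y):
    """Two-sided t-test p-values of the OLS coefficients (X includes the intercept)."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - k)            # residual variance estimate
    cov = sigma2 * np.linalg.inv(X.T @ X)       # covariance of the coefficients
    t_stats = beta / np.sqrt(np.diag(cov))
    return 2 * stats.t.sf(np.abs(t_stats), df=n - k)

def backward_selection(X, y, alpha=0.05):
    n, p = X.shape
    selected = list(range(p))  # start from the maximal model with all predictors
    while selected:
        design = np.column_stack([np.ones(n), X[:, selected]])
        pvals = ols_pvalues(design, y)[1:]      # skip the intercept
        worst = int(np.argmax(pvals))
        if pvals[worst] <= alpha:
            break  # every remaining predictor is significant
        selected.pop(worst)                     # drop one predictor, then refit
    return selected

# Synthetic data: only predictors 0 and 2 actually influence y
rng = np.random.default_rng(2)
X = rng.normal(size=(200, 4))
y = 3 * X[:, 0] - 2 * X[:, 2] + rng.normal(scale=0.5, size=200)
selected = backward_selection(X, y)
print("remaining predictors:", selected)
```

Predictors are removed one at a time because the p-values of the others change after every refit.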
What is the generalized linear model (GLM)?
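It is a generalization of the linear model that does not assume the response to be Gaussian; the response may follow another distribution from the exponential family. Categorical responses (e.g. binary or ordinal) and counts, for instance, are not Gaussian.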
What two components does the GLM introduce, and how are they written in formula form?
A link function g and a response distribution D: g(E[Y]) = Xβ, with Y ~ D (e.g. Normal, Binomial or Poisson).
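What does the link function look like, and to what domain does it map?
For logistic regression the link is the logit, g(p) = log(p / (1 - p)), where p = E[Y] is a probability. It maps the response domain (0, 1) onto the whole real line, (-inf, inf).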
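The logit and its inverse (the logistic or sigmoid function) can be written in a few lines to show the mapping between the two domains. A minimal sketch using only Python's `math` module; the function names are illustrative.

```python
import math

def logit(p):
    """Link function: maps a probability in (0, 1) to the whole real line."""
    return math.log(p / (1 - p))

def inv_logit(eta):
    """Inverse link (logistic/sigmoid): maps any real number back into (0, 1)."""
    return 1 / (1 + math.exp(-eta))

for p in (0.01, 0.25, 0.5, 0.75, 0.99):
    eta = logit(p)
    print(f"p={p:.2f} -> logit={eta:+.3f} -> back={inv_logit(eta):.2f}")
```

Probabilities near 0 or 1 map to large negative or positive values, and p = 0.5 maps to 0.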
What is the link space?
In logistic regression (a GLM), it is the space of the linear predictor Xβ, i.e. the (-inf, inf) domain.
What is the response space?
In logistic regression (a GLM), it is the space of the mean response, i.e. probabilities in the [0, 1] domain.
What is the Logistic Model?
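It is the GLM with a Binomial (Bernoulli) response distribution and the logit link, g(p) = logit(p) = log(p / (1 - p)), where p = E[Y]. So rather than log-transforming the response itself, it models the log-odds of the response probability as a linear function of the predictors.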
In logistic regression, in what space do we estimate our parameters?
The parameters are estimated, by maximum likelihood, for a linear model in the link space (η = Xβ). Applying the inverse link (the logistic function) then transforms the linear predictions back into the response space as probabilities.
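The split between the two spaces shows up directly in a fitting routine: the parameters only ever enter through the linear predictor in the link space, and the sigmoid maps it to probabilities. This is a minimal sketch of maximum-likelihood fitting via Newton-Raphson (equivalent to iteratively reweighted least squares), assuming numpy; the function names, the fixed iteration count and the synthetic data are assumptions, and a real library would also check convergence.

```python
import numpy as np

def sigmoid(eta):
    """Inverse of the logit link: maps the link space to probabilities."""
    return 1.0 / (1.0 + np.exp(-eta))

def fit_logistic(X, y, n_iter=25):
    """Maximum-likelihood logistic regression via Newton-Raphson (IRLS)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta                     # linear predictor in the link space
        p = sigmoid(eta)                   # mean response in the response space
        grad = X.T @ (y - p)               # gradient (score) of the log-likelihood
        info = X.T @ (X * (p * (1 - p))[:, None])  # observed information matrix
        beta = beta + np.linalg.solve(info, grad)  # Newton step
    return beta

# Synthetic data from known ("true") parameters
rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])       # intercept + one predictor
true_beta = np.array([-0.5, 2.0])
y = rng.binomial(1, sigmoid(X @ true_beta)).astype(float)

beta_hat = fit_logistic(X, y)
probs = sigmoid(X @ beta_hat)              # predictions transformed back to probabilities
print("estimated beta:", beta_hat.round(3))
```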