Part I: Regressions Flashcards

1
Q

What is correlation?

A

A measure of the strength and direction of the linear relationship between two variables. It ranges from -1 to 1.

2
Q

What is covariance?

A

If high values of one variable tend to go with high values of the other (and low values with low values), the covariance is positive. If high values of one go with low values of the other, it is negative. Covariance is scale dependent, which makes it harder to interpret than the correlation.
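To make the scale dependence concrete, here is a minimal sketch in plain Python. The height and weight figures are made up for illustration, and the helper names `covariance` and `correlation` are ours, not from any course material.

```python
from statistics import mean

def covariance(xs, ys):
    """Sample covariance (divides by n - 1)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def correlation(xs, ys):
    """Pearson correlation: covariance rescaled by both standard deviations."""
    sd_x = covariance(xs, xs) ** 0.5
    sd_y = covariance(ys, ys) ** 0.5
    return covariance(xs, ys) / (sd_x * sd_y)

# Hypothetical data: the same heights expressed in metres and in centimetres.
height_m = [1.60, 1.70, 1.80, 1.90]
weight_kg = [55.0, 68.0, 74.0, 90.0]
height_cm = [h * 100 for h in height_m]

cov_m = covariance(height_m, weight_kg)
cov_cm = covariance(height_cm, weight_kg)
r_m = correlation(height_m, weight_kg)
r_cm = correlation(height_cm, weight_kg)

print(cov_m, cov_cm)  # covariance changes with the unit of measurement
print(r_m, r_cm)      # correlation does not
```

Changing the unit multiplies the covariance by 100 but leaves the correlation unchanged, which is why the correlation is easier to compare across data sets.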

3
Q

What is ordinary least squares (OLS)?

A

Fitting a line by minimizing the sum of the squared vertical distances (residuals) between the observed points and the line.
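For a single predictor, the least-squares line has a closed-form solution. The sketch below uses made-up data and a hypothetical helper `ols_fit`. It only illustrates the idea and is not a replacement for a statistics library.

```python
from statistics import mean

def ols_fit(xs, ys):
    """Return (intercept, slope) of the least-squares line for one predictor."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = my - slope * mx
    return intercept, slope

def sum_squared_errors(xs, ys, intercept, slope):
    """Sum of squared vertical distances between the points and the line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

intercept, slope = ols_fit(xs, ys)
best_sse = sum_squared_errors(xs, ys, intercept, slope)
# Any other line, e.g. one with a slightly different slope, has a larger SSE.
worse_sse = sum_squared_errors(xs, ys, intercept, slope + 0.1)
print(intercept, slope, best_sse, worse_sse)
```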

4
Q

What is linear regression?

A

A regression model where we assume the relationship between the response and predictor is linear. The response variable is continuous.

5
Q

What is logistic regression?

A

A regression model where the response variable is binary and we assume the relationship between the predictor and the probability of the response is logistic (an S-shaped curve).
The predicted response is therefore between 0 and 1, so we can interpret it as a probability.

6
Q

When is logistic regression used over linear?

A

If the response variable is binary (meaning we want to predict a classification)

7
Q

What are AIC and BIC, and how are they used?

A

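They are information criteria used to compare different models fitted to the same data. Both reward goodness of fit and penalize the number of parameters (BIC penalizes complexity more heavily). Lower scores = better models.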
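Both criteria can be computed from a model's maximized log-likelihood using the standard definitions AIC = 2k - 2 ln(L) and BIC = k ln(n) - 2 ln(L), where k is the number of parameters and n the number of observations. The log-likelihood values below are invented purely for illustration.

```python
import math

def aic(log_likelihood, k):
    """Akaike information criterion: 2k - 2 ln(L)."""
    return 2 * k - 2 * log_likelihood

def bic(log_likelihood, k, n):
    """Bayesian information criterion: k ln(n) - 2 ln(L)."""
    return k * math.log(n) - 2 * log_likelihood

# Hypothetical fits on the same n = 100 observations.
n = 100
simple_loglik, simple_k = -150.0, 2     # fewer parameters
complex_loglik, complex_k = -149.5, 5   # slightly better fit, more parameters

aic_simple = aic(simple_loglik, simple_k)
aic_complex = aic(complex_loglik, complex_k)
bic_simple = bic(simple_loglik, simple_k, n)
bic_complex = bic(complex_loglik, complex_k, n)

print(aic_simple, aic_complex)
print(bic_simple, bic_complex)
```

In this made-up case the small gain in fit does not justify three extra parameters, so both criteria prefer the simpler model.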

8
Q

What is a regression?

A

A way to understand the relationship between a dependent variable y (also known as the response or outcome variable) and one or more independent variables x (also known as the predictors or explanatory variables). In linear regression the dependent variable must be continuous (it can take any value within a range). The independent variables can be continuous, discrete (a countable number of specific values) or categorical. Regressions usually use the least squares method, where we find the line that fits the data best (the one that minimizes the sum of squared distances between the points and the line).

9
Q

What are the assumptions for linear regressions?

A
  • The errors should follow a normal distribution
  • The errors should be independent (meaning that there is no correlation between the residuals/errors)
  • Homoscedasticity (meaning that the errors have equal variance across the range of predictions. If not, the confidence intervals become unreliable)
10
Q

How do we measure correlation in linear regressions?

A

Pearson correlations

11
Q

How can we estimate linear regression parameters?

A

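Maximum likelihood estimation (ML or REML) OR ordinary least squares (OLS).
IMPORTANT: for a linear model with normally distributed errors, both give the same coefficient estimates. OLS has a closed-form solution, while maximum likelihood also extends to models where OLS does not apply (for example logistic regression).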

12
Q

What is Maximum likelihood estimation?

A

A method that finds the parameter values that make the observed data most likely. It is a form of statistical inference that assumes a probabilistic data-generating model and estimates that model's parameters. The idea is to write down a likelihood function that says how probable the observed data are under each candidate set of parameter values, and then pick the values that maximize it. So instead of fitting the model by minimizing distances to the data, we fit it by maximizing the likelihood of seeing the observations.
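A small sketch of the idea, assuming a Bernoulli (coin-flip) model and made-up data: we evaluate the log-likelihood on a grid of candidate probabilities and keep the one that makes the observed outcomes most likely. Real software uses numerical optimizers rather than a grid. The grid here is only for illustration.

```python
import math

def log_likelihood(p, data):
    """Log-likelihood of binary outcomes under a Bernoulli(p) model."""
    return sum(math.log(p) if x == 1 else math.log(1 - p) for x in data)

# Hypothetical outcomes: 7 successes out of 10 trials.
data = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]

# Candidate values 0.01, 0.02, ..., 0.99 (0 and 1 are excluded to avoid log(0)).
grid = [i / 100 for i in range(1, 100)]
p_hat = max(grid, key=lambda p: log_likelihood(p, data))

print(p_hat)  # the grid value with the highest likelihood
```

For this model the maximum likelihood estimate coincides with the sample proportion of successes.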

13
Q

What is the main drawback of using MLE?

A

The problem is that it does not take into account the degrees of freedom lost by estimating the mean (or, more generally, the fixed effects).
Therefore the variance estimate will be biased downwards, especially in small samples.
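The simplest instance of this bias is the variance of a normal sample: the maximum likelihood estimate divides by n, whereas the unbiased estimate divides by n - 1 to account for the one degree of freedom spent on estimating the mean. A sketch with made-up numbers:

```python
from statistics import mean

def variance_mle(xs):
    """Maximum likelihood variance estimate: divides by n."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def variance_unbiased(xs):
    """Degrees-of-freedom corrected estimate: divides by n - 1."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

# Hypothetical sample for illustration.
sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

var_ml = variance_mle(sample)
var_corrected = variance_unbiased(sample)
print(var_ml, var_corrected)  # the ML estimate is the smaller of the two
```

The gap between the two shrinks as n grows, which is why the bias matters most in small samples.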

14
Q

What is the difference between MLE and REML? Why can we not compare REML?

A

REML (restricted maximum likelihood): Good for multilevel models.
It corrects the bias in MLE by accounting for the degrees of freedom lost in estimating the fixed effects.
REML is specifically designed to provide more accurate estimates of the variance components (random effects).
However, we cannot compare REML fits of models with different fixed effects, because the restricted likelihood is computed on data transformed to remove the fixed effects, and that transformation differs between models. To compare such models, refit them with ML.

15
Q

What are SSTO, SSR and SSE? How can they be used to calculate R^2?

A

Variation measures (sums of squares).
SSR (Regression Sum of Squares): Represents the variation that is explained by the regression line (the fitted values $\hat{Y}_i$). It is the sum of the squared differences between each predicted $\hat{Y}_i$ and the overall mean $\bar{Y}$: $SSR = \sum_i (\hat{Y}_i - \bar{Y})^2$.

SSE (Error Sum of Squares): Represents the unexplained variation, or the variation that is due to random error. It is the sum of the squared differences between each observed $Y_i$ and the predicted $\hat{Y}_i$: $SSE = \sum_i (Y_i - \hat{Y}_i)^2$.

SSTO (Total Sum of Squares): SSR + SSE. Represents the total variation in the observed variable Y. It is the sum of the squared differences between each observed $Y_i$ and the overall mean $\bar{Y}$: $SSTO = \sum_i (Y_i - \bar{Y})^2$.

Coefficient of determination:
R^2 = SSR/SSTO = 1 - SSE/SSTO
This coefficient measures the proportion of the total variation in the response that can be explained by the linear regression. It lies in the range [0, 1].
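The decomposition can be checked numerically. The sketch below fits a one-predictor least-squares line to made-up data and computes the three sums of squares. The variable names are ours.

```python
from statistics import mean

# Hypothetical data for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.1, 5.9, 8.2, 9.8]

# Least-squares line with an intercept (needed for SSR + SSE = SSTO to hold).
x_bar, y_bar = mean(xs), mean(ys)
slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
intercept = y_bar - slope * x_bar
fitted = [intercept + slope * x for x in xs]

ssr = sum((f - y_bar) ** 2 for f in fitted)          # explained variation
sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))  # unexplained variation
ssto = sum((y - y_bar) ** 2 for y in ys)             # total variation

r_squared = ssr / ssto
print(ssr, sse, ssto, r_squared)
```

Up to floating-point rounding, `ssr + sse` equals `ssto`, and `ssr / ssto` equals `1 - sse / ssto`.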

16
Q

Why perform logit transformations?

A

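Logistic regression models a binary outcome through its probability, which is bounded between 0 and 1. A linear predictor can take any real value, so modelling the probability directly could give predictions below 0 or above 1.
Therefore we convert probabilities into log odds (the logit), which are unbounded, and model those linearly.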

17
Q

How do we estimate logistic regressions?

A

With MLE. The outcome is binary, so there are no residuals in the usual least-squares sense and OLS cannot be used.

18
Q

What is the principle of marginality?

A

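Strictly, the principle of marginality says that a model containing an interaction (or other higher-order term) should also contain the corresponding lower-order terms (main effects). Choosing the simpler model when a more complex model does not add enough value to justify the addition is the related principle of parsimony.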

19
Q

How do we interpret odds ratios?
odds(response x | predictor 1) / odds(response x | predictor 2) = y

A

The odds of x given predictor 1 are y times the odds of x given predictor 2.

20
Q

What do the values of the odds ratio mean? OR=1, OR>1, OR<1

A

OR=1 = independence (no association)
OR>1 = positive association
OR<1 = negative association
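As an illustration, an odds ratio can be computed from a 2x2 table of counts. The counts and the helper name `odds_ratio` below are hypothetical.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table of counts.

    a: group 1 with the outcome,    b: group 1 without the outcome
    c: group 2 with the outcome,    d: group 2 without the outcome
    """
    odds_group1 = a / b
    odds_group2 = c / d
    return odds_group1 / odds_group2

# Hypothetical counts: 30 of 100 in group 1 and 10 of 100 in group 2 have the outcome.
or_positive = odds_ratio(30, 70, 10, 90)

# Identical groups: the outcome is independent of group membership.
or_independent = odds_ratio(20, 80, 20, 80)

# Swapping the groups inverts the ratio, giving a value below 1.
or_negative = odds_ratio(10, 90, 30, 70)

print(or_positive, or_independent, or_negative)
```

Here the odds of the outcome in group 1 are roughly 3.9 times the odds in group 2.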

21
Q

How do we transform back from log odds to a probability?

A

p = e^(log odds) / (1 + e^(log odds))
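A minimal sketch of the transformation in both directions, assuming the input to the back-transformation is a log odds value (for example a logistic regression's linear predictor). The function names are ours.

```python
import math

def logit(p):
    """Probability -> log odds."""
    return math.log(p / (1 - p))

def inv_logit(log_odds):
    """Log odds -> probability: e^x / (1 + e^x)."""
    return math.exp(log_odds) / (1 + math.exp(log_odds))

p_even = inv_logit(0.0)            # log odds of 0 means even odds
round_trip = inv_logit(logit(0.8)) # converting there and back recovers the probability
p_low = inv_logit(-3.0)            # negative log odds still map into (0, 1)

print(p_even, round_trip, p_low)
```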

22
Q

What does R^2 mean?

A

The proportion (often quoted as a percentage) of the variability in the response that is explained by the model