Regression Flashcards

1
Q

Regression

A

A model (for a single predictor, a straight line) that predicts a value of the
dependent variable (outcome) for any value of the independent variable(s) (predictors)

2
Q

R^2 (R-squared)

A

Measure of the proportion of the total variance accounted for by the regression model:
R-squared = SSM/SST = 1 - SSR/SST
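A minimal pure-Python sketch of R-squared from the sums of squares. The data points and the fitted line are made up for illustration (the slope and intercept below are the least-squares estimates for these points):

```python
# Sketch: R-squared from sums of squares for a simple linear fit.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 8.1, 9.9]

# Fitted line (least-squares estimates for this made-up data): y_hat = a + b*x
a, b = 0.10, 1.98
y_hat = [a + b * x for x in xs]

mean_y = sum(ys) / len(ys)
sst = sum((y - mean_y) ** 2 for y in ys)              # total sum of squares
ssr = sum((y - yh) ** 2 for y, yh in zip(ys, y_hat))  # residual sum of squares
ssm = sum((yh - mean_y) ** 2 for yh in y_hat)         # model sum of squares

r_squared = 1 - ssr / sst  # equals ssm / sst for a least-squares fit with intercept
```

Because the line is the least-squares fit, SSM + SSR = SST, so the two forms of R-squared agree.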

3
Q

Adjusted R-squared

A

Adjusted R-squared penalizes models for the number of predictors, computed as:
1 - ((1 - R^2) * (N - 1)) / (N - k - 1)
…where N = number of data points, k = number of predictors.
Considered a conservative alternative to R-squared.
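A small sketch of the formula above (the R-squared value, N, and k are made up):

```python
def adjusted_r_squared(r2, n, k):
    """Adjusted R^2: penalizes R^2 for the number of predictors k,
    given n data points."""
    return 1 - ((1 - r2) * (n - 1)) / (n - k - 1)

# Adding predictors can only raise R^2, but adjusted R^2 can fall:
# the same R^2 with more predictors gives a lower adjusted value.
adj2 = adjusted_r_squared(0.80, n=50, k=2)    # ~0.791
adj10 = adjusted_r_squared(0.80, n=50, k=10)  # ~0.749
```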

4
Q

Model Sum of Squares = SSM

A

Sum of squared deviations of the model's predicted values from the mean of the data.

Measures how much the model improves on the mean (the null model).

5
Q

Residual Sum of Squares = SSR

A

Sum of squared deviations between the values predicted by the model and the actual data points (the squared residuals); the error left over after fitting the model.

6
Q

Total Sum of Squares = SST

A

Sum of squared deviations of the data points from the mean (the error in the null model); SST = SSM + SSR.

7
Q

F-statistic

A

Measure of whether the improvement due to the model (relative to the mean) is large compared with the residual error:
F = MSM / MSR
…where MSM = SSM/k and MSR = SSR/(N - k - 1) are the mean squares (sums of squares divided by their degrees of freedom).
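A sketch of the F-statistic from sums of squares and degrees of freedom (the SS values, N, and k below are made up):

```python
# Sketch: F-statistic from sums of squares and degrees of freedom.
ssm, ssr = 39.2, 0.07   # model and residual sums of squares (made up)
n, k = 5, 1             # data points, predictors

msm = ssm / k            # model mean square
msr = ssr / (n - k - 1)  # residual mean square
f_stat = msm / msr       # large F => model improves a lot on the mean
```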

8
Q

What is multiple regression? (with continuous predictors)

A

A model that predicts a (continuous)
outcome variable, y, from multiple continuous predictors, x1, x2, …

9
Q

How can I compute multiple regression with continuous variables?

A

Fit a regression model predicting one continuous outcome (dependent variable) from two or more continuous predictors (independent variables) by ordinary least squares, i.e., estimate the coefficients in y = b0 + b1*x1 + b2*x2 + … + e (e.g., lm(y ~ x1 + x2) in R).
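A pure-Python sketch of ordinary least squares with two continuous predictors, solving the normal equations (X'X)b = X'y. The data are made up and lie exactly on a plane, so the known coefficients should be recovered:

```python
# Sketch: multiple regression (two continuous predictors) via OLS,
# solving the normal equations with a tiny Gaussian elimination.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 1.0, 4.0, 3.0, 5.0]
y = [3.0 + 2.0 * a + 0.5 * b for a, b in zip(x1, x2)]  # exact plane, no noise

X = [[1.0, a, b] for a, b in zip(x1, x2)]  # design matrix with intercept column

# Build X'X and X'y
xtx = [[sum(r[i] * r[j] for r in X) for j in range(3)] for i in range(3)]
xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(3)]

def solve(A, v):
    """Solve A b = v by Gaussian elimination with partial pivoting."""
    A = [row[:] + [vi] for row, vi in zip(A, v)]  # augmented matrix
    n = len(A)
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n + 1):
                A[r][c] -= f * A[col][c]
    b = [0.0] * n
    for r in range(n - 1, -1, -1):
        b[r] = (A[r][n] - sum(A[r][c] * b[c] for c in range(r + 1, n))) / A[r][r]
    return b

b0, b1, b2 = solve(xtx, xty)  # should recover intercept 3.0, slopes 2.0 and 0.5
```

In practice one would use a statistics package rather than hand-rolled elimination; the sketch only shows what the fit computes.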

10
Q

Null model = intercept only model

A

It has only an intercept, which equals the mean of the data points, so every observation is predicted to be the mean.

11
Q

Assumptions of linear regression

A

(1) Outcome variable must be continuous (at least at the interval level)

(2) No multicollinearity (i.e., no strong linear relationship among two or more predictors)

(3) Linearity (i.e., the outcome is a linear function of the predictors; residuals show no systematic pattern when plotted against predicted values)

(4) Normality of residuals (residuals are random and normally distributed with mean 0)

(5) Homoscedasticity (variance of residuals is the same for all data points)

(6) No influential cases (outliers)

(7) Independence of residuals / observations

12
Q

A residual

A

Residual = Observed data point – predicted data point

13
Q

Multicollinearity

A

Multicollinearity occurs in regression analysis when two or more independent variables are highly correlated with each other. This means they contain overlapping information about the dependent variable, which makes it difficult for the model to estimate the unique effect of each predictor accurately.
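A quick pure-Python screen for multicollinearity via the Pearson correlation between two predictors (made-up data; |r| near 1 signals trouble — variance inflation factors are the more standard diagnostic):

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.1, 4.0, 6.1, 7.9, 10.1]   # nearly 2 * x1: highly correlated
r = pearson(x1, x2)               # |r| close to 1 => near-multicollinearity
```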

14
Q

Homoscedasticity

A

In regression, homoscedasticity refers to the idea that the variance of the residuals (differences between observed and predicted values) should be consistent across all levels of the independent variable(s).

15
Q

Heteroscedasticity

A

Heteroscedasticity in regression occurs when the variability of the residuals (the differences between observed and predicted values) is not constant across all levels of the independent variable(s). In other words, the spread or “scatter” of residuals changes as the values of the predictor variable(s) change.

16
Q

ANOVA

A

ANOVA in regression is used to test if the model as a whole significantly predicts the dependent variable. It tells you whether the independent variables collectively improve the model fit beyond what would be expected by chance.

17
Q

AIC and BIC

A

They help determine which model best balances goodness of fit with model complexity.
Both AIC and BIC are used to compare models, and lower values for each indicate a better model fit. However, AIC is generally more lenient about adding parameters, while BIC favors simpler models, especially as the sample size grows.

18
Q

A categorical variable

A

Variable with discrete levels, e.g.:
* Political party (liberal / conservative)
* Country of origin (Denmark / Sweden / Finland, etc.)
* Calendar season (Autumn / Winter / Spring / Summer)
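A sketch of dummy coding, which is how a categorical variable like season can enter a regression: one 0/1 indicator per level, with one reference level absorbed by the intercept (the level names and reference choice are made up for illustration):

```python
# Sketch: dummy coding a categorical variable for use in regression.
seasons = ["Autumn", "Winter", "Spring", "Summer"]
reference = "Autumn"  # reference level: all indicators 0

def dummy_code(value, levels=seasons, ref=reference):
    """Return one 0/1 indicator per non-reference level."""
    return [1 if value == lvl else 0 for lvl in levels if lvl != ref]

codes = [dummy_code(s) for s in ["Winter", "Autumn", "Summer"]]
```

Each coefficient on a dummy then estimates the difference between that level and the reference level.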