SDA: Statistical Modelling Flashcards

1
Q

What is a Bi-variate Regression Model?

A

Used to quantify the linear relationship between two variables using a regression line/equation

“Numerical representation of the fixed relationship between an observed response variable and a number of explanatory variables, together with a measure of the uncertainty associated with the relationship”

2
Q

What is the Regression Line?

A

The line that best summarises the relationship between x and y, i.e. the ‘line of best fit’; it gives the value of y expected for each value of x

3
Q

What is the Ordinary Least Squares approach to regression modelling?

A

The most commonly used approach
Fits a linear trend line through a cloud of points
Minimises the deviations between observed and predicted values, i.e. minimises the sum of the squared residuals
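
As a rough illustration of the points above, the least-squares line can be computed by hand. This is a minimal sketch, assuming a single x variable and a small made-up data set; the function names `fit_ols` and `sum_squared_residuals` are hypothetical and not taken from any library.

```python
def fit_ols(xs, ys):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = co-variation of x and y divided by the variation in x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The least-squares line always passes through (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return intercept, slope


def sum_squared_residuals(xs, ys, intercept, slope):
    """Sum of squared vertical distances between the points and the line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


# Made-up illustrative data (an assumption, not from the flashcards).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = fit_ols(xs, ys)
best = sum_squared_residuals(xs, ys, a, b)
# Nudging the slope away from the fitted value increases the sum of squares.
worse = sum_squared_residuals(xs, ys, a, b + 0.1)
print(f"intercept={a:.2f}, slope={b:.2f}, SSR={best:.2f}, nudged SSR={worse:.2f}")
```

Any other straight line through these points gives a larger sum of squared residuals than the fitted one, which is what "least squares" refers to.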

4
Q

What are residuals?

A

The disturbances, i.e. the unsystematic variation in the response: the vertical distance from an individual point to the regression line
Above the line: Positive residuals
Below the line: Negative residuals
If residuals are too big to have occurred by chance then the chosen model is inadequate
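
A small sketch of how residuals and their signs could be computed. The data are made up, and the line y = 2.2 + 0.6x is assumed to be the least-squares line for these particular points.

```python
# Made-up illustrative data and its assumed least-squares line.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
intercept, slope = 2.2, 0.6

# Residual = observed y minus the y predicted by the line.
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

# Positive residual: point lies above the line; negative: below the line.
positions = ["above" if e > 0 else "below" for e in residuals]

for x, e, pos in zip(xs, residuals, positions):
    print(f"x={x}: residual={e:+.1f} ({pos} the line)")

# For a least-squares line with an intercept, the residuals sum to zero.
total = sum(residuals)
```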

5
Q

What does the coefficient of determination (r-squared) value show?

A

The square of the PPMCC (Pearson's product-moment correlation coefficient)
Summarises the scatter of the residuals around the regression line, i.e. shows the proportion of the variation in y that is explained by the model
Shows how well the regression line fits the data

Example:
PPMCC = 0.922
Therefore, r-squared = 0.850
Therefore, 85% of the variation in y can be explained by the linear relationship between x and y, 15% of the variation remains unexplained
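
The arithmetic in the example can be checked in a few lines. The sketch below also illustrates, on made-up data, that in bivariate regression the squared PPMCC equals 1 minus (residual variation / total variation); the data set is an assumption for illustration only.

```python
import math

# The example from the card: squaring the PPMCC gives r-squared.
ppmcc = 0.922
r_squared_card = round(ppmcc ** 2, 3)  # 0.85, i.e. 85% explained

# Made-up illustrative data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)
mean_x, mean_y = sum(xs) / n, sum(ys) / n

sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
sxx = sum((x - mean_x) ** 2 for x in xs)
syy = sum((y - mean_y) ** 2 for y in ys)  # total variation in y

# Route 1: square the correlation coefficient.
r = sxy / math.sqrt(sxx * syy)
r_squared_from_r = r ** 2

# Route 2: 1 - (unexplained variation / total variation).
slope = sxy / sxx
intercept = mean_y - slope * mean_x
ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
r_squared_from_fit = 1 - ss_res / syy

print(r_squared_card, round(r_squared_from_r, 3), round(r_squared_from_fit, 3))
```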

6
Q

What is the F-test?

A

Tests r-squared for significance directly, i.e. whether the model as a whole explains a significant share of the variation in y
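
One common way to write the F statistic is in terms of r-squared: F = (r-squared / k) / ((1 - r-squared) / (n - k - 1)), where k is the number of x variables and n the number of observations. The sketch below applies this to an r-squared of 0.850; the sample size n = 20 is a hypothetical value chosen for illustration, and `f_statistic` is a hypothetical helper name.

```python
def f_statistic(r_squared, n, k):
    """F statistic for the overall significance of a regression.

    r_squared: coefficient of determination
    n: number of observations
    k: number of independent (x) variables
    """
    explained = r_squared / k                    # explained share per x variable
    unexplained = (1 - r_squared) / (n - k - 1)  # unexplained share per residual degree of freedom
    return explained / unexplained


# r-squared of 0.850 with an ASSUMED sample size of 20 and one x variable.
f_stat = f_statistic(0.850, n=20, k=1)
print(f"F = {f_stat:.1f} on (1, 18) degrees of freedom")
```

The result is compared against a critical value from F tables for (k, n - k - 1) degrees of freedom; for (1, 18) at the 5% level this is about 4.41, so an F of this size would be judged significant.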

7
Q

What are the 6 assumptions of OLS regression modelling?

A
  1. Residuals have a mean value of zero
  2. Error terms are patternless and uncorrelated with one another
  3. Error terms will have a constant and equal variance i.e. HOMOSCEDASTICITY
  4. No correlation between error terms and x-variables
  5. Linearity - the relationship between x and y is assumed to be linear
  6. Residuals are normally distributed
8
Q

What is homoscedasticity?

A

Error terms will have an equal and constant variance

9
Q

What is a Confirmatory approach to regression?

A

Significance tests (t and F) inferring from sample to population
Guarding against sampling error
Presumes a well-developed model
Necessitates demanding assumptions

10
Q

What is an Exploratory approach to regression?

A

Graphs of residuals
Reveals problems with the data and the model
Aims to develop an initial model into an improved model

11
Q

What are Catch-All plots?

A

Formed by plotting standardised residuals against fitted values (thus it is a separate graph in itself)
An informal way of magnifying defects and other ills in the model
Can be used in multiple regression
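
A minimal sketch of how the points for such a plot could be prepared, using made-up data. The standardisation here is a simplified one (each residual divided by the residual standard deviation); statistical packages typically also adjust for leverage, so their values may differ slightly.

```python
import math

# Made-up illustrative data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)
mean_x, mean_y = sum(xs) / n, sum(ys) / n

# Fit the least-squares line.
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
intercept = mean_y - slope * mean_x

fitted = [intercept + slope * x for x in xs]
residuals = [y - f for y, f in zip(ys, fitted)]

# Residual standard deviation, using n - 2 degrees of freedom
# (two parameters estimated: intercept and slope).
resid_sd = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))
standardised = [e / resid_sd for e in residuals]

# Each pair is one point on the plot: fitted value on the horizontal
# axis, standardised residual on the vertical axis.
points = list(zip(fitted, standardised))
for f, z in points:
    print(f"fitted={f:.2f}  standardised residual={z:+.2f}")
```

A patternless horizontal band of points around zero is what a well-behaved model would be expected to show; funnels, curves or isolated extreme points suggest problems.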

12
Q

What is Multiple-Regression modelling?

A

Models that include more than one independent (x) variable

13
Q

As well as the assumptions for a bi-variate regression model, what is the extra assumption for a multi-variate regression model?

A

No MULTICOLLINEARITY i.e. the independent (x) variables should NOT be strongly correlated with one another
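
A quick check for this, sketched on made-up data with two x variables, is to correlate the x variables with each other. The variance inflation factor (VIF) shown is the two-predictor special case, 1 / (1 - r-squared); the threshold of 10 is a commonly quoted rule of thumb, not a fixed rule.

```python
import math


def pearson_r(us, vs):
    """Pearson's product-moment correlation coefficient between two lists."""
    n = len(us)
    mean_u, mean_v = sum(us) / n, sum(vs) / n
    suv = sum((u - mean_u) * (v - mean_v) for u, v in zip(us, vs))
    suu = sum((u - mean_u) ** 2 for u in us)
    svv = sum((v - mean_v) ** 2 for v in vs)
    return suv / math.sqrt(suu * svv)


# Made-up x variables: x2 is roughly double x1, so they are nearly collinear.
x1 = [1, 2, 3, 4, 5]
x2 = [2.1, 3.9, 6.2, 7.8, 10.1]

r = pearson_r(x1, x2)
# With exactly two x variables, the VIF of each is 1 / (1 - r^2).
vif = 1 / (1 - r ** 2)

print(f"r = {r:.3f}, VIF = {vif:.0f}")
if vif > 10:  # rule-of-thumb threshold (an assumption)
    print("Strong multicollinearity: consider dropping or combining variables")
```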

14
Q

When are t-test results usually insignificant?

A

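When there is heteroscedasticity or multicollinearity: multicollinearity inflates the standard errors of the coefficients, and heteroscedasticity makes those standard errors unreliable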

15
Q

What is Heteroscedasticity?

A

Where the variability of a variable is unequal across the range of values of a second variable that predicts it

e.g. annual income is heteroscedastic when predicted by age because most teens have very similar income levels, whilst older generations show much more variation. Therefore, the variance in annual income changes across age ranges.
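
The income example can be sketched numerically by comparing the variance of income within two age groups. All figures below are invented purely for illustration.

```python
import statistics

# Invented annual incomes (in thousands) for two age groups.
teen_incomes = [5, 6, 5, 7, 6]
older_incomes = [20, 45, 30, 80, 55]

# Sample variance within each group.
var_teens = statistics.variance(teen_incomes)
var_older = statistics.variance(older_incomes)

print(f"variance (teens) = {var_teens:.1f}")
print(f"variance (older) = {var_older:.1f}")

# Under homoscedasticity the two variances would be roughly equal;
# a large ratio points to heteroscedasticity.
ratio = var_older / var_teens
print(f"variance ratio = {ratio:.0f}")
```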
