Regression and Multiple Regression Flashcards

1
Q

Regression

A

Regression estimates the relationship between one dependent variable and one or more explanatory variables. It is used for both description and prescription.

2
Q

Steps

A

Scatterplots and correlation analysis: get a feel for the direction and strength of the relationship

Model estimation: fit the model using software

Diagnostic evaluation: evaluate the model's validity and usefulness
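
As a rough illustration of the first step, the sketch below computes the Pearson correlation coefficient by hand. The data and the variable names (hours, score) are made up for the example and are not from the course material.

```python
import math

def pearson_correlation(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: hours studied and exam score
hours = [1, 2, 3, 4, 5]
score = [52, 55, 61, 64, 70]

r = pearson_correlation(hours, score)
print(round(r, 3))  # close to +1: a strong positive relationship
```

The sign of r gives the direction of the relationship and its magnitude gives the strength.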

3
Q

Simple Linear Regression

A

Models the relationship between two variables as a straight line:
y = a + b x
where a is the intercept and b is the slope.
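
A minimal sketch of estimating a and b with the standard least-squares formulas. The function name fit_line and the data are assumptions made for this example; in practice the estimation is done by statistical software.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y = a + b x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx              # slope
    a = mean_y - b * mean_x    # intercept
    return a, b

# Hypothetical data lying exactly on y = 1 + 2x
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]

a, b = fit_line(xs, ys)
print(a, b)
```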

4
Q

Residual Scatter

A

Residual (e): the vertical distance of a point from the fitted line, i.e. the difference between the observed and predicted values of the dependent variable.
Choose the line so that the residual scatter is minimised (least squares minimises the sum of squared residuals).
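
To make the idea concrete, the sketch below compares the residual scatter of two candidate lines on made-up data. The helper names are hypothetical; the first candidate (a = 0, b = 1.9) happens to be the least-squares line for these four points.

```python
def residuals(a, b, xs, ys):
    """Observed minus predicted value for each point, given the line y = a + b x."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]

def sum_squared_residuals(a, b, xs, ys):
    """Total residual scatter around the line y = a + b x."""
    return sum(e ** 2 for e in residuals(a, b, xs, ys))

# Hypothetical data
xs = [1, 2, 3, 4]
ys = [2, 4, 5, 8]

sse_least_squares = sum_squared_residuals(0.0, 1.9, xs, ys)  # least-squares line
sse_other = sum_squared_residuals(1.0, 1.0, xs, ys)          # an arbitrary alternative

print(sse_least_squares, sse_other)
```

Any other choice of a and b gives a larger sum of squared residuals than the least-squares line.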

5
Q

R-Squared

A

R-squared tells you how well the independent variables in your model explain the variation in the dependent variable. However, it does not tell you whether the coefficient estimates and predictions are biased, or whether the form of the model is appropriate for the underlying data.

R-squared = 1 - S/Sy

S: the residual sum of squares (the sum of squared differences between the actual outcomes and the predicted outcomes).

Sy: the total sum of squares (the sum of squared differences between the actual outcomes and their mean).

R-squared is a measure of how well the line fits the data. The standard error of the regression, the typical size of a residual, is a related measure of fit.

The adjusted R-squared penalises a low number of observations and a high number of explanatory variables.
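
A small sketch of the calculation, using S and Sy as defined above. The adjusted R-squared formula shown is the standard textbook one, which the card does not spell out, and the data are made up.

```python
def r_squared(actual, predicted):
    """R-squared = 1 - S/Sy."""
    mean_y = sum(actual) / len(actual)
    s = sum((y - p) ** 2 for y, p in zip(actual, predicted))  # residual sum of squares
    sy = sum((y - mean_y) ** 2 for y in actual)               # total sum of squares
    return 1 - s / sy

def adjusted_r_squared(r2, n_obs, n_explanatory):
    """Standard adjustment for sample size and number of explanatory variables."""
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_explanatory - 1)

# Hypothetical outcomes and the predictions of a fitted line
actual = [2, 4, 5, 8]
predicted = [1.9, 3.8, 5.7, 7.6]

r2 = r_squared(actual, predicted)
adj_r2 = adjusted_r_squared(r2, n_obs=4, n_explanatory=1)
print(round(r2, 3), round(adj_r2, 3))
```

The adjusted value is always below the unadjusted one when the fit is imperfect, and the gap widens as observations get fewer or explanatory variables get more numerous.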

6
Q

Residuals

A

Residuals should be nothing more than random noise left over after the deterministic part of the variation in y has been modelled.

A non-random pattern indicates that there is something the model is not capturing, which is leaking into the residuals.
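
A toy illustration of such a pattern: a straight line fitted to data that is actually curved. The data are made up, and the line coefficients are the least-squares values for these five points, written in directly to keep the sketch short.

```python
# Hypothetical data that is exactly quadratic: y = x^2
xs = [-2, -1, 0, 1, 2]
ys = [x ** 2 for x in xs]

# Least-squares line for this data (the slope is 0 because the data is symmetric in x)
a, b = 2.0, 0.0

# Residuals listed in order of increasing x
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
signs = "".join("+" if e > 0 else "-" for e in residuals)

print(residuals)
print(signs)
```

The residuals are positive at both ends and negative in the middle, a U shape that signals a missing curved term rather than random noise.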

6
Q

Testing Significance of variables

A

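If |t-stat| > 2 (more precisely 1.96, at the 5% level in large samples), reject the null hypothesis that the coefficient is zero.

t-stat = coefficient estimate / standard error

p-value (p-val): the probability of finding a relationship at least as strong as the one observed if the null hypothesis were true.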
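
A minimal sketch of the test. The coefficient and standard error values are hypothetical regression output, and the p-value uses a normal approximation that is only reasonable for large samples; statistical software reports the exact t-distribution value.

```python
import math

def t_statistic(coefficient, standard_error):
    """t-stat = coefficient estimate / standard error."""
    return coefficient / standard_error

def two_sided_p_value(t):
    """Large-sample (normal) approximation to the two-sided p-value."""
    return math.erfc(abs(t) / math.sqrt(2))

def is_significant(t, threshold=1.96):
    """Reject the null hypothesis of a zero coefficient when |t| exceeds the threshold."""
    return abs(t) > threshold

# Hypothetical output for two explanatory variables
t_strong = t_statistic(1.9, 0.26)
t_weak = t_statistic(0.3, 0.25)

print(is_significant(t_strong), two_sided_p_value(t_strong))
print(is_significant(t_weak), two_sided_p_value(t_weak))
```

Under this approximation, |t| = 1.96 corresponds to a p-value of about 0.05, which is why the two rules of thumb agree.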

7
Q

Good model

A

All coefficients: |t-stat| > 2, p-val < 0.05

High adjusted R-squared

Satisfactory residuals

Equation makes sense

8
Q

Multiple Regression

A

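y = a + b1 x1 + b2 x2 + ...
a is the baseline (the value of y when every x is zero).
Each b is the increase in y resulting from a unit rise in its x, holding the other explanatory variables fixed.
Multicollinearity: a high degree of correlation between two or more explanatory variables.
Example: Cost is highly correlated with both 1/Capacity and Years, but these two are also highly correlated with each other.
Including Years therefore adds little information beyond 1/Capacity, so the model fit is no better.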
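
One way to spot multicollinearity is to check the correlation between the explanatory variables themselves. The sketch below does this for two variables and converts the result into a variance inflation factor (VIF), a standard diagnostic that the card does not mention. The numbers are made up and are not the Cost data set.

```python
import math

def correlation(us, vs):
    """Pearson correlation between two explanatory variables."""
    n = len(us)
    mean_u, mean_v = sum(us) / n, sum(vs) / n
    cov = sum((u - mean_u) * (v - mean_v) for u, v in zip(us, vs))
    var_u = sum((u - mean_u) ** 2 for u in us)
    var_v = sum((v - mean_v) ** 2 for v in vs)
    return cov / math.sqrt(var_u * var_v)

def variance_inflation_factor(r):
    """VIF for a model with exactly two explanatory variables whose correlation is r."""
    return 1 / (1 - r ** 2)

# Hypothetical values of the two explanatory variables
inverse_capacity = [0.1, 0.2, 0.3, 0.4, 0.5]
years = [2, 3, 5, 6, 8]

r = correlation(inverse_capacity, years)
vif = variance_inflation_factor(r)
print(round(r, 3), round(vif, 1))
```

A VIF above roughly 10 is a common rule of thumb for problematic multicollinearity; here the two variables carry almost the same information, so adding the second one would barely improve the fit.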
