QM LM10 Simple Linear Regression Flashcards
How do you calculate line of best fit?
- Minimise the sum of the squared vertical distances (residuals) between the observations and the line
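A minimal sketch of that calculation (Python with NumPy; the data values are made up for illustration), using the closed-form least-squares formulas for the slope and intercept:

```python
import numpy as np

# Illustrative data (not from the flashcards)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # independent variable
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])   # dependent variable

# Least-squares estimates: these minimise the sum of squared vertical
# distances (residuals) between the observations and the line.
slope = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
intercept = y.mean() - slope * x.mean()

print(f"line of best fit: y ≈ {intercept:.2f} + {slope:.2f}x")
```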
What does a linear regression assume?
A linear relationship between the dependent and independent variables
What is SST?
- Total sum of squares: the squared distances between each observation of y and the mean of y (i.e., a line at the mean with a slope of 0), totalled across all observations
- It measures the total variation in the dependent variable
- Thinking of the vertical distance from a line to an observation as an error, SST can be split into the part explained by the regression (SSR) and the unexplained part (SSE)
What is the residual error term?
The portion of the dependent variable that cannot be explained by the independent variable (when using a line of best fit)
Y = intercept + (slope coefficient × X) + error term
The intercept and slope coefficient are the regression coefficients (the error term is not)
Y is regressed on x
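A hedged sketch of that model (Python/NumPy, with made-up coefficient values): the observed y is the line b0 + b1·x plus a random error term.

```python
import numpy as np

rng = np.random.default_rng(0)

b0, b1 = 1.5, 0.8                        # regression coefficients (illustrative values)
x = np.linspace(0.0, 10.0, 50)           # independent variable
error = rng.normal(0.0, 0.5, x.size)     # error term: the part of y not explained by x

y = b0 + b1 * x + error                  # the simple linear regression model
# "y is regressed on x" means estimating b0 and b1 from the observed (x, y) pairs.
```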
What is the residual?
The difference between the observed value of y and the value predicted by the fitted line (observed y minus fitted y).
- Or, the portion of y (dependent variable) that cannot be explained by x (independent variable)
When does our value of y equal the intercept?
When x = 0
However, this only makes sense if x = 0 is a meaningful value for the independent variable
For example, in a regression of height on age, age = 0 falls outside any sensible range for the data, so the intercept has no practical interpretation regardless of its value
What are the 4 assumptions behind using a simple linear regression to find a relationship?
- Linearity: the relationship between x and y is linear (e.g. not curved)
- Homoscedasticity: the variance of the error terms is the same for all observations (e.g. not different variances at different times)
- Independence: the (x, y) pairs are independent of each other; one pair should not depend on the next
- Normality: the error terms are normally distributed
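A rough, illustrative sketch (Python/NumPy; simulated data and crude statistics rather than formal tests) of how those four assumptions might be eyeballed from the residuals:

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0.0, 10.0, 100)
y = 2.0 + 0.5 * x + rng.normal(0.0, 1.0, x.size)     # simulated data

slope, intercept = np.polyfit(x, y, 1)               # fit the line of best fit
resid = y - (intercept + slope * x)                  # residuals

half = resid.size // 2
# Homoscedasticity: the spread of the residuals should look similar everywhere
print("residual variance, first vs second half:", resid[:half].var(), resid[half:].var())
# Independence: one residual should not predict the next
print("lag-1 residual correlation:", np.corrcoef(resid[:-1], resid[1:])[0, 1])
# Normality: residual skewness should be near zero (a very crude check)
print("residual skewness:", ((resid - resid.mean()) ** 3).mean() / resid.std() ** 3)
# Linearity: plot resid against x; a systematic pattern (e.g. negative for low x,
# positive for high x) suggests the true relationship is not linear.
```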
What would indicate non linearity?
All the error terms for low values of x are negative
And all the error terms for high values of x are positive
Below the line -> above the line
Suggests a nonlinear (curved/polynomial/log/other) relationship
What would indicate serial correlation?
Negative error terms follow negative error terms
Positive error terms follow positive error terms
(runs of points on the same side of the line, rather than points scattered randomly above and below it, e.g. + - + - + -)
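One common way to quantify this, not named in the flashcards themselves, is the Durbin-Watson statistic on the residuals (values near 2 suggest no serial correlation, values near 0 suggest positive serial correlation). A minimal sketch with simulated residuals:

```python
import numpy as np

def durbin_watson(resid):
    """Durbin-Watson statistic: ~2 means no serial correlation,
    toward 0 means positive, toward 4 means negative serial correlation."""
    return np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)

# Simulated positively serially correlated residuals: each one follows the last
rng = np.random.default_rng(2)
e = np.zeros(100)
for t in range(1, e.size):
    e[t] = 0.8 * e[t - 1] + rng.normal()

print(durbin_watson(e))                      # well below 2: serial correlation
print(durbin_watson(rng.normal(size=100)))   # close to 2: no serial correlation
```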
What is coefficient of determination?
Measures the fraction of the total variation in the dependent variable that is explained by the independent variable
- It is a goodness of fit measure, but it does not tell us about the significance of the regression equation (which requires factoring in the sample size): it is NOT a statistical test
- Therefore we run an F-test, which compares the explained variation to the unexplained variation (mean square regression over mean square error) and factors in the sample size through the degrees of freedom
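A small sketch (Python/NumPy, simulated data) of the coefficient of determination and the F-statistic built from the sums of squares; the degrees of freedom are where the sample size enters:

```python
import numpy as np

rng = np.random.default_rng(3)
x = np.linspace(0.0, 10.0, 30)
y = 1.0 + 0.6 * x + rng.normal(0.0, 1.5, x.size)   # simulated data

slope, intercept = np.polyfit(x, y, 1)
y_hat = intercept + slope * x                      # fitted values

sst = np.sum((y - y.mean()) ** 2)                  # total variation
sse = np.sum((y - y_hat) ** 2)                     # unexplained variation
ssr = np.sum((y_hat - y.mean()) ** 2)              # explained variation

r_squared = ssr / sst                              # coefficient of determination
n, k = x.size, 1                                   # one slope coefficient in simple regression
f_stat = (ssr / k) / (sse / (n - k - 1))           # mean square regression / mean square error

print(f"R^2 = {r_squared:.3f}, F = {f_stat:.1f}")
```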
In what case might the relationship between the parameters be linear but the data is not linear?
If there is a fixed percentage change from the previous year
- The data will trace out an exponentially shaped curve
- Here the assumption of linearity is not being violated
- However a linear model may still not be appropriate
When the relationship between parameters is linear but the data is nonlinear, how can you tackle it?
- In this case, linearity has not been violated, but using the raw data, a linear model may not be appropriate
- When you have this scenario you can either change the model to be non linear, or transform the data to be linear
What would indicate serial correlation?
When the error term of the previous observation can predict the next
- I.e., if the last is positive the next is probably positive
- There are consistent stretches where the error terms stay positive for a while or negative for a while
- If there was no serial correlation error terms would randomly appear above and below the regression line across the series of observations
What is the difference between SST, SSE and SSR?
- Total sum of squares (SST) is each observation of y minus the mean of y, squared, then totalled across all observations
- SST can be broken down into the amount that can be explained, and unexplained
- Sum of squared errors (SSE) is the unexplained part
- Regression sum of squares (SSR) is the explained part
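Continuing the same style of sketch (simulated data, names are illustrative), the decomposition SST = SSE + SSR can be verified numerically:

```python
import numpy as np

rng = np.random.default_rng(4)
x = np.linspace(0.0, 10.0, 40)
y = 3.0 + 0.4 * x + rng.normal(0.0, 1.0, x.size)   # simulated data

slope, intercept = np.polyfit(x, y, 1)
y_hat = intercept + slope * x

sst = np.sum((y - y.mean()) ** 2)       # total variation
sse = np.sum((y - y_hat) ** 2)          # unexplained part
ssr = np.sum((y_hat - y.mean()) ** 2)   # explained part

print(np.isclose(sst, sse + ssr))       # True: SST = SSE + SSR
```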
What is one use of a log-linear model?
When the growth rate is constant
- Under such a scenario the level of y (and so the absolute change each period) increases exponentially
- Therefore we put the dependent variable (y) on a log scale, so that ln(y) is linear in x (e.g. time)
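A brief sketch (Python/NumPy, with a made-up 5% growth rate) of the log-linear idea: the raw series grows exponentially, but ln(y) is linear in time, so a simple linear regression of ln(y) on t recovers the growth rate:

```python
import numpy as np

rng = np.random.default_rng(5)
t = np.arange(20)                                               # time periods
y = 100.0 * 1.05 ** t * np.exp(rng.normal(0.0, 0.02, t.size))   # ~5% growth per period

# Regress ln(y) on t: the slope estimates ln(1 + growth rate)
slope, intercept = np.polyfit(t, np.log(y), 1)
print("estimated growth rate ≈", np.exp(slope) - 1)             # close to 0.05
```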