QM LM10 Simple Linear Regression Flashcards

1
Q

How do you calculate the line of best fit?

A
  • Minimise the sum of the squared vertical distances (residuals) between the observations and the line
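
As a standard result (not spelled out on the card), minimising the sum of squared residuals gives the ordinary least squares estimates:

$$\hat{b}_1 = \frac{\sum_i (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_i (X_i - \bar{X})^2} = \frac{\operatorname{Cov}(X, Y)}{\operatorname{Var}(X)}, \qquad \hat{b}_0 = \bar{Y} - \hat{b}_1 \bar{X}$$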
2
Q

What does a linear regression assume?

A

A linear relationship between the dependent and independent variables

3
Q

What is SST?

A
  • Total sum of squares: the sum of the squared distances between the observed y values and their mean (i.e., a horizontal line with slope 0)
  • It measures the total variation in the dependent variable
  • SST can be split into an explained part (SSR) and an unexplained part (SSE), since the distance of each observation from the fitted line can be thought of as error
4
Q

What is the residual error term?

A

The portion of the dependent variable that cannot be explained by the independent variable (when using a line of best fit)

Y = intercept + (slope coefficient × X) + error term

The intercept and slope coefficient are the regression coefficients
Y is regressed on X
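
In the usual notation (with $b_0$ for the intercept, $b_1$ for the slope coefficient and $\varepsilon$ for the error term), the model is:

$$Y_i = b_0 + b_1 X_i + \varepsilon_i$$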

5
Q

What is the residual?

A

The observed value of the dependent variable minus the value predicted by the fitted regression line.
- Or, the portion of y (dependent variable) that cannot be explained by x (independent variable)
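
In symbols, the residual for observation i is the observed value minus the fitted value (hats denote the estimated coefficients):

$$e_i = Y_i - \hat{Y}_i = Y_i - (\hat{b}_0 + \hat{b}_1 X_i)$$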

6
Q

When does our value of y equal the intercept?

A

When x = 0
However, this only makes sense if the independent variable has meaning at x=0

For example, in a regression of height versus age, the intercept is the predicted height at age = 0, which is not a meaningful value regardless of what y turns out to be

7
Q

What are the 4 assumptions behind using a simple linear regression to find a relationship?

A
  1. Linearity: the relationship between x and y is linear (e.g., rather than curved)
  2. Homoscedasticity: the variance of the error terms is the same for all observations (e.g., rather than different variances at different times)
  3. Independence: the (x, y) pairs are independent of each other; one observation should not depend on the next
  4. Normality: the error term is normally distributed
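
As a rough illustration (not part of the curriculum reading), the linearity, homoscedasticity and normality assumptions are often eyeballed from residual plots. The sketch below simulates some hypothetical data, fits a line and plots the residuals; the variable names and simulated values are purely illustrative.

```python
import numpy as np
import matplotlib.pyplot as plt

# Simulated (hypothetical) data: y depends linearly on x plus normal noise
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 100)
y = 2.0 + 0.5 * x + rng.normal(0, 1, size=x.size)

# Fit y = b0 + b1*x by ordinary least squares (polyfit returns the slope first)
b1, b0 = np.polyfit(x, y, 1)
residuals = y - (b0 + b1 * x)

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
# Residuals vs x: curvature hints at non-linearity, a funnel shape at heteroscedasticity
axes[0].scatter(x, residuals)
axes[0].axhline(0, color="black")
axes[0].set_title("Residuals vs x")
# Histogram of residuals: a rough check of the normality assumption
axes[1].hist(residuals, bins=15)
axes[1].set_title("Residual distribution")
plt.show()
```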
8
Q

What would indicate non-linearity?

A

All the error terms for low values of x are negative
And all the error terms for high values of x are positive
Below the line -> above the line
Suggests a non-linear (curved/polynomial/log/other) relationship

9
Q

What would indicate serial correlation?

A

Negative error terms follow negative error terms
Positive error terms follow positive error terms
(runs of points on the same side of the regression line, rather than points scattered randomly above and below it, i.e., + − + − + −)

10
Q

What is the coefficient of determination?

A

Measures the fraction of the total variation in the dependent variable that is explained by the independent variable
- It is a goodness of fit measure, but it does not tell us about the significance of the regression equation (which requires factoring in the sample size): it is NOT a statistical test
- Therefore we have to do an F-test, which compares the explained (regression) variation with the unexplained (residual) variation, adjusted for their degrees of freedom
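
In symbols, using the SST/SSR/SSE terms defined later in this deck:

$$R^2 = \frac{\text{SSR}}{\text{SST}} = 1 - \frac{\text{SSE}}{\text{SST}}$$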

11
Q

In what case might the relationship between the parameters be linear but the data is not linear?

A

If there is a fixed percentage change from the previous year
- The data will follow an exponential-shaped curve
- Here the assumption of linearity is not being violated
- However a linear model may still not be appropriate

12
Q

When the relationship between parameters is linear but the data is nonlinear, how can you tackle it?

A
  • In this case, linearity has not been violated, but using the raw data a linear model may not be appropriate
  • When you have this scenario you can either change the model to be non-linear, or transform the data so that it is linear (see the sketch below)
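
A minimal sketch of the "transform the data" route, assuming hypothetical data growing at a fixed 8% per period: in levels the series is exponential, but in logs it is linear, so an ordinary linear fit recovers the growth rate.

```python
import numpy as np

t = np.arange(20)            # time periods (hypothetical)
y = 100 * 1.08 ** t          # level series growing 8% per period

# Fit ln(y) = b0 + b1*t; polyfit returns [slope, intercept]
b1, b0 = np.polyfit(t, np.log(y), 1)

print(f"estimated growth rate ≈ {np.exp(b1) - 1:.3f}")   # ≈ 0.080
print(f"estimated starting level ≈ {np.exp(b0):.1f}")    # ≈ 100.0
```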
13
Q

What would indicate serial correlation?

A

When the error term of the previous observation can predict the next
- I.e., if the last is positive the next is probably positive
- There is a consistent trend in some regions of error terms being positive for a while or negative for a while
- If there was no serial correlation error terms would randomly appear above and below the regression line across the series of observations

14
Q

What is the difference between SST, SSE and SSR?

A
  • Total sum of squares (SST): for each observation, take y minus the mean of y, square it, and sum across all observations
  • SST can be broken down into an explained part and an unexplained part
  • Sum of squared errors (SSE) is the unexplained part
  • Regression sum of squares (SSR) is the explained part
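
Written out, with $\hat{Y}_i$ the fitted value and $\bar{Y}$ the mean of y:

$$\text{SST} = \sum_i (Y_i - \bar{Y})^2,\quad \text{SSR} = \sum_i (\hat{Y}_i - \bar{Y})^2,\quad \text{SSE} = \sum_i (Y_i - \hat{Y}_i)^2,\quad \text{SST} = \text{SSR} + \text{SSE}$$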
15
Q

What is one use of a log-linear model?

A

When the growth rate is constant
- Under such a scenario the level of y (and hence the absolute change each period) grows exponentially
- Therefore we put the dependent variable (y) in a log scale
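
In symbols, the log-linear model is:

$$\ln Y_i = b_0 + b_1 X_i + \varepsilon_i$$

where the slope $b_1$ is (approximately) the relative change in y for a one-unit change in x.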

16
Q

What is one use of a linear-log model?

A

When y and x are of two very different scales
- E.g., y is a percentage and x is billions of dollars of revenue
- We take the log of x on the right-hand side of the regression equation to bring the two scales closer together
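
In symbols, the linear-log model is:

$$Y_i = b_0 + b_1 \ln X_i + \varepsilon_i$$

where the slope $b_1$ gives the absolute change in y associated with a relative change in x.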

17
Q

What is a log-log model useful for?

A

When we want to express the relationship between relative change in y and relative change in x
- For example, if we want to estimate the percentage change in revenue associated with a 10% increase in advertising
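
In symbols, the log-log model is:

$$\ln Y_i = b_0 + b_1 \ln X_i + \varepsilon_i$$

where the slope $b_1$ is the ratio of the relative change in y to the relative change in x (an elasticity), so a 10% increase in x is associated with roughly a $10 b_1$% change in y.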

18
Q

Homoskedasticity is best described as the situation in which the variance of the residuals of a regression is:
A. Zero
B. Normally distributed
C. Constant across observations

A

C. Constant across observations

19
Q

The figure shown for the standard error of the estimate is the standard deviation of:
A. The dependent variable
B. The residuals from the regression
C. The predicted dependent variable from the regression

A

B. The residuals from the regression

20
Q

How do you calculate sample covariance?

A
  • Find the sum of the cross-products of the deviations of x and y from their means
  • Divide by n − 1
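
In symbols:

$$s_{XY} = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n - 1}$$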
21
Q

How do you calculate the F-statistic?

A

Mean square of the regression divided by the mean square of the residual
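
In symbols, for a simple linear regression with one independent variable and n observations:

$$F = \frac{\text{MSR}}{\text{MSE}} = \frac{\text{SSR}/1}{\text{SSE}/(n-2)}$$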
