L3: Linear Regression Flashcards

Question 1

Q

Is there a relationship between any of the advertising streams and sales?

Answer

A

TV and radio both look promising: each shows a clear pattern against sales, whether linear or non-linear.

Newspaper may have some weak relationship or may require a data transformation.

Question 2

Q

How strong is the relationship between the different advertising streams and sales?

Answer

A

TV has the strongest relationship, followed by radio and then newspaper.

Question 3

Q

Which of the media contributes to sales?

Answer

A

At first glance, it appears that only TV and radio contribute to sales.

Question 4

Q

Is the relationship linear between the advertising streams and sales?

Answer

A

Radio and TV may be linear. However, the relationship may taper off, so it could be logarithmic for TV and perhaps something similar for radio.

Question 5

Q

If there were synergy between two variables, what would this mean?

Answer

A

It would suggest that there is an interaction between the two variables: their combined effect differs from the sum of their separate effects, and modelling it helps explain the dependent variable's variability.
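
As a hedged illustration (the symbols X_1 and X_2 and the coefficients below are assumed for this sketch, with β playing the role of B in these cards), synergy between two predictors is commonly modelled by adding an interaction term:

```latex
Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1 X_2 + \epsilon
  = \beta_0 + (\beta_1 + \beta_3 X_2) X_1 + \beta_2 X_2 + \epsilon
```

In this form, a non-zero β_3 means the effect of X_1 on Y changes with the level of X_2.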

Question 6

Q

What are the assumptions that are made in a linear regression model? (3)

Answer

A

That the response variable Y has a linear relationship with the predictor variable X.

That the errors are independent and normally distributed.

That there is constant variability in the residuals.

In short: linearity, nearly normal residuals, constant variability.

Question 7

Q

Define the ith residual by its equation.

Answer

A

Let e_i be the residual of datapoint i:

e_i = y_i - ŷ_i

That is, the residual is the difference between the true and the predicted value of y.

Question 8

Q

Define the residual sum of squares.

Answer

A

The residual sum of the squares is a means of measuring the discrepancy between the predicted and true values of the dependent variable.

RSS = \sum_{i=1}^{n} e_i^2

Where e_i is the residual of the ith data point
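
The two definitions above can be sketched in a few lines of Python. The data values are hypothetical toy numbers chosen for illustration, not the advertising data:

```python
# Hypothetical toy data: observed values and the model's predictions.
y_true = [3.0, 5.0, 7.5, 9.0]
y_pred = [2.5, 5.5, 7.0, 9.5]


def residuals(y_true, y_pred):
    """Return e_i = y_i - y_hat_i for every data point."""
    return [y - y_hat for y, y_hat in zip(y_true, y_pred)]


def rss(y_true, y_pred):
    """Residual sum of squares: the sum of e_i^2 over all data points."""
    return sum(e ** 2 for e in residuals(y_true, y_pred))


print(residuals(y_true, y_pred))  # [0.5, -0.5, 0.5, -0.5]
print(rss(y_true, y_pred))        # 1.0
```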

Question 9

Q

What is the least squares approach?

Answer

A

The least squares approach chooses the coefficients of the linear model by minimising the RSS. In this way we fit the model that deviates least from the data points, as measured by the squared residuals.

This yields the best-fitting linear model for the data in the least squares sense.
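
For a single predictor, the coefficients that minimise the RSS have a well-known closed form. The sketch below uses it; the function name and the toy data are assumptions made for illustration:

```python
def fit_simple_ols(x, y):
    """Least squares estimates (b0, b1) for the model y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Slope: covariation of x and y divided by the variation of x.
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx
    # Intercept: forces the fitted line through the point (x_bar, y_bar).
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Hypothetical data lying exactly on the line y = 1 + 2x.
x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 7.0, 9.0]
b0, b1 = fit_simple_ols(x, y)
print(b0, b1)  # 1.0 2.0
```

Because the toy points lie exactly on a line, the fit recovers that line and the RSS is zero; with noisy data the same formulas give the line with the smallest RSS.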

Question 10

Q

Which of the variables are significant?

What does this mean?

Answer

A

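The Pr(>|t|) column gives the p-value of the t-test for each coefficient: the probability of seeing a t-statistic at least this extreme if the true coefficient were zero. If it is < 0.05, we can reject the null hypothesis and conclude that there is a relationship.

In this case, the intercept and the TV coefficient both appear to be significant, so TV appears to be significantly related to sales.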

Question 11

Q

What does the Std. Error indicate?

Answer

A

The Std. Error indicates how precisely the model estimates each coefficient: it is the average amount by which the estimate differs from the coefficient's true (unknown) value.

SE(B₀) = 0.457843: in the absence of any advertising, the average sales can vary by 457.843 units.

SE(B₁) = 0.002691: for each $1,000 increase in television advertising, the average increase in sales can vary by 2.691 units.

Question 12

Q

What is the 95% confidence interval of the B₁ coefficient?

Answer

A

The 95% confidence interval is found approximately by

B₁ ± 2 SE(B₁)

Therefore the interval is:

[B₁-2SE(B₁), B₁+2SE(B₁)]
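
A minimal sketch of this rule in Python. The standard error is the SE(B₁) quoted in the earlier card; the point estimate is a hypothetical placeholder, since these cards do not state B₁ itself:

```python
def approx_ci_95(estimate, std_error):
    """Approximate 95% interval: estimate plus or minus two standard errors."""
    return estimate - 2 * std_error, estimate + 2 * std_error


se_b1 = 0.002691  # SE(B1) as quoted in the Std. Error card
b1_hat = 0.05     # hypothetical point estimate, for illustration only

lower, upper = approx_ci_95(b1_hat, se_b1)
print(lower, upper)
```

If the resulting interval excludes zero, that agrees with a significant t-test for the same coefficient.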

Question 13

Q

What is the Residual Standard Error (RSE)?

Answer

A

It is a measure of the quality of a linear regression fit: an estimate of the standard deviation of the errors.

In our previous example, the RSE = 3.259, so actual sales in each market deviate from the true regression line by 3,259 units on average. This is 23% (3,259/14,000) of the mean sales value of 14,000 units.
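
The arithmetic above, together with the standard formula RSE = sqrt(RSS / (n - p - 1)) for p predictors, can be sketched as follows. The formula is not stated in these cards and the function names are illustrative:

```python
import math


def rse(rss, n, p=1):
    """Residual standard error for a model with p predictors and n points."""
    return math.sqrt(rss / (n - p - 1))


def percentage_error(rse_value, mean_response):
    """RSE expressed as a fraction of the mean response."""
    return rse_value / mean_response


# Figures quoted in the card: RSE of 3,259 units, mean sales of 14,000 units.
print(round(percentage_error(3259, 14000) * 100))  # 23
```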

Question 14

Q

What does the R² tell us?

Answer

A

The R squared tells us the proportion of variability in Y that can be explained by the independent variable X.
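
One common way to compute this proportion is R² = 1 - RSS/TSS, where TSS is the total sum of squares around the mean of Y. A minimal sketch with hypothetical toy values:

```python
def r_squared(y_true, y_pred):
    """R^2 = 1 - RSS / TSS: share of the variability in y that is explained."""
    y_bar = sum(y_true) / len(y_true)
    tss = sum((y - y_bar) ** 2 for y in y_true)               # total variability
    rss = sum((y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred))  # unexplained
    return 1 - rss / tss


# Hypothetical observations and predictions: TSS = 20, RSS = 4.
y_true = [2.0, 4.0, 6.0, 8.0]
y_pred = [3.0, 3.0, 7.0, 7.0]
print(r_squared(y_true, y_pred))
```

A perfect fit gives 1, and predicting the mean of Y for every point gives 0.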

Question 15

Q

For multiple linear regression, how can we find the best estimates for the regression coefficients?

Answer

A

We can minimise the RSS (the least squares approach), just as in simple linear regression.

Question 16

Q

F-statistics

If there is an F-value that is close to 1, what can we assume?

Answer

A

Then there is no evidence of a relationship between Y and its predictors: an F-value close to 1 is what we expect when the null hypothesis is true.

Question 17

Q

F-statistics

If there is an F-value that is greater than 1, what can we deduce?

Answer

A

That there is evidence of a relationship between at least one predictor and the response variable; the further F is above 1, the stronger the evidence.
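
For reference, the F-statistic in these cards is usually computed from the total and residual sums of squares. The sketch below uses the standard formula for a model with p predictors and n observations; the function name and input values are hypothetical:

```python
def f_statistic(tss, rss, n, p):
    """F = ((TSS - RSS) / p) / (RSS / (n - p - 1))."""
    explained_per_predictor = (tss - rss) / p
    unexplained_per_df = rss / (n - p - 1)
    return explained_per_predictor / unexplained_per_df


# Hypothetical sums of squares: TSS = 20, RSS = 4, n = 4 points, p = 1 predictor.
print(f_statistic(tss=20.0, rss=4.0, n=4, p=1))  # 8.0
```

When the predictors explain little, the numerator and denominator are of similar size, which is why values near 1 give no evidence of a relationship.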

Question 18

Q

If we had a small n, what kind of F-statistic would be required to have strong evidence against the null hypothesis?

Answer

A

We would need a large F-value to show any relationship between the X_i and Y.

Question 19

Q

If we had a large n, how might this affect our need of a large/small F-statistic?

Answer

A

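A smaller F-statistic can suffice: as n grows, the residual degrees of freedom (n - p - 1, with p predictors) increase and the critical value of the F-distribution falls, so an F-value nearer to 1 can still provide evidence against the null hypothesis.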

Question 20

Q

What is the purpose of ANOVA?

Answer

A

ANOVA (analysis of variance) tests whether the means of several groups differ by comparing the variability between groups with the variability within groups.

It generalises the t-test beyond two means, so we can check whether the differences between groups are larger than would be expected by chance.

Question 21

Q

When creating a linear regression model with qualitative factors, what may be necessary?

Answer

A

Dummy variables: numerical indicator variables that represent the categories of a qualitative factor.

E.g. if a factor has K levels, we create K-1 dummy variables, each taking two values (0 or 1).
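
A minimal sketch of this encoding in plain Python. The function name, the choice of baseline level, and the region values are all hypothetical:

```python
def dummy_encode(values, baseline=None):
    """Encode a factor with K levels as K-1 columns of 0/1 indicators.

    The baseline level gets no column; it is the all-zeros row.
    """
    levels = sorted(set(values))
    if baseline is None:
        baseline = levels[0]  # assumption: first level alphabetically
    kept = [level for level in levels if level != baseline]
    rows = [[1 if value == level else 0 for level in kept] for value in values]
    return kept, rows


# Hypothetical factor with K = 3 levels, giving K - 1 = 2 dummy variables.
regions = ["east", "south", "west", "east"]
kept, rows = dummy_encode(regions)
print(kept)  # ['south', 'west']
print(rows)  # [[0, 0], [1, 0], [0, 1], [0, 0]]
```

Dropping one level avoids redundancy with the intercept: each remaining coefficient is then read as a difference from the baseline level.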

Question 22

Q

The hierarchy principle in linear regression states what?

Answer

A

That if we include an interaction in a model, we should also include the main effects, even if the p-values associated with their coefficients are not significant.

After this week:

- Understand how regression analysis works
- Apply linear models to solving different regression problems
- Critically assess the accuracy of coefficient estimates and the accuracy of the model
- Produce a precise analysis of the model output