L3: Linear Regression Flashcards
After this week:
- Understand how regression analysis works
- Apply linear models to solving different regression problems
- Critically assess the accuracy of coefficient estimates and the accuracy of the model
- Produce a precise analysis of the model output
Is there a relationship between any of the advertising streams and sales?
TV and radio both look promising: each scatterplot against sales shows a clear pattern, whether linear or non-linear.
Newspaper may have some weak relationship or may require a data transformation.
How strong is the relationship between the different advertising streams and sales?
TV has the strongest relationship, followed by radio and then newspaper.
Which of the media contributes to sales?
At first glance, only TV and radio appear to contribute to sales.
Is the relationship linear between the advertising streams and sales?
Radio and TV may be linear. However, the relationship could taper off at higher spend, in which case it might be logarithmic for TV and something similar for radio.
If there were synergy between two variables, what would this mean?
It would mean there is an interaction between the two variables: the effect of one on the dependent variable depends on the level of the other, and accounting for this helps explain the dependent variable’s variability.
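As a sketch, synergy between two predictors is commonly written as an interaction term. The symbols X_1, X_2 and the coefficients \beta_0 to \beta_3 are illustrative assumptions, not taken from the cards:

```latex
Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1 X_2 + \varepsilon
  = \beta_0 + (\beta_1 + \beta_3 X_2) X_1 + \beta_2 X_2 + \varepsilon
```

When \beta_3 is non-zero, the change in Y per unit of X_1 depends on the value of X_2.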
What are the assumptions that are made in a linear regression model? (3)
That the response variable, Y has a linear relationship to the predictor variable, X
That the errors are independent and normally distributed
That there is constant variability in the residuals
Linearity, Nearly Normal Residuals, Constant Variability
Define the ith residual by its equation.
Let e_i be the residual of data point i:
e_i = y_i - \hat{y}_i
That is, the residual is the difference between the observed and the predicted value of y.
Define the residual sum of squares then
The residual sum of squares is a means of measuring the discrepancy between the predicted and observed values of the dependent variable.
RSS = \sum_{i=1}^{n} e_i^2
Where e_i is the residual of the ith data point and n is the number of data points.
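A minimal Python sketch of the residual and RSS definitions, using only the standard library. The function names and the three data points are made up for illustration and are not the advertising data:

```python
def residuals(y_true, y_pred):
    """Return the residuals e_i = y_i - yhat_i for paired observations."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return [y - y_hat for y, y_hat in zip(y_true, y_pred)]


def rss(y_true, y_pred):
    """Residual sum of squares: the sum of the squared residuals."""
    return sum(e ** 2 for e in residuals(y_true, y_pred))


# Made-up illustrative values (assumption, not from the cards).
observed = [3.0, 5.0, 7.0]
predicted = [2.5, 5.5, 7.0]
example_residuals = residuals(observed, predicted)
example_rss = rss(observed, predicted)  # 0.25 + 0.25 + 0.0
```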
What is the least squares approach?
The least squares approach chooses the coefficients of the linear model by minimising the RSS. In this way the fitted model has the smallest total squared deviation from the data points.
This yields the best-fitting line in the least squares sense, which is not necessarily the true underlying relationship.
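For a single predictor, minimising the RSS has a well-known closed-form solution. The sketch below assumes one predictor x and one response y; the function name and the data are hypothetical:

```python
def fit_simple_ols(x, y):
    """Return (b0, b1) minimising the RSS of the line y = b0 + b1 * x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of cross-deviations and sum of squared deviations of x.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must not be constant")
    b1 = sxy / sxx            # slope
    b0 = y_bar - b1 * x_bar   # intercept
    return b0, b1


# Made-up data lying exactly on y = 1 + 2x (assumption for illustration).
x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 7.0, 9.0]
b0, b1 = fit_simple_ols(x, y)
```

Because the made-up points lie exactly on a line, the fit recovers that line and the RSS is zero; real data would leave non-zero residuals.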
Which of the variables are significant?
What does this mean?
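The Pr(>|t|) value is the p-value of the t-test: the probability of seeing a t-statistic at least this extreme if the coefficient were truly zero. If it is < 0.05, we can reject the null hypothesis of no relationship at the 5% level.
In this case, the intercept and the TV coefficient are both significantly different from zero, so TV appears to be significantly related to sales.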
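As a sketch of where the p-value comes from in simple linear regression: here \hat{\beta}_1 stands for the estimated coefficient (B1 in these cards), n is the number of observations, and the n - 2 degrees of freedom assume a single predictor plus an intercept:

```latex
t = \frac{\hat{\beta}_1 - 0}{\mathrm{SE}(\hat{\beta}_1)}, \qquad
p = \Pr\left(|T| \ge |t|\right), \quad T \sim t_{n-2}
```

A large |t| (an estimate many standard errors away from zero) gives a small p-value.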
What does the Std. Error indicate?
The Std. Error indicates how precisely the model estimates each coefficient: it measures how much the estimate would vary from sample to sample, so a smaller value means a more precise estimate.
SE(B0) = 0.457843: in the absence of any advertising, the estimate of average sales is uncertain by about 457.843 units.
SE(B1) = 0.002691: for each $1,000 increase in television advertising, the estimated average increase in sales is uncertain by about 2.691 units.
What is the 95% confidence interval of the B1 coefficient?
The 95% confidence interval is found approximately by
B1 ± 2 SE(B1)
Therefore the interval is:
[B1-2SE(B1), B1+2SE(B1)]
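The interval can be computed directly, as in this sketch. SE(B1) = 0.002691 is the figure quoted in these cards; the value used for B1 is an assumption for illustration only, since the cards do not state it:

```python
def approx_95_ci(estimate, std_error):
    """Approximate 95% confidence interval: estimate +/- 2 standard errors."""
    if std_error < 0:
        raise ValueError("std_error must be non-negative")
    margin = 2 * std_error
    return estimate - margin, estimate + margin


se_b1 = 0.002691   # SE(B1) as quoted in the cards
b1_hat = 0.0475    # ASSUMED coefficient estimate, illustration only
lower, upper = approx_95_ci(b1_hat, se_b1)
```

If the interval excludes zero, that agrees with rejecting the null hypothesis of no relationship at roughly the 5% level.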
What is the Residual Standard Error (RSE)?
It is an estimate of the standard deviation of the error term, and so a measure of the lack of fit of the linear regression model to the data.
In our previous example, RSE = 3.259, so actual sales in each market deviate from the true regression line by about 3,259 units on average. This is 23% (3,259/14,000) of the mean value of sales (14,000 units).
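A short sketch of the RSE formula and the percentage-error calculation. The RSE of 3.259 and mean of 14 (both in thousands of units) come from the card above; the RSS and n in the second example are made-up values chosen only to exercise the formula:

```python
import math


def residual_standard_error(rss, n, p=1):
    """RSE = sqrt(RSS / (n - p - 1)) for a model with p predictors."""
    dof = n - p - 1  # residual degrees of freedom
    if dof <= 0:
        raise ValueError("need n > p + 1 observations")
    return math.sqrt(rss / dof)


def percentage_error(rse, mean_response):
    """RSE expressed as a percentage of the mean response."""
    return 100 * rse / mean_response


# Figures from the card: RSE = 3.259, mean sales = 14 (thousands of units).
pct = percentage_error(3.259, 14.0)

# Made-up RSS and n (assumption): sqrt(18 / (4 - 1 - 1)) = 3.
rse_demo = residual_standard_error(rss=18.0, n=4)
```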
What does the R² tell us?
The R squared tells us the proportion of variability in Y that can be explained by the independent variable X.
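A sketch of the usual formula, R² = 1 - RSS/TSS, where TSS is the total sum of squares around the mean of Y. The function name and data are made up for illustration:

```python
def r_squared(y_true, y_pred):
    """R^2 = 1 - RSS / TSS, the share of variability in y explained."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    y_bar = sum(y_true) / len(y_true)
    tss = sum((y - y_bar) ** 2 for y in y_true)              # total variability
    rss = sum((y - yh) ** 2 for y, yh in zip(y_true, y_pred))  # unexplained part
    if tss == 0:
        raise ValueError("y_true must not be constant")
    return 1 - rss / tss


# Made-up illustrative values (assumption, not from the cards).
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 3.0, 3.5]
r2 = r_squared(observed, predicted)  # TSS = 5.0, RSS = 0.5
```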
For multiple linear regression, how can we find the best estimates for the regression coefficients?
We choose the coefficients that minimise the RSS (least squares), just like in simple linear regression.
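In matrix form the minimiser has a standard closed form. The notation is an assumption for this sketch: X is the n × (p + 1) design matrix with a leading column of ones, y is the response vector, and X^T X is assumed invertible:

```latex
\mathrm{RSS}(\beta) = (y - X\beta)^{\top} (y - X\beta), \qquad
\hat{\beta} = (X^{\top} X)^{-1} X^{\top} y
```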
F-statistics
If there is an F-value that is close to 1, what can we assume?
Then there is no evidence of a relationship between Y and its predictors; this is what we expect when the null hypothesis is true.
F-statistics
If there is an F-value that is greater than 1, what can we deduce?
That there is evidence that at least one predictor is related to the response, provided the F-value is far enough above 1 for the given n and number of predictors.
If we had a small n, what kind of F-statistic would be required to have strong evidence against the null hypothesis?
We would need a large F-value to show any relationship between the Xi and Y
If we had a large n, how might this affect our need of a large/small F-statistic?
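A smaller F-statistic would suffice: when n is large, an F-statistic only a little larger than 1 can still provide evidence against the null hypothesis, because the critical value of the F-distribution falls as the residual degrees of freedom (n - p - 1) grow.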
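The F-statistic discussed in the cards above is usually computed as F = ((TSS - RSS) / p) / (RSS / (n - p - 1)), with p predictors and n observations. A minimal sketch with made-up inputs:

```python
def f_statistic(tss, rss, n, p):
    """F = ((TSS - RSS) / p) / (RSS / (n - p - 1))."""
    dof = n - p - 1  # residual degrees of freedom
    if p <= 0 or dof <= 0:
        raise ValueError("need p >= 1 and n > p + 1")
    if rss <= 0:
        raise ValueError("rss must be positive")
    explained_per_predictor = (tss - rss) / p
    unexplained_per_dof = rss / dof
    return explained_per_predictor / unexplained_per_dof


# Made-up values (assumption): (60 / 2) / (40 / 20) = 15.
f_demo = f_statistic(tss=100.0, rss=40.0, n=23, p=2)
```

Whether a given F-value counts as large depends on n and p, which is why it is compared against the F-distribution with p and n - p - 1 degrees of freedom.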
What is the purpose of the anova?
ANOVA (analysis of variance) tests whether the means of several groups differ by comparing the variability between groups with the variability within them.
It generalises the t-test beyond two means, so we can judge whether the differences between groups are larger than chance alone would explain.
When creating a linear regression model with qualitative factors, what may be necessary?
Dummy variables, where we create numerical indicators to represent the categories of each qualitative factor.
E.g. if a factor has K levels, we create K - 1 dummy variables, each taking the value 0 or 1; the remaining level acts as the baseline.
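A sketch of dummy coding in plain Python. The function name, the choice of the alphabetically first level as baseline, and the `regions` data are all assumptions made for illustration:

```python
def dummy_encode(values, baseline=None):
    """Encode a factor with K levels as K - 1 columns of 0/1 indicators.

    Returns (columns, rows): the non-baseline level names and, for each
    observation, its indicator values in that column order.
    """
    levels = sorted(set(values))
    if baseline is None:
        baseline = levels[0]  # assumed default: first level alphabetically
    if baseline not in levels:
        raise ValueError("baseline must be one of the observed levels")
    columns = [lvl for lvl in levels if lvl != baseline]
    rows = [[1 if v == lvl else 0 for lvl in columns] for v in values]
    return columns, rows


# Hypothetical factor with K = 3 levels, so K - 1 = 2 dummy variables.
regions = ["east", "south", "west", "east"]
columns, rows = dummy_encode(regions)
```

Observations at the baseline level get zeros in every column, so their effect is absorbed by the intercept.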
The hierarchy principle in linear regression states what?
That if we include an interaction in a model, we should also include the main effects, even if the p-values associated with their coefficients are not significant.