Part 7. Intro to Linear Regression Flashcards
Regression analysis
A tool for examining whether a variable is useful for explaining another variable.
e.g. whether earnings growth or cash flow growth helps explain a company’s value in the marketplace.
Sum of squares total (SST)
- As a simple exploration of why each company’s ROA differs from the mean ROA of 12.5%, we look at the sum of squared deviations of the observations from the mean to capture the variation in return on assets (ROA): SST = Σ(Yi - Y-)^2.
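A minimal sketch of the calculation, using hypothetical ROA figures (in %) chosen so that the mean works out to 12.5%:

```python
# Hypothetical ROA observations (%) for six companies; their mean is 12.5%.
roa = [6.0, 4.0, 15.0, 20.0, 10.0, 20.0]
mean_roa = sum(roa) / len(roa)

# SST: sum of squared deviations of each observation from the mean.
sst = sum((y - mean_roa) ** 2 for y in roa)
print(mean_roa, sst)  # 12.5 239.5
```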
Simple linear regression (SLR)
Method for relating dependent and independent variables through estimation of a relationship, where we have one independent variable.
Multiple regression
Method for relating dependent and independent variables through estimation of a relationship, where we have more than one independent variable.
Ordinary least squares (OLS) regression
The goal is to fit a line to the observations on Y and X so as to minimise the sum of squared deviations from the line; this is the least squares criterion.
Line of best fit
In simple linear regression, the estimated intercept, b0^, and slope, b1^, are such that the sum of the squared vertical distances from the observations to the fitted line is minimised.
Residual for ith observation, ei
This is how much the observed value of Yi differs from the Yi^ estimated using the regression line: ei = Yi - Yi^.
(The error term, by contrast, refers to the true underlying population relationship, whereas the residual refers to the fitted linear relation based on the sample.)
Residuals
Represented by the vertical distances from the fitted line, and therefore measured in the units of the dependent variable.
e.g. if the dependent variable is in euros, the residuals are in euros.
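A short illustration with made-up data: fit the least squares line, then recover the residuals ei = Yi - Yi^. With an intercept in the model, OLS residuals sum to zero.

```python
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(x)
mx, my = sum(x) / n, sum(y) / n

# OLS estimates (least squares criterion).
b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
b0 = my - b1 * mx

# Residuals: observed minus fitted, in the units of the dependent variable.
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
print(round(b0, 4), round(b1, 4))        # 2.2 0.6
print(round(abs(sum(residuals)), 10))    # 0.0 (residuals sum to zero)
```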
Sample correlation, r
The ratio of the covariance to the product of the standard deviations.
Slope
The change in dependent variable for one unit change in independent variable.
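The two definitions above can be checked numerically on illustrative data (sample covariance and standard deviations computed with n - 1 in the denominator):

```python
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(x)
mx, my = sum(x) / n, sum(y) / n

# Sample covariance and standard deviations (n - 1 in the denominator).
cov_xy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / (n - 1)
sx = (sum((xi - mx) ** 2 for xi in x) / (n - 1)) ** 0.5
sy = (sum((yi - my) ** 2 for yi in y) / (n - 1)) ** 0.5

r = cov_xy / (sx * sy)    # sample correlation: cov over product of std devs
b1 = cov_xy / sx ** 2     # OLS slope: Y changes by b1 per one-unit change in X
print(round(r, 4), round(b1, 4))  # 0.7746 0.6
```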
Cross sectional regression
This involves many observations of X and Y for the same time period; depending on the regression model, these observations could come from different companies, asset classes, investment funds, etc.
Time series
Uses many observations from different time periods for the same company, asset class, investment fund, country, or other entity, depending on the regression model.
e.g. monthly data from many years to test whether a country’s inflation rate determines short-term interest rates.
Assumptions of simple linear regression model:
- Linearity - the relationship between the dependent variable Y, and independent variable X is linear.
- Homoskedasticity - the variance of the regression residuals is the same for all observations.
- Independence - the observations (pairs of Ys and Xs) are independent of one another; this implies the regression residuals are uncorrelated across observations.
- Normality - the regression residuals are normally distributed.
Homoskedasticity
The variance of the residuals is the same for all observations.
Heteroskedasticity
Residuals are heteroskedastic if the variance of the residuals differs across observations, i.e. if they are not homoskedastic.
Sum of squares regression (SSR):
The sum of the squared differences between the predicted value of the dependent variable, Yi^, based on the estimated regression line, and the mean of the dependent variable, Y-: SSR = Σ(Yi^ - Y-)^2.
Coefficient of determination (R^2):
The percentage of the variation of the dependent variable that is explained by the independent variable.
- a measure used to evaluate the goodness of fit: R^2 = SSR/SST.
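A quick numeric check of the decomposition on illustrative data: SST = SSR + SSE, and R^2 = SSR/SST (in simple linear regression this also equals the squared sample correlation).

```python
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(x)
mx, my = sum(x) / n, sum(y) / n
b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
b0 = my - b1 * mx
fitted = [b0 + b1 * xi for xi in x]

sst = sum((yi - my) ** 2 for yi in y)                    # total variation
ssr = sum((fi - my) ** 2 for fi in fitted)               # explained variation
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))   # unexplained variation

r2 = ssr / sst
print(round(sst, 4), round(ssr, 4), round(sse, 4), round(r2, 4))  # 6.0 3.6 2.4 0.6
```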
Standard error of estimate (se):
An absolute measure of the distance between the observed values of the dependent variable and those predicted from the estimated regression.
The smaller the se, the better the fit of the model.
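In simple linear regression, se = √(SSE/(n - 2)); a small sketch with made-up data:

```python
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(x)
mx, my = sum(x) / n, sum(y) / n
b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
b0 = my - b1 * mx

# Sum of squared errors (residuals), then se with n - 2 degrees of freedom.
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
se = (sse / (n - 2)) ** 0.5
print(round(se, 4))  # 0.8944
```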
Measures of goodness of fit of the estimated regression:
- Standard error of estimate (se)
- F-statistic
Standard error of slope coefficient (sb1)
In simple linear regression, this is the ratio of the model’s standard error of the estimate (se) to the square root of the variation of the independent variable: sb1 = se / √(Σ(Xi - X-)^2).
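A sketch of that formula on illustrative data; the slope’s t-statistic for testing H0: b1 = 0 is b1^/sb1:

```python
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(x)
mx, my = sum(x) / n, sum(y) / n
sxx = sum((xi - mx) ** 2 for xi in x)    # variation of the independent variable
b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
b0 = my - b1 * mx
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
se = (sse / (n - 2)) ** 0.5              # standard error of the estimate

sb1 = se / sxx ** 0.5                    # standard error of the slope coefficient
t_stat = b1 / sb1                        # t-statistic for H0: slope = 0
print(round(sb1, 4), round(t_stat, 4))   # 0.2828 2.1213
```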
Indicator/dummy variable
An independent variable that takes only the values 0 or 1.
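With a 0/1 dummy as the independent variable, the intercept estimates the mean of the 0 group and the slope estimates the difference between the two group means; a check on made-up data:

```python
# Dummy independent variable: 0 for one group, 1 for the other.
x = [0.0, 0.0, 1.0, 1.0, 1.0]
y = [2.0, 4.0, 5.0, 7.0, 9.0]
n = len(x)
mx, my = sum(x) / n, sum(y) / n
b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
b0 = my - b1 * mx

mean0 = sum(yi for xi, yi in zip(x, y) if xi == 0.0) / x.count(0.0)
mean1 = sum(yi for xi, yi in zip(x, y) if xi == 1.0) / x.count(1.0)
print(round(b0, 4), round(b1, 4))  # 3.0 4.0 -- b0 = mean0, b1 = mean1 - mean0
```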
Level of significance
This is always a matter of judgement.
- At the 5% level of significance, there is a 5% chance of rejecting a true H0 (Type I error).
- Decreasing the level of significance from 0.05 to 0.01 decreases the probability of a Type I error, but also increases the probability of a Type II error - failing to reject H0 when it is false.
p-value
The smallest level of significance at which H0 can be rejected.
- The smaller the p-value, the smaller the chance of a Type I error (rejecting a true H0), and so the greater the likelihood that the regression model is valid.
The standard error of forecast depends on:
- the standard error of the estimate, se.
- the number of observations, n.
- the forecasted value of the independent variable, Xf, used to predict the dependent variable, and its deviation from the estimated mean, X-.
- the variation of independent variable.
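Those ingredients combine as sf = se·√(1 + 1/n + (Xf - X-)^2/Σ(Xi - X-)^2); a sketch with illustrative data and a hypothetical forecast point Xf = 6:

```python
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
n = len(x)
mx, my = sum(x) / n, sum(y) / n
sxx = sum((xi - mx) ** 2 for xi in x)    # variation of the independent variable
b1 = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
b0 = my - b1 * mx
sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
se = (sse / (n - 2)) ** 0.5              # standard error of the estimate

xf = 6.0                                 # forecasted value of X (outside the sample)
sf = se * (1 + 1 / n + (xf - mx) ** 2 / sxx) ** 0.5
y_hat = b0 + b1 * xf                     # point forecast of the dependent variable
print(round(y_hat, 4), round(sf, 4))     # 5.8 1.2961
```

Note how sf grows with the distance of Xf from the sample mean of X: forecasts far from the centre of the data are less precise.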