Business Forecasting Topic 6 Flashcards
Regression Analysis
examines the relationship between the variable we want to predict and one or more other variables that explain its behaviour
- covers various aspects of the relationship between the criterion and 1 or more explanatory variables (effect of explanatory on criterion)
- must distinguish between the two types of variable
- used for forecasting
criterion
dependent variable
explanatory variable
independent variable
correlation
measure strength of association between 2 variables
- initial assessment = scatter diagram
- scatter -> dependent variable = vertical axis (y)
regression
describe nature of association between variables
regression used to
- provide understanding of relationship between variables (effectiveness of activities)
- forecasts made
product moment correlation coefficient
PMCC
r
objective measure of strength of association between 2 variables
-1 = perfect negative correlation
+1 = perfect positive correlation
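The PMCC can be computed directly as the covariance of the two variables divided by the product of their standard deviations. A minimal sketch in pure Python, using made-up advertising/sales figures purely for illustration:

```python
from math import sqrt

def pmcc(x, y):
    """Pearson's product moment correlation coefficient:
    covariance of x and y divided by the product of their
    standard deviations; always lies between -1 and +1."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sx = sqrt(sum((xi - mx) ** 2 for xi in x))
    sy = sqrt(sum((yi - my) ** 2 for yi in y))
    return cov / (sx * sy)

advertising = [1, 2, 3, 4, 5]       # hypothetical explanatory variable
sales       = [12, 15, 19, 24, 30]  # hypothetical criterion
r = pmcc(advertising, sales)
print(round(r, 3))  # close to +1: strong positive linear association
```

A scatter diagram should still be drawn first: r alone cannot distinguish a genuine linear trend from an outlier-driven one.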
interpreting correlation
- high correlation doesn’t imply a causal relationship; it could be due to other factors
- outliers (whether outlying or influential) distort the scatter and change the correlation
- with a small sample, the observed correlation can be high even when there is no real association
- PMCC only measures linear association
2 other causes of a high correlation
- coincidence (over time period both increase but no link)
- hidden third variable/lurking variable (influence both variables)
Bivariate regression
fitting a line through scatter of points on scatter diagram
least squares criterion
best fitting line is the one minimising the sum of squared vertical deviations from the line
residual or error
vertical deviation from the line
best fitting line represented by the equation ŷ = a + bx (a = intercept, b = slope)
interpolation
forecast is more reliable when the explanatory variable lies within the observed data range
extrapolation
anything beyond the data limits (falls outside our observed points)
- less reliable
- assumes the same linear relationship still applies, which may not be valid
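A least-squares fit and the interpolation/extrapolation distinction can be sketched as follows (hypothetical data; the formulas b = Sxy/Sxx and a = ȳ − b·x̄ are the standard least-squares estimates):

```python
def least_squares(x, y):
    """Return intercept a and slope b of the least-squares line."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
        sum((xi - mx) ** 2 for xi in x)
    a = my - b * mx
    return a, b

x = [1, 2, 3, 4, 5]       # hypothetical explanatory variable
y = [12, 15, 19, 24, 30]  # hypothetical criterion
a, b = least_squares(x, y)

def forecast(x_new):
    """Forecast with the fitted line, flagging how reliable it is."""
    if min(x) <= x_new <= max(x):
        kind = "interpolation (more reliable)"
    else:
        kind = "extrapolation (less reliable)"
    return a + b * x_new, kind

print(forecast(3.5))  # inside the data range
print(forecast(10))   # beyond the data limits
```

The extrapolated forecast is still just a + bx; the caution is that nothing in the data supports the linearity assumption out there.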
coefficient of determination
r squared
measures goodness of fit
values from 0-1.0
1 = perfect fit
e.g. 0.817 = 81.7% of variation is explained
a high value doesn’t guarantee we have obtained the best regression model -> it just says the model fits past data well (it could still yield poor forecasts)
equal to the square of the correlation coefficient (r² = r × r)
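r² can be computed as 1 − SSE/SST: the proportion of variation in y explained by the line. A sketch with hypothetical data (the line ŷ = 6.5 + 4.5x is the least-squares fit to these points):

```python
x = [1, 2, 3, 4, 5]       # hypothetical explanatory variable
y = [12, 15, 19, 24, 30]  # hypothetical criterion
a, b = 6.5, 4.5           # least-squares intercept and slope for this data

my = sum(y) / len(y)
# SSE: squared deviations of y from the fitted line (unexplained)
sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
# SST: squared deviations of y from its mean (total variation)
sst = sum((yi - my) ** 2 for yi in y)
r_squared = 1 - sse / sst
print(round(r_squared, 3))  # e.g. 0.983 = 98.3% of variation explained
```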
significance of the regression line
if the two variables are not related, then β is zero
population values of intercept and slope
intercept = α
slope = β
t-test
tests the null hypothesis that β = 0
generates a p-value
e.g. p = 0.001 (below 0.05) -> statistically significant
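The test statistic is t = b / se(b) under H0: β = 0, with n − 2 degrees of freedom. A sketch with hypothetical data; the critical value 3.182 is the two-tailed 5% point of t with 3 degrees of freedom (stdlib Python has no t-distribution CDF, so we compare against the critical value rather than compute a p-value):

```python
from math import sqrt

x = [1, 2, 3, 4, 5]       # hypothetical explanatory variable
y = [12, 15, 19, 24, 30]  # hypothetical criterion
n = len(x)
mx, my = sum(x) / n, sum(y) / n
sxx = sum((xi - mx) ** 2 for xi in x)
b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
a = my - b * mx

# standard error of the slope: sqrt(SSE / (n - 2)) / sqrt(Sxx)
sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
se_b = sqrt(sse / (n - 2)) / sqrt(sxx)
t = b / se_b

critical = 3.182  # t(0.025, df = 3), two-tailed 5% test
print(round(t, 2), "significant" if abs(t) > critical else "not significant")
```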
4 assumptions underpinning the significance test
- errors have a mean of 0
- errors are normally distributed
- homoscedasticity
- errors associated with any 2 observations are independent
- test how well these are met by inspecting the residuals of the model
homoscedasticity
variance of error is same
irrespective of the value of independent variable
inspecting the residuals of the regression model
to see if the assumptions appear to be met, it is useful to obtain plots of the residuals
- a histogram of the residuals may reveal they are not normally distributed
- plotting the residuals may reveal the assumption of homoscedasticity is not valid (spread changes with the fitted values)
- plotting the residuals against the independent variable may reveal the assumption of linearity is wrong (curved pattern)
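The residuals themselves are easy to compute once the line is fitted; with hypothetical data, a sketch (the plots described above would then be drawn from this `residuals` list):

```python
x = [1, 2, 3, 4, 5]       # hypothetical explanatory variable
y = [12, 15, 19, 24, 30]  # hypothetical criterion
n = len(x)
mx, my = sum(x) / n, sum(y) / n
b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
    sum((xi - mx) ** 2 for xi in x)
a = my - b * mx

# residual = vertical deviation of each observation from the fitted line
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
print([round(e, 2) for e in residuals])
print(sum(residuals))  # least squares forces the residuals to sum to 0
```

Any systematic pattern when these values are plotted against x or against the fitted values signals that an assumption is violated.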
bivariate regression analysis
- model assumes linear relationship between variables
- to make forecasts we assume the relationship observed previously will continue, but over time the underlying relationship may change
- only 1 explanatory variable is used to forecast (others will also be associated with the criterion)
- an observation with a large residual = outlier, as distinct from an influential observation
influential observation
KING KONG EFFECT
- large influence on line of best fit (if omitted from analysis = position of line = change)
- lie to extreme right or left of scatter away from bulk
- draw regression line towards them
- may not have large residuals, therefore not necessarily outliers
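The King Kong effect can be demonstrated by fitting the line with and without an extreme-x point (hypothetical data: the point at x = 20 lies far to the right of the bulk and drags the line towards itself):

```python
def slope(x, y):
    """Least-squares slope b = Sxy / Sxx."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / \
           sum((xi - mx) ** 2 for xi in x)

x = [1, 2, 3, 4, 20]  # 20 is the influential "King Kong" observation
y = [5, 6, 5, 7, 9]
b_all = slope(x, y)              # fitted with the extreme point
b_omitted = slope(x[:-1], y[:-1])  # fitted without it
print(round(b_all, 3), round(b_omitted, 3))  # slope changes markedly
```

Because the line is pulled towards the extreme point, that point's own residual stays small, which is exactly why influential observations are not caught by looking for large residuals.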