Business Forecasting Topic 7 Flashcards
multiple regression
- a more powerful result can be obtained by including more than one independent (explanatory) variable in the model
-> the variation in the dependent variable is explained better
model of multiple regression
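In standard notation, with k independent variables (a general form using the same α and β symbols as these cards; check against the module notes):
Y = \alpha + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k + \varepsilon
where the b_i estimated from the sample data are estimates of the true \beta_i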
Coefficient of multiple determination
R squared
measures the goodness of fit of the model to the data
i.e. how well the variation in the dependent variable is explained
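One standard way of writing it, assuming the usual split of variation into explained (SSR), residual (SSE) and total (SST) sums of squares:
R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}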
coefficient of multiple correlation
R
- the square root of R squared (the coefficient of multiple determination)
changes in R squared
- R squared increases (or at least fails to decrease) as the number of independent variables added to the regression model increases
- this happens even if the new independent variables have no relationship with the dependent variable (i.e. are not worth including) -> counteract this with the adjusted R squared
adjusted R squared equation
- adjusts R squared for the number of independent variables and the sample size
- with a big sample the adjustment makes only small changes
- the more variables there are, the smaller the adjusted R squared becomes
- so it corrects for the inclusion of unwanted variables
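The usual form of the adjusted R squared, with n = sample size and k = number of independent variables (standard formula, worth checking against the module notes):
\bar{R}^2 = 1 - (1 - R^2)\,\frac{n - 1}{n - k - 1}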
significance tests for the multiple regression model
- based on the same assumptions as the bivariate regression model
address:
1. the validity of the regression model as a whole
2. the validity of the individual regression coefficients
- if an independent variable has no relationship with the dependent variable, its β has a value of zero in the true population relationship
F-statistic
tests the statistical significance of the whole regression model itself
- most computer packages give F and its associated p-value automatically
- e.g. if p < 0.0001, reject H0: the model has explanatory power and at least one independent variable has a non-zero coefficient
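The hypotheses and a standard form of the F-statistic (k independent variables, sample size n):
H_0: \beta_1 = \beta_2 = \dots = \beta_k = 0 \quad vs \quad H_1: \text{at least one } \beta_i \neq 0
F = \frac{R^2 / k}{(1 - R^2)/(n - k - 1)}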
significance of each regression coefficient
- hypotheses are set up for each independent variable
- the t-statistic is used to test the null hypothesis that the coefficient is zero
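For each coefficient the standard test (on n - k - 1 degrees of freedom) is:
H_0: \beta_i = 0 \quad vs \quad H_1: \beta_i \neq 0, \qquad t = \frac{b_i}{s.e.(b_i)}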
what does significance testing answer?
- could the apparent relationship have arisen by chance, making the model useless for forecasting?
- how good is the fit of the model?
- could individual variables' apparent contributions have arisen by chance?
multicollinearity
- some or all of the independent variables are highly correlated (related) with each other
- e.g. two independent variables are highly correlated, or a linear combination of a subset of the independent variables is highly correlated with another independent variable
f test vs t test
t-test = tests individual variables
F-test = tests the whole regression model
problem with multicollinearity
- the bi's in the estimated regression model become very imprecise estimates of the true regression coefficients (the βi's) -> unreliable
- difficult to show precisely the contribution of each independent variable
multicollinearity can lead to…
- estimated coefficients having the wrong signs (+ve vs -ve)
- misleading p-values for the t-tests -> wrong decisions about which variables to include
- it does not affect the predictive ability of the model; the danger is misleading indications about the nature of the relationships and about whether to include variables
dealing with multicollinearity
- specialised techniques exist
- correlated variables can be combined into a single 'super variable' -> principal components analysis
- the simplest procedure is to drop one of two (or more) highly correlated variables, though this may mean the model loses some predictive power
finding multicollinearity
- to see whether it exists in a problem, look at the correlation matrix
- if an explanatory variable is highly correlated with another explanatory variable, there is a chance of multicollinearity
- alternatively use the variance inflation factor (VIF) and tolerance (the inverse of the VIF) - see the sketch below
- largest VIF > 10 suggests multicollinearity is a problem
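A quick illustrative sketch in Python using statsmodels; the DataFrame and the column names (price, advertising, income) are invented purely for illustration:

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

# made-up explanatory variables for illustration only
data = pd.DataFrame({
    "price":       [10, 12, 11, 13, 15, 14, 16, 18],
    "advertising": [ 5,  6,  6,  7,  8,  8,  9, 10],
    "income":      [30, 32, 31, 35, 36, 38, 40, 41],
})

# 1. correlation matrix: high pairwise correlations suggest multicollinearity
print(data.corr())

# 2. VIF for each explanatory variable (largest VIF > 10 is a warning sign)
X = sm.add_constant(data)  # add intercept column before computing VIFs
for i, name in enumerate(X.columns):
    if name != "const":
        print(name, variance_inflation_factor(X.values, i))
```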
Dummy variables
assume values of either 0 or 1
- represent nominal variables and often increase the explanatory or predictive power of the regression model (see the sketch below)
- multicollinearity among dummy variables is dealt with by reducing the number of dummy variables in the equation by 1 (one fewer dummy than categories)
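A minimal sketch of creating dummies with pandas; the 'region' column and its values are made up for illustration. drop_first=True keeps one fewer dummy than there are categories, which avoids the multicollinearity noted above:

```python
import pandas as pd

df = pd.DataFrame({"region": ["north", "south", "west", "north", "west"]})

# one fewer dummy than categories avoids the "dummy variable trap"
dummies = pd.get_dummies(df["region"], prefix="region", drop_first=True)
print(dummies)
```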
perfect multicollinearity
- makes it impossible to estimate the model
standardised regression coefficients
- interpretation is made easier if the coefficients are standardised
- standardisation makes α, the constant in the regression model, disappear; the new regression coefficients are called beta coefficients
beta coefficients
- do not depend on the units of measurement of the different variables
- give a better idea of the relative importance of the independent variables -> useful when modelling consumer choice
- tell us the relative importance to consumers of the different attributes of a product
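A common way of expressing a beta coefficient (assumed here to be the usual standardisation; the module may define it slightly differently):
\text{beta}_i = b_i \times \frac{s_{X_i}}{s_Y}
where s_{X_i} and s_Y are the sample standard deviations of X_i and Y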
building a regression model
- multicollinearity complicates decisions about which variables to include
- a prerequisite is an adequate theory/rationale to justify the decision to include a variable
- it may be justified to include an independent variable even if the p-value on its t-test is large
automated approaches to model building
- stepwise regression
- best subsets regression
step-wise regression
forward-backward stepwise method (sketch below):
1. identify a bivariate model by choosing the independent variable most highly correlated with the dependent variable
2. the independent variable that adds the most predictive power is selected next, giving a new regression equation
3. both independent variables are then tested to see whether their inclusion is still justified (the steps repeat until no further variables qualify)
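A rough sketch of the forward part of the procedure in Python; the entry level of 1% and the use of p-values are illustrative assumptions, not the exact rule any particular package applies:

```python
import pandas as pd
import statsmodels.api as sm

def forward_stepwise(data: pd.DataFrame, target: str, alpha: float = 0.01):
    """Greedily add the most significant variable until none qualifies."""
    remaining = [c for c in data.columns if c != target]
    selected = []
    y = data[target]
    while remaining:
        # p-value of each candidate variable if added to the current model
        pvals = {}
        for cand in remaining:
            X = sm.add_constant(data[selected + [cand]])
            pvals[cand] = sm.OLS(y, X).fit().pvalues[cand]
        best = min(pvals, key=pvals.get)
        if pvals[best] < alpha:   # only add if significant at the entry level
            selected.append(best)
            remaining.remove(best)
        else:
            break
    return selected
```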
limitations of stepwise method
- potentially useful variables may be excluded if multicollinearity is present
- repeated significance tests reduce their power, so choose conservative significance levels (e.g. 1%) when determining inclusion or exclusion
best subsets regression
- every possible model (every combination of independent variables) is fitted and the model with the greatest predictive power is selected
Forced entry method
('Enter' in SPSS)
- all variables considered to make a contribution towards the outcome (dependent variable) are forced into the multiple regression and a model is obtained
- explanatory variables that are not significant in the initial model are removed to give the final model
concerns in using the highest adjusted R squared to choose the model
- a variable only has to exceed a low threshold to be included (its t-value only has to exceed 1), so the model can end up with too many variables
- this conflicts with the principle of parsimony (keep the model as simple as possible)
concerns with using R squared alone
- may make us think the model is improved by including extra variables, even when they have no real relationship with the dependent variable