Reading 12: Multiple Regression and Issues in Regression Analysis Flashcards
Formulate a multiple regression equation to describe the relation between a dependent variable and several independent variables, and determine the statistical significance of each independent variable.
Answer
Interpret estimated regression coefficients and their p-values.
p-value: the smallest significance level at which we can reject Ho.
If p-value < alpha, then reject Ho
If p-value > alpha, then fail to reject Ho
Formulate a null and an alternative hypothesis about the population value of a regression coefficient, calculate the value of the test statistic, and determine whether to reject the null hypothesis at a given significance level.
Answer
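A minimal sketch of the test-statistic step, assuming a two-tailed test of a single coefficient. The function names and all numbers are hypothetical, and the critical t-value must come from a t-table for the chosen alpha with n - k - 1 degrees of freedom.

```python
def t_statistic(estimate, hypothesized, std_error):
    """t = (estimated coefficient - hypothesized value) / standard error."""
    return (estimate - hypothesized) / std_error


def reject_null(t_stat, critical_t):
    """Two-tailed rule: reject Ho when |t| exceeds the critical t-value."""
    return abs(t_stat) > critical_t


# Hypothetical inputs: estimated b1 = 0.25, Ho: b1 = 0, standard error = 0.10.
t = t_statistic(0.25, 0.0, 0.10)

# 2.0 is an assumed critical value, used only for illustration.
print(round(t, 2), reject_null(t, 2.0))
```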
Interpret the results of hypothesis tests of regression coefficients.
Answer
Calculate and Interpret the following:
- a confidence interval for the population value of a regression coefficient
- a predicted value for the dependent variable, given an estimated regression model and assumed values for the independent variables
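- Confidence interval: coefficient ± (critical t-value × standard error of the coefficient)
- Predicted value: plug the assumed X values into the estimated equation, BUT do not leave out any statistically insignificant coefficients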
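The two calculations above can be sketched as follows. This is an illustration only: the function names, coefficients, and the critical t-value of 2.0 are made-up assumptions, not values from the reading.

```python
def confidence_interval(coefficient, critical_t, std_error):
    """Return (lower, upper) = coefficient -/+ critical t x standard error."""
    margin = critical_t * std_error
    return coefficient - margin, coefficient + margin


def predict(intercept, slopes, x_values):
    """Predicted Y = intercept + sum of (slope x assumed X value).

    Every estimated slope is used, including statistically
    insignificant ones.
    """
    return intercept + sum(b * x for b, x in zip(slopes, x_values))


# Hypothetical coefficient 0.5 with standard error 0.1 and critical t of 2.0.
low, high = confidence_interval(0.5, 2.0, 0.1)

# Hypothetical model: Y = 1.0 + 0.5*X1 - 0.2*X2, with X1 = 4 and X2 = 10.
y_hat = predict(1.0, [0.5, -0.2], [4.0, 10.0])
print(low, high, y_hat)
```

If the interval does not contain zero, the coefficient is significantly different from zero at the matching significance level.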
Explain the assumptions of a multiple regression model:
- Linear relationship between Y and X
- No exact linear relationship among X’s
- Expected value of error term = 0
- Variance of error term is constant
- Errors not serially correlated
- Error term normally distributed
IMPORTANT
Calculate and interpret the F-statistic, and describe how it is used in regression analysis:
- F-Statistic: tests whether any independent variables explain variation in dependent variable
- Hypothesis Test
- Ho: All slope coefficients = 0
- Ha: At least one slope coefficient does not equal 0
- REJECT Ho if the F-statistic exceeds the critical value (always a one-tailed test)
- Shape of the F-distribution is determined by numerator (k) and denominator (n - k - 1) Df
- OVERALL: the F-test tests the significance of the model
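A short sketch of the F-statistic calculation, using the usual ANOVA definition F = MSR / MSE. Here RSS means the regression (explained) sum of squares, SSE the sum of squared errors, n the number of observations, and k the number of independent variables; the inputs are hypothetical.

```python
def f_statistic(rss, sse, n, k):
    """F = MSR / MSE = (RSS / k) / (SSE / (n - k - 1))."""
    msr = rss / k              # mean regression sum of squares
    mse = sse / (n - k - 1)    # mean squared error
    return msr / mse


# Hypothetical ANOVA table: RSS = 80, SSE = 20, n = 25 observations, k = 4 slopes.
f = f_statistic(80.0, 20.0, 25, 4)
print(f)
```

Compare the result with the critical F-value for k and n - k - 1 degrees of freedom at the chosen significance level.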
Distinguish between and interpret the R-squared and adjusted R-squared in multiple regression.
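- R-squared: % of total variation (SST) explained by the regression (RSS, the regression sum of squares); R2 = RSS / SST
- Adj. R-squared: compensates for the problem that R2 never decreases as new variables are added to the model, even when they add no explanatory power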
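As a rough illustration, both measures can be computed from the ANOVA sums of squares. RSS is taken here to mean the regression (explained) sum of squares, and the adjustment uses the standard formula 1 - [(n - 1) / (n - k - 1)] x (1 - R2). All numbers are hypothetical.

```python
def r_squared(rss, sst):
    """Share of total variation (SST) explained by the regression (RSS)."""
    return rss / sst


def adjusted_r_squared(r2, n, k):
    """Penalize R2 for the number of independent variables k."""
    return 1.0 - ((n - 1) / (n - k - 1)) * (1.0 - r2)


# Hypothetical values: RSS = 80, SST = 100, n = 25 observations, k = 4 slopes.
r2 = r_squared(80.0, 100.0)
adj_r2 = adjusted_r_squared(r2, 25, 4)
print(round(r2, 4), round(adj_r2, 4))
```

Adjusted R2 is never higher than R2, and it can fall when an added variable contributes little.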
Evaluate how well a regression model explains the dependent variable by analyzing the output of the regression equation and an ANOVA table.
Answer
Formulate a multiple regression equation by using dummy variables to represent qualitative factors, and interpret the coefficients and regression results.
Dummy Variable Trap: Always use one less dummy variable than states of the world (e.g. for four quarters, use three dummies).
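A minimal sketch of the quarterly case, assuming Q4 is chosen as the omitted base quarter (any quarter could serve as the base). The function name is hypothetical.

```python
def quarter_dummies(quarter):
    """Return (Q1, Q2, Q3) dummy values; Q4 is the omitted base case."""
    if quarter not in (1, 2, 3, 4):
        raise ValueError("quarter must be 1, 2, 3, or 4")
    return tuple(1 if quarter == q else 0 for q in (1, 2, 3))


for q in (1, 2, 3, 4):
    print(q, quarter_dummies(q))
```

With this coding, the intercept is the average value of Y in the omitted quarter, and each dummy coefficient is the difference between that quarter and the omitted one.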
Explain the types of heteroskedasticity and how heteroskedasticity and serial correlation affect statistical inference.
Types of Heteroskedasticity:
- Type 1: Unconditional (not related to independent variables) CAUSES NO MAJOR PROBLEMS
- Type 2: Conditional (related to independent variables) IS A PROBLEM and usually makes t-stats artificially high by making the standard errors too small.
Serial Correlation (or Autocorrelation)
- Positive: each error term tends to have the same sign as the previous one. Impact is the same as conditional heteroskedasticity: small SE, high t-stat
- Negative: each error term tends to have the opposite sign of the previous one (not likely in finance)
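One common way to check residuals for serial correlation is the Durbin-Watson statistic. It is not covered in the notes above, so treat this as a supplementary sketch: DW is roughly 2 when there is no serial correlation, well below 2 with positive serial correlation, and well above 2 with negative serial correlation. The residual series are made up.

```python
def durbin_watson(residuals):
    """DW = sum of squared changes in residuals / sum of squared residuals."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2
        for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Hypothetical residuals that keep the same sign for several periods (positive).
trending = [1.0, 1.0, 1.0, -1.0, -1.0, -1.0]
# Hypothetical residuals that flip sign every period (negative).
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]

print(durbin_watson(trending), durbin_watson(alternating))
```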
Describe multicollinearity, and explain its causes and effects in regression analysis.
Multicollinearity: two or more “X” variables are highly correlated with each other
Causes: correlation between two or more X variables
Effects:
- Inflates SEs; reduces t-stats
- Variables falsely look unimportant
- i.e. FALSE INSIGNIFICANCE
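A rough first screen is the pairwise correlation between two X variables, sketched below with made-up data. This is only an illustration: with more than two independent variables, multicollinearity can be present even when every pairwise correlation looks modest.

```python
from math import sqrt


def correlation(x, y):
    """Pearson correlation between two equal-length series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Hypothetical independent variables that move almost in lockstep.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.0, 4.0, 6.0, 8.0, 11.0]
print(round(correlation(x1, x2), 3))
```

The classic warning sign in regression output is a significant F-statistic and high R2 combined with insignificant individual t-stats.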
Describe how model misspecification affects the results of a regression analysis, and describe how to avoid the common forms of misspecification.
Misspecification: a poor selection of explanatory variables and/or transformation of variables that undermines the reliability of inference and hypothesis tests
How to avoid common forms:
Describe models with qualitative dependent variables:
Cannot use ordinary least squares (OLS) regression analysis
LOGIT MODEL: calculates a probability based on the logistic distribution (CAN HELP)
PROBIT MODEL: calculates a probability based on the normal distribution
DISCRIMINANT MODEL: produces a score/ranking
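To make the logit idea concrete, here is a minimal sketch that turns a linear combination of X values into a probability using the logistic function. The coefficients are hypothetical; in practice they are estimated by maximum likelihood rather than OLS.

```python
from math import exp


def logit_probability(intercept, slopes, x_values):
    """P(Y = 1) = 1 / (1 + e^(-z)), where z = intercept + sum(slope * x)."""
    z = intercept + sum(b * x for b, x in zip(slopes, x_values))
    return 1.0 / (1.0 + exp(-z))


# Hypothetical model of default (Y = 1) on a leverage ratio of 1.5.
p = logit_probability(-2.0, [1.2], [1.5])
print(round(p, 3))
```

The output always lies between 0 and 1, which is why this form suits a qualitative (0/1) dependent variable.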
Evaluate and interpret a multiple regression model and its results.
Answer