Correlation And Regression Flashcards
Test statistic for significance of correlation coefficient
t = r * sqrt(n - 2) / sqrt(1 - r^2)
r = correlation coefficient; n = number of observations; df = n - 2
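The formula can be sketched in a few lines of Python (the r = 0.5, n = 27 values below are hypothetical):

```python
import math

def corr_t_stat(r, n):
    """t-statistic for testing significance of a correlation
    coefficient r computed from n paired observations."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

# Hypothetical example: r = 0.5 from n = 27 observations
t = corr_t_stat(0.5, 27)  # compare to critical t with n - 2 = 25 df
```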
What is correlation coefficient
Measure of strength of linear relationship between two variables; ranges from -1 to +1
r > 0 = positive correlation
r < 0 = negative correlation
r = 0 = no linear correlation
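A minimal pure-Python sketch of computing r from paired data (the sample lists are made up):

```python
import math

def pearson_r(x, y):
    """Sample correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

pearson_r([1, 2, 3], [2, 4, 6])   # exactly linear: r = 1 (positive)
pearson_r([1, 2, 3], [6, 4, 2])   # exactly inverse: r = -1 (negative)
```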
Regression assumptions
Independent variable:
- linearly related to dependent var
- uncorrelated with residual
Residual term:
- expected value = 0
- constant variance
- independently and normally distributed
Confidence interval of y value
Predicted Y value +/- (critical t-value) * (standard error of forecast)
Difference between t-test and F-test
T-test: assesses statistical significance of individual regression parameters
F-test: assesses model effectiveness
How to test for heteroskedasticity
Breusch-Pagan test: regress squared residuals on the independent variables
Lagrange multiplier LM = n * R^2 (R^2 from that residual regression)
LM is a chi-squared random variable with df = number of independent variables
Alternative: White test
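A sketch of the LM computation, assuming numpy is available (the toy X matrix and residuals are made up):

```python
import numpy as np

def breusch_pagan_lm(X, resid):
    """Breusch-Pagan LM statistic: regress squared residuals on the
    independent variables, then LM = n * R^2 of that regression."""
    n = len(resid)
    u2 = np.asarray(resid, dtype=float) ** 2
    Xc = np.column_stack([np.ones(n), X])          # add intercept
    beta, *_ = np.linalg.lstsq(Xc, u2, rcond=None)
    fitted = Xc @ beta
    ss_res = np.sum((u2 - fitted) ** 2)
    ss_tot = np.sum((u2 - u2.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    return n * r2   # chi-squared, df = number of independent variables
```

Compare the returned LM to the chi-squared critical value at the chosen significance level.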
How to test for serial correlation
Durbin-Watson test
DW near 2 = no serial correlation; significantly below 2 = positive serial correlation; significantly above 2 = negative serial correlation
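The statistic itself is easy to compute (pure-Python sketch; the residual lists are toy values):

```python
def durbin_watson(resid):
    """Durbin-Watson statistic: ~2 means no serial correlation,
    toward 0 positive, toward 4 negative serial correlation."""
    num = sum((resid[t] - resid[t - 1]) ** 2 for t in range(1, len(resid)))
    den = sum(e ** 2 for e in resid)
    return num / den

durbin_watson([1, 1, 1, 1])     # 0.0: perfectly positively correlated
durbin_watson([1, -1, 1, -1])   # 3.0: strong negative correlation
```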
How to test for multicollinearity
No specific test
Identify whether R^2 is high with significant F-stat despite insignificant t-tests
Drop one of correlated variables
What are two types of tests for significance in multiple regression?
Whether each independent variable explains dependent (t stat)
Whether some or all independent variables explain dependent (F stat)
What is t stat formula for multiple regression
t = (estimated regression parameter) / (standard error of regression parameter)
With n - k - 1 df
What info is included in an ANOVA table? (analysis of variance)
For each of regression, error, and total:
- degrees of freedom
- sum of squares
- mean square (SS / df)
How to calc MSR (mean squared regression)
MSR = regression sum of squares / k
How to calc MSE (mean squared error)
MSE = sum of squared errors / (n - k - 1)
What is coefficient of determination and how to calc (R^2)
% variation in dep var explained by indep vars
R^2 = regression sum of squares / total sum of squares = (SST - SSE) / SST
What does adjusted R^2 measure
Goodness of fit that adjusts for the number of independent variables included in the model
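The standard adjustment formula can be sketched directly (the 0.8, 26, 5 example values are made up):

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 for n observations and k independent variables."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

adjusted_r2(0.8, 26, 5)  # 0.75: the 5 regressors are penalized
```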
What is standard error of estimate and how to calc
Measures uncertainty of values of dependent variable around regression line
= sqrt(mean squared error)
How to calculate F stat
F = MSR / MSE (mean squared regression / mean squared error)
With k and n-k-1 df
Reject H0 if F > Fcritical
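The ANOVA quantities from the last few cards fit together in one sketch (the SSR = 80, SSE = 20, n = 26, k = 5 inputs are hypothetical):

```python
import math

def anova_stats(ssr, sse, n, k):
    """MSR, MSE, R^2, SEE, and F for a regression with k
    independent variables and n observations."""
    sst = ssr + sse
    msr = ssr / k                 # mean square regression
    mse = sse / (n - k - 1)       # mean square error
    r2 = ssr / sst                # coefficient of determination
    see = math.sqrt(mse)          # standard error of estimate
    f = msr / mse                 # F-stat, df = (k, n - k - 1)
    return msr, mse, r2, see, f

anova_stats(80.0, 20.0, 26, 5)   # (16.0, 1.0, 0.8, 1.0, 16.0)
```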
Confidence interval for regression coefficient
Regression coefficient +- (critical t-value)(standard error of regression coefficient)
If zero is included in the confidence interval, the slope is not statistically significant
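A minimal sketch of the interval and the zero check (the b = 0.5, se = 0.2, t = 2.06 values are made up):

```python
def coef_conf_interval(b, se, t_crit):
    """Confidence interval for an estimated regression coefficient."""
    return (b - t_crit * se, b + t_crit * se)

lo, hi = coef_conf_interval(0.5, 0.2, 2.06)  # (0.088, 0.912)
significant = not (lo <= 0 <= hi)            # zero excluded -> significant
```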
What is conditional heteroskedasticity
Residual variance related to level of independent variables
What is serial correlation
Residuals are correlated
What is multicollinearity
Two or more independent variables correlated
What are six common misspecifications of a regression model?
- Omitting a variable
- Transforming a variable
- Incorrectly pooled data
- Using a lagged dependent variable as an independent variable
- Forecasting the past
- Measuring independent variables with error
What do model misspecifications result in?
Biased and inconsistent regression coefficients
No confidence in hypothesis tests of coefficients
No confidence in predictions
When do you use a linear trend model?
Data points equally distributed above and below line
Mean is constant
When do you use a log-linear model?
When dependent variable grows at a constant rate (exponential growth)
Residuals may be correlated
Mean is non constant
Seasonality
When do you use an auto regressive model?
When dependent variable is regressed against previous values of itself
Model is well specified when autocorrelations of residuals are not statistically significant
Use t-test on each residual autocorrelation to check
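Fitting an AR(1) is just OLS on lagged values; a pure-Python sketch on a noise-free toy series, so the fit is exact:

```python
def fit_ar1(x):
    """OLS fit of x_t = b0 + b1 * x_(t-1); returns (b0, b1)."""
    y, z = x[1:], x[:-1]
    n = len(y)
    mz, my = sum(z) / n, sum(y) / n
    b1 = (sum((a - mz) * (b - my) for a, b in zip(z, y))
          / sum((a - mz) ** 2 for a in z))
    b0 = my - b1 * mz
    return b0, b1

# Toy series generated exactly by x_t = 2 + 0.5 * x_(t-1)
x = [0.0]
for _ in range(6):
    x.append(2 + 0.5 * x[-1])
fit_ar1(x)   # recovers (2.0, 0.5)
```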
What is autocorrelation?
When residuals exhibit serial correlation
Used to test for seasonality
What are 3 conditions required to be covariance stationary
Required for time series models
Three conditions:
- constant and finite expected value
- constant and finite variance
- constant and finite covariance with leading or lagged values
Three steps to determine whether a time series is covariance stationary and how to correct
- Plot data: are mean and variance constant?
- Run AR model; test residual autocorrelations
- Dickey-Fuller test (for unit root)
Can correct with first differencing
What is mean reversion
Tendency of a time series to move toward its long-run mean
For an AR(1) model, mean-reverting level = b0 / (1 - b1)
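A sketch showing that iterating the AR(1) recursion converges to b0 / (1 - b1) (the b0 = 2, b1 = 0.6 coefficients are made up):

```python
def mean_reverting_level(b0, b1):
    """Long-run level of AR(1): x_t = b0 + b1 * x_(t-1)."""
    return b0 / (1 - b1)

level = mean_reverting_level(2.0, 0.6)   # 5.0
x = 0.0
for _ in range(60):
    x = 2.0 + 0.6 * x                    # x approaches 5.0
```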
What is a unit root and how to identify
Unit root is when value of lag coefficient = 1
Use Dickey-Fuller test to identify
If a unit root exists, the series is a random walk and is not covariance stationary
How to convert random walk to covariance stationary
Use first differencing (model change in value vs value)
y_t = x_t - x_(t-1) = error term
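First differencing is a one-liner (the input series is a toy example):

```python
def first_difference(x):
    """y_t = x_t - x_(t-1): models the change in value, which
    converts a random walk into a covariance stationary series."""
    return [x[t] - x[t - 1] for t in range(1, len(x))]

first_difference([1, 3, 6, 10])   # [2, 3, 4]
```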
What is root mean squared error used for
Used to assess predictive accuracy of autoregressive models
Lower RMSE is better
Out of sample RMSE tests forecasting ability
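A pure-Python sketch of RMSE on out-of-sample forecasts (the actual/forecast values are made up):

```python
import math

def rmse(actual, forecast):
    """Root mean squared error of forecasts vs actual values;
    lower RMSE = better predictive accuracy."""
    n = len(actual)
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / n)

rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])   # sqrt(4/3)
```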
What is a structural change?
A significant shift in the plotted data, dividing it into two distinct patterns
Results in coefficient instability
Run two diff models before and after change
What is cointegration
When two time series are economically linked
Related to same macro variable
How to test for cointegration
Regress one time series on other
Test residuals for unit root; if test rejects unit root, two series are cointegrated
If cointegrated, results in covariance stationary error term and reliable t tests
What is autoregressive conditional heteroskedasticity (ARCH) and what does it result in
Variance of residuals in one time period dependent on variance of residuals in another time period
Results in invalid reg coeff standard errors and hypothesis tests
How to correct for ARCH
Use Generalized least squares
Or use model to predict variance of residuals in following periods