Quantitative Methods Flashcards
What is the sample correlation coefficient (r) for 2 variables?
r = Cov(X,Y) / (SD(X) × SD(Y))
What is the t-test formula for the significance of the correlation coefficient?
t = (r × √(n - 2)) / √(1 - r^2)
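As a rough illustration of the correlation and t-test cards, the sketch below computes r and its t-statistic for a small made-up data set. The function names and the data are assumptions for illustration; the sample covariance and standard deviations are taken with n - 1 in the denominator.

```python
import math

def sample_correlation(x, y):
    """Sample correlation: Cov(X, Y) / (SD(X) * SD(Y)), using n - 1 denominators."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)
    sd_x = math.sqrt(sum((xi - mean_x) ** 2 for xi in x) / (n - 1))
    sd_y = math.sqrt(sum((yi - mean_y) ** 2 for yi in y) / (n - 1))
    return cov_xy / (sd_x * sd_y)

def correlation_t_stat(r, n):
    """t-statistic for H0: correlation = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Made-up data, for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

r = sample_correlation(x, y)
t = correlation_t_stat(r, len(x))
print(round(r, 4), round(t, 4))
```

The t value is then compared with the critical t-value for n - 2 degrees of freedom, taken from a table.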
What is the formula for the slope coefficient (b)?
b = Cov(X,Y) / Var(X)
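A minimal sketch of the slope formula, using made-up data. The intercept line (mean of Y minus b times mean of X) is not on the card; it is the standard least-squares result and is included here only to complete the fitted line. The function name is hypothetical.

```python
def slope_and_intercept(x, y):
    """Return (slope, intercept) of a simple linear regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)
    var_x = sum((xi - mean_x) ** 2 for xi in x) / (n - 1)
    slope = cov_xy / var_x                 # b = Cov(X, Y) / Var(X)
    intercept = mean_y - slope * mean_x    # assumption: standard OLS intercept
    return slope, intercept

# Made-up data, for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

slope, intercept = slope_and_intercept(x, y)
print(round(slope, 4), round(intercept, 4))
```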
How to interpret the t-test value?
If the calculated test statistic has a higher absolute value than the critical value, the result is statistically significant (reject the null hypothesis).
What are the six assumptions of the classic normal linear regression model?
1) Linear relation exists between dependent and independent variables
2) Independent variable is not random
3) Expected value of error term is 0
4) Variance of error term is the same for all observations (homoskedasticity)
5) Error term is uncorrelated across observations
6) Error term is normally distributed
What does the standard error of estimate (SEE) measure?
How well the regression model fits the data. If SEE is small, the model fits well.
What is the formula for SEE?
SEE = (Unexplained variation / (n - 2))^0.5
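What is the coefficient of determination and what is its formula?
The fraction of the total variation in the dependent variable that is explained by the independent variable.
r^2 = (Total variation - Unexplained variation) / Total variation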
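The SEE and coefficient of determination cards can be sketched together, since both are built from the unexplained variation. The data and fitted values below are made up, the function name is hypothetical, and the n - 2 divisor assumes a single independent variable.

```python
import math

def fit_statistics(y, y_hat):
    """Return (r_squared, see) from actual values y and fitted values y_hat."""
    n = len(y)
    mean_y = sum(y) / n
    total_variation = sum((yi - mean_y) ** 2 for yi in y)
    unexplained_variation = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    r_squared = (total_variation - unexplained_variation) / total_variation
    see = math.sqrt(unexplained_variation / (n - 2))  # one independent variable
    return r_squared, see

# Made-up observations and the fitted values from the line y_hat = 2.2 + 0.6 * x
y = [2.0, 4.0, 5.0, 4.0, 5.0]
y_hat = [2.8, 3.4, 4.0, 4.6, 5.2]

r_squared, see = fit_statistics(y, y_hat)
print(round(r_squared, 4), round(see, 4))
```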
What is the formula for F-statistic?
F-statistic = Regression MSS / Residual MSS
What is the formula for sample variance of dependent variable?
Total variation / (n-1)
How to calculate the confidence interval for a regression coefficient?
Coefficient ± (critical t-value × standard error of the coefficient)
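A small sketch of the interval calculation, assuming the multiplier is the critical t-value read from a table for the chosen significance level and degrees of freedom. The function name and the numbers are hypothetical.

```python
def coefficient_confidence_interval(coefficient, standard_error, critical_t):
    """Return (lower, upper) bounds: coefficient +/- critical_t * standard_error."""
    half_width = critical_t * standard_error
    return coefficient - half_width, coefficient + half_width

# Hypothetical inputs: estimated coefficient 0.5, standard error 0.1, critical t 2.0
low, high = coefficient_confidence_interval(0.5, 0.1, 2.0)
print(round(low, 4), round(high, 4))
```

If the interval does not contain 0, the coefficient is significantly different from 0 at that significance level.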
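What is the formula for the F-test?
F = (RSS / k) / (SSE / (n - (k + 1))) = MSR / MSE, where RSS is the regression sum of squares, SSE is the sum of squared errors, and k is the number of independent variables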
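A quick sketch of the F-test arithmetic from an ANOVA table, where RSS is the regression sum of squares and SSE is the sum of squared errors (residuals). The function name and the input values are made up for illustration.

```python
def f_statistic(rss, sse, n, k):
    """F = MSR / MSE for a regression with n observations and k independent variables."""
    msr = rss / k                  # mean regression sum of squares
    mse = sse / (n - (k + 1))      # mean squared error
    return msr / mse

# Hypothetical ANOVA inputs
f_value = f_statistic(rss=120.0, sse=80.0, n=43, k=2)
print(f_value)
```

The result is compared with the critical F-value for k and n - (k + 1) degrees of freedom.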
In a multiple linear regression model, what is the t-test and how can we interpret its result?
t = (b - 0) / Standard error of b
The lower the p-value, the more significant the result
What are the six assumptions of the classical normal multiple linear regression model?
1) Linear relation exists between dependent and independent variables
2) Independent variables are not random. No exact linear relation exists between 2 or more independent variables.
3) Expected value of error term is 0
4) Variance of error term is the same for all observations (homoskedasticity)
5) Error term is uncorrelated across observations
6) Error term is normally distributed
When predicting the dependent variable using a linear regression model, what are the two types of uncertainty we encounter?
1) Uncertainty in the regression model itself (SEE)
2) Uncertainty about the estimates of the regression coefficients
What is the formula for adjusted R^2?
Adjusted R^2 = 1 - ((n-1)/(n-k-1)) (1 - R^2)
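The adjustment can be checked numerically with a short sketch. The function name and the inputs (R^2 of 0.80, 61 observations, 4 independent variables) are hypothetical.

```python
def adjusted_r_squared(r_squared, n, k):
    """Adjusted R^2 for n observations and k independent variables."""
    return 1.0 - ((n - 1) / (n - k - 1)) * (1.0 - r_squared)

# Hypothetical inputs
adj = adjusted_r_squared(0.80, n=61, k=4)
print(round(adj, 4))
```

For k of at least 1 and R^2 below 1, the adjusted value is lower than the unadjusted R^2.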
What is conditional heteroskedasticity?
1) Variance of the errors differs across observations: error term is correlated with the values of the independent variables
2) F-Test is unreliable
3) Standard errors are underestimated and t-stats are inflated
4) If ignored, we tend to find significant relationships when none actually exists
How to correct for heteroskedasticity?
1) Computing robust standard errors
2) Generalized least squares (modifies original equation)
What is a simple test for conditional heteroskedasticity?
Breusch-Pagan test
What is serial correlation?
Regression errors are correlated across observations. Positive serial correlation typically inflates the t-statistics of estimated regression coefficients as well as the F-statistic for the overall significance of the regression.
What is the test for serial correlation?
Durbin-Watson test
If DW > 2: negatively correlated
If DW = 2: not correlated
If DW < 2: positively correlated
If DW > du (upper critical value), we conclude there is no evidence of positive serial correlation for the error term
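The card gives the interpretation but not the calculation. As a sketch, the standard Durbin-Watson statistic is the sum of squared changes in consecutive residuals divided by the sum of squared residuals; that formula is not on the card and is stated here as background. The residual series are made up to show the two extremes.

```python
def durbin_watson(residuals):
    """DW = sum((e_t - e_{t-1})^2) / sum(e_t^2)."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator

# Made-up residuals: sign flips every period (negative serial correlation)
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
# Made-up residuals: long runs of the same sign (positive serial correlation)
persistent = [1.0, 1.0, 1.0, -1.0, -1.0, -1.0]

dw_neg = durbin_watson(alternating)
dw_pos = durbin_watson(persistent)
print(round(dw_neg, 4), round(dw_pos, 4))
```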
How to correct for serial correlation?
1) Adjust the coefficient standard errors
2) Modify the equation
What is multicollinearity?
2 or more independent variables are highly (but not perfectly) correlated with each other.
Classic symptoms: high R^2 but insignificant t-statistics on the individual coefficients
What is a probit model?
Estimates the probability of a discrete outcome given the values of the independent variables used to explain that outcome. It is based on the normal distribution.
What is a logit model?
Estimates the probability of a discrete outcome given the values of the independent variables used to explain that outcome. It is based on the logistic distribution.
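To make the difference between the probit and logit cards concrete, the sketch below maps the same linear index z = b0 + b1 × x to a probability with the standard normal CDF (probit) and with the logistic function (logit). The coefficients and the x value are hypothetical; in practice they would come from maximum likelihood estimation.

```python
import math
from statistics import NormalDist

def logit_probability(z):
    """Logistic CDF: P = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def probit_probability(z):
    """Standard normal CDF evaluated at z."""
    return NormalDist().cdf(z)

# Hypothetical coefficients and input, for illustration only
b0, b1 = -1.0, 0.5
x = 3.0
z = b0 + b1 * x

p_logit = logit_probability(z)
p_probit = probit_probability(z)
print(round(p_logit, 4), round(p_probit, 4))
```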
How to correct for multicollinearity?
Remove one or more independent variables.
Time series that tend to grow by a constant amount should be modeled by…
Linear trend
Time series that tend to grow at a constant rate should be modeled by…
Log linear trend
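A short sketch of why constant-rate growth calls for a log-linear trend: for a made-up series growing 5% per period, the period-to-period changes in the level keep increasing, while the changes in the natural log are constant, so ln(y) is linear in time.

```python
import math

# Made-up series growing at a constant 5% rate per period
series = [100.0 * 1.05 ** t for t in range(6)]

level_changes = [series[t] - series[t - 1] for t in range(1, len(series))]
log_changes = [
    math.log(series[t]) - math.log(series[t - 1]) for t in range(1, len(series))
]

print([round(c, 3) for c in level_changes])  # keeps increasing
print([round(c, 5) for c in log_changes])    # constant, equal to ln(1.05)
```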
A time series is covariance stationary if:
1) Expected value of the time series must be constant and finite in all periods
2) Variance of the time series must be constant and finite in all periods
3) Covariance of the time series with itself for a fixed number of periods in the past or future must be constant and finite in all periods.
How to calculate the mean-reverting level?
Intercept / (1 - Coefficient)
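A minimal sketch of the mean-reverting level for an AR(1) model x(t+1) = b0 + b1 × x(t), assuming the absolute value of b1 is below 1. The coefficients and starting value are hypothetical. Iterating the model without the error term shows the series settling at the computed level.

```python
def mean_reverting_level(intercept, coefficient):
    """Mean-reverting level of an AR(1) model: b0 / (1 - b1)."""
    return intercept / (1.0 - coefficient)

# Hypothetical AR(1) coefficients
b0, b1 = 1.2, 0.6
level = mean_reverting_level(b0, b1)

# Iterate the model (error term omitted) from an arbitrary starting value
x = 10.0
for _ in range(200):
    x = b0 + b1 * x

print(round(level, 4), round(x, 4))
```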
What is the purpose of root mean squared error (RMSE)?
To compare the forecast accuracy of different time-series models; a smaller RMSE implies greater forecast accuracy.
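As a sketch, RMSE is the square root of the average squared forecast error. The actual values and the two sets of forecasts below are made up; the model names are placeholders.

```python
import math

def rmse(actual, forecast):
    """Root mean squared error between actual values and forecasts."""
    errors = [a - f for a, f in zip(actual, forecast)]
    return math.sqrt(sum(e ** 2 for e in errors) / len(errors))

# Made-up actual values and forecasts from two hypothetical models
actual = [10.0, 12.0, 11.0, 13.0]
model_a = [10.5, 11.5, 11.0, 12.0]
model_b = [9.0, 13.0, 12.5, 11.0]

rmse_a = rmse(actual, model_a)
rmse_b = rmse(actual, model_b)
print(round(rmse_a, 4), round(rmse_b, 4))
```

Here the first model has the smaller RMSE, so it would be preferred on forecast accuracy.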
What is a random walk?
Time-series in which the value of the series in one period is the value of the series in the previous period + unpredictable random error. A random walk is NOT covariance stationary. All random walks have unit roots.
Intercept near 0. Slope near 1.
What are ARCH and ARCH(1) errors?
Autoregressive conditional heteroskedasticity.
ARCH(1) errors: coefficient on the squared residual is statistically significant
C1 is significantly different from 0
Unit root: both independent and dependent variables need to be tested.
What is the conclusion if the Engle-Granger test rejects the null hypothesis that the error term has a unit root?
The error term is covariance stationary; the time series are cointegrated; the estimates of intercept and slope are valid.
What is a first-differenced random walk?
Special case of AR(1) model with b0 = 0 and b1 = 0
It is covariance stationary.
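A small simulation sketch of first differencing, assuming a random walk without drift and standard normal shocks (both are assumptions for illustration). The first difference of the walk recovers the random shocks themselves, which is why the differenced series corresponds to an AR(1) model with b0 = 0 and b1 = 0.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

# Random shocks and the random walk built from them: x_t = x_(t-1) + e_t
shocks = [random.gauss(0.0, 1.0) for _ in range(500)]
walk = [0.0]
for e in shocks:
    walk.append(walk[-1] + e)

# First difference: y_t = x_t - x_(t-1)
diffs = [walk[t] - walk[t - 1] for t in range(1, len(walk))]

# Largest gap between the differenced series and the original shocks
max_gap = max(abs(d - e) for d, e in zip(diffs, shocks))
print(len(diffs), max_gap < 1e-9)
```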
What are the assumptions for using a decision tree approach?
Discrete, Independent, Sequential
What are the assumptions for using a scenario analysis approach?
Discrete, Correlated, Concurrent
What are the assumptions for using a simulation approach?
Continuous, Independent or Correlated, Sequential or Concurrent
What is the most efficient choice to focus on when defining the probability distribution?
Directly estimating the statistical parameters of the variables.
If ARCH exists, how should we model the variables?
If ARCH exists, the errors are heteroskedastic. We must correct using:
1) Generalized least squares model
2) Computing robust standard errors
How can we interpret the Durbin-Watson stats?
If DW > upper critical value: fail to reject null hypothesis of no positive serial correlation
If DW < lower critical value: reject null hypothesis
If DW is in between: test is inconclusive
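The three-way decision rule above can be sketched as a small function. The function name, the return strings, and the critical values in the example calls are hypothetical; real critical values come from a Durbin-Watson table for the given sample size and number of independent variables.

```python
def dw_positive_serial_correlation_test(dw, d_lower, d_upper):
    """Decision for H0: no positive serial correlation."""
    if dw > d_upper:
        return "fail to reject"
    if dw < d_lower:
        return "reject"
    return "inconclusive"

# Hypothetical critical values d_lower = 1.4 and d_upper = 1.6
print(dw_positive_serial_correlation_test(1.9, 1.4, 1.6))
print(dw_positive_serial_correlation_test(1.1, 1.4, 1.6))
print(dw_positive_serial_correlation_test(1.5, 1.4, 1.6))
```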
What is the purpose of adjusted R^2?
Adjusted R^2 adjusts for the loss of degrees of freedom when additional independent variables are added to a regression. It does not adjust for the effects of serial correlation in the data, nor does it adjust for heteroskedasticity.
How is the presence of heteroskedasticity indicated?
Systematic relationship between the residuals and the independent variable
How can we identify negative serial correlation?
Significantly large values of the DW statistic.
If DW > 4 - dl (dl = lower critical value)
How can we interpret the unit root test?
If the unit root test statistic is smaller (more negative) than the critical value: reject the null hypothesis of a unit root, so the model does not exhibit a unit root.
How can we interpret heteroskedasticity test?
If the heteroskedasticity test statistic is smaller than the critical value: fail to reject the null hypothesis of no heteroskedasticity, so the model does not show heteroskedasticity.
How can we calculate the prediction interval for a regression?
1) Predicted value = Intercept + (Slope × forecast value of the independent variable)
2) Prediction interval = Predicted value ± (critical t-statistic × standard error of forecast)
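A sketch of the two steps, assuming step 1 means computing the predicted value as intercept + slope × the forecast value of the independent variable. The function name and all inputs are hypothetical; the critical t-value is read from a table and the standard error of forecast is taken as given.

```python
def prediction_interval(intercept, slope, x_forecast, critical_t, forecast_std_error):
    """Return (predicted, lower, upper) for a simple linear regression forecast."""
    predicted = intercept + slope * x_forecast       # step 1: point forecast
    half_width = critical_t * forecast_std_error     # step 2: interval half-width
    return predicted, predicted - half_width, predicted + half_width

# Hypothetical inputs: fitted line 2.2 + 0.6 * x, forecast x = 6,
# critical t = 3.182, standard error of forecast = 1.0
predicted, low, high = prediction_interval(2.2, 0.6, 6.0, 3.182, 1.0)
print(round(predicted, 3), round(low, 3), round(high, 3))
```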