Quantitative Methods Flashcards
r
r = Cov(X, Y) / (sX·sY)
r (extended)
r = Σ(Xi - X̄)(Yi - Ȳ) / [Σ(Xi - X̄)²·Σ(Yi - Ȳ)²]^(1/2)
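A minimal numpy sketch of both forms of r above (function and variable names are illustrative, not from the curriculum):

```python
import numpy as np

def sample_correlation(x, y):
    """Pearson r: sample covariance divided by the product of sample std devs."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    cov_xy = np.sum((x - x.mean()) * (y - y.mean())) / (len(x) - 1)
    return cov_xy / (x.std(ddof=1) * y.std(ddof=1))
```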
spurious correlation
- Correlation between two variables that reflects chance relationships in a particular data set
- Correlation induced by a calculation that mixes each of two variables with a third
- Correlation between two variables arising not from a direct relation between them but from their relation to a third variable
CFO
NI + non-cash charges - working capital investment (WCInv)
Assumptions of the linear regression model
- The relationship between the dependent variable, Y, and the independent variable, X, is linear in the parameters b0 and b1. This requirement means that b0 and b1 are raised to the first power only and that neither b0 nor b1 is multiplied or divided by another regression parameter (as in b0/b1, for example). The requirement does not exclude X from being raised to a power other than 1
- The independent variable, X, is not random
- The expected value of the error term is 0: E(ε) = 0
- The variance of the error term is the same for all observations: E(εi²) = σ²ε, i = 1, …, n
- The error term, ε, is uncorrelated across observations. Consequently, E(εiεj) = 0 for all i not equal to j
- The error term, ε, is normally distributed
b0
b̂0 = Ȳ - b̂1·X̄
b1
b̂1 = Cov(X, Y) / Var(X) = Σ(Xi - X̄)(Yi - Ȳ) / Σ(Xi - X̄)²
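A minimal numpy sketch of the two estimators above (names illustrative; assumes x and y are equal-length 1-D arrays):

```python
import numpy as np

def ols_line(x, y):
    """Least-squares estimates: b1 = Cov(X, Y) / Var(X), b0 = Ybar - b1 * Xbar."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    b0 = y.mean() - b1 * x.mean()
    return b0, b1
```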
FCFF
CFO + Interest expense (1 - t) - FCInv
/
NI + non-cash charges - WCInv + Interest expense (1 - t) - FCInv
t-test for the correlation coefficient
t = r·(n - 2)^(1/2) / (1 - r²)^(1/2), with df = n - 2
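The same statistic as a small Python helper (a sketch; the name is illustrative):

```python
import numpy as np

def corr_t_stat(r, n):
    """t-statistic for H0: population correlation = 0, with df = n - 2."""
    return r * np.sqrt(n - 2) / np.sqrt(1 - r ** 2)
```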
Least squares equation
Ŷi = b̂0 + b̂1·Xi, the line that minimizes the sum of squared residuals Σ(Yi - Ŷi)²
Coefficient of determination
r²
/
1 - (unexplained variation / total variation)
t-test for linear regression
t = (b̂1 - b1) / sb̂1, where sb̂1 is the standard error of the estimated slope coefficient
t-test for linear regression - utility
For hypothesis tests concerning the population mean of a normally distributed population with unknown (known) variance, the theoretically correct test statistic is the t-statistic (z-statistic). In the unknown variance case, given large samples (generally, samples of 30 or more observations), the z-statistic may be used in place of the t-statistic because of the force of the central limit theorem
t-test for linear regression - degrees of freedom
# of observations - (number of independent variables + 1) =
n - (k + 1)
t-test for linear regression - interval
b̂1 ± tc·sb̂1, where tc is the critical t-value for the chosen significance level
SEE
[SSE / (n - 2)]^(1/2)
SEE - relation to unexplained variation
Unexplained variation = SSE, so SEE = [unexplained variation / (n - 2)]^(1/2)
SEE - definition
The standard error of the estimate is a measure of the accuracy of predictions made with a regression line. (Also called the residual standard error)
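A sketch of the calculation, generalized to k independent variables (k = 1 recovers the n - 2 case; names illustrative):

```python
import numpy as np

def standard_error_of_estimate(y, y_hat, k=1):
    """SEE = sqrt(SSE / (n - (k + 1))), the residual standard error."""
    resid = np.asarray(y, float) - np.asarray(y_hat, float)
    return np.sqrt(np.sum(resid ** 2) / (len(resid) - (k + 1)))
```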
SE of the t-test for linear regression
sb̂1 = SEE / [Σ(Xi - X̄)²]^(1/2) (for regression with one independent variable)
Standard error versus standard deviation
The standard error of the sample is an estimate of how far the sample mean is likely to be from the population mean, whereas the standard deviation of the sample is the degree to which individuals within the sample differ from the sample mean
Type I error : rejecting a true null hypothesis
Type II error : failing to reject a false null hypothesis
p-value definition
Smallest level of significance at which the null hypothesis can be rejected
EV
Market value of equity + market value of debt - cash and investments
IC (Invested Capital)
Book value of debt and equity
- R² = coefficient of determination = SSR / SST
- SST = total sum of squares = Σ(Yi - Ȳ)²
- SSR = regression (explained) sum of squares = Σ(Ŷi - Ȳ)²
- SSE = sum of squared errors (sometimes residual sum of squares, RSS) = Σ(Yi - Ŷi)²
F-statistic definition
The F-statistic is the ratio of the average regression sum of squares to the average sum of the squared errors. It measures how well the regression equation explains the variation in the dependent variable
Relation of the t-test and the F-test for regression with only one independent variable
In such regressions, the F-statistic is the square of the t-statistic for the regression coefficient
F-test for the regression coefficient with one independent variable - formula
(RSS / 1) / [SSE / (n - 2)]
/
Mean regression sum of squares / Mean squared error
F test for multiple regression coefficients
F = (RSS / k) / [SSE / (n - (k + 1))] = Mean regression sum of squares / Mean squared error
F test for multiple regression coefficients - notation
F(k, n - (k + 1))
k = number of slope coefficients
n = number of observations
S²f - formula
s²f = s²·[1 + 1/n + (X - X̄)² / ((n - 1)·s²x)]
- s² being the squared standard error of estimate (SEE = s)
- s²x being the sample variance of the independent variable
ANOVA
Analysis of variance - Used to determine the sources of variance of a variable - Uses the F-test to verify whether all regression coefficients are equal to 0
ANOVA degrees of freedom
- SSR = # of slope coefficients = k
- SSE = # of observations - (number of independent variables + 1) = n - (k + 1)
- SST = # of observations - 1
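A numpy sketch that ties the ANOVA quantities together (assumes y_hat holds the fitted values from a regression with k slope coefficients; names illustrative):

```python
import numpy as np

def anova_table(y, y_hat, k):
    """SST = SSR + SSE; F = (SSR / k) / (SSE / (n - (k + 1)))."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    n = len(y)
    sst = np.sum((y - y.mean()) ** 2)        # total variation, df = n - 1
    ssr = np.sum((y_hat - y.mean()) ** 2)    # explained variation, df = k
    sse = np.sum((y - y_hat) ** 2)           # unexplained variation, df = n - (k + 1)
    f_stat = (ssr / k) / (sse / (n - (k + 1)))
    return {"SST": sst, "SSR": ssr, "SSE": sse, "R2": ssr / sst, "F": f_stat}
```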
S²f is the estimated variance of the prediction error. It is used to build a prediction interval around the predicted value Ŷ: Ŷ ± tc·sf
Beta
β = Cov(Ri, Rm) / σ²m, estimated as the slope from regressing the asset's returns on the market's returns
RANVA
Risk-adjusted net value added
Assumptions of the multiple linear regression model
- The relationship between the dependent variable, Y, and the independent variables, X1, X2, …, Xk, is linear
- The independent variables (X1, X2, …, Xk) are not random. Also, no exact linear relation exists between two or more of the independent variables
- The expected value of the error term, conditioned on the independent variables, is 0: E(ε | X1,X2, …, Xk) = 0
- The variance of the error term is the same for all observations: E(εi²) = σ²ε
- The error term is uncorrelated across observations: E(εiεj) = 0, j ≠ i
- The error term is normally distributed
Adjusted R2 - definition
- A measure of goodness-of-fit of a regression that is adjusted for degrees of freedom and hence does not automatically increase when another independent variable is added to a regression
- If k ≥ 1, R² is strictly greater than adjusted R²
Adjusted R² - formula
Adjusted R² = 1 - [(n - 1) / (n - k - 1)]·(1 - R²)
Residual standard error
- [SSE / (n - (k + 1))]^(1/2)
- MSSE^(1/2)
Breusch-Pagan test
- A test for conditional heteroskedasticity in the error term of a regression
- Chi-squared with df = number of independent variables
- Test statistic = n·R²
- R² = coefficient of determination from the regression of the squared residuals on the independent variables from the original regression (not the R² from the original regression)
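A rough numpy sketch of the statistic described above (resid = the OLS residuals and X = the n-by-k matrix of original independent variables; both names are assumptions of the example):

```python
import numpy as np

def breusch_pagan_stat(resid, X):
    """n * R² from regressing squared residuals on the independent variables."""
    u2 = np.asarray(resid, float) ** 2
    n = len(u2)
    Xc = np.column_stack([np.ones(n), np.asarray(X, float)])  # add intercept
    beta, *_ = np.linalg.lstsq(Xc, u2, rcond=None)
    fitted = Xc @ beta
    r2 = np.sum((fitted - u2.mean()) ** 2) / np.sum((u2 - u2.mean()) ** 2)
    return n * r2  # compare to chi-squared with df = number of independent variables
```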
Generalized least squares
Eliminates heteroskedasticity
Robust standard errors
Standard errors of the regression coefficients corrected for conditional heteroskedasticity (also called White-corrected standard errors)
- Conditional heteroskedasticity
- Unconditional heteroskedasticity
- Heteroskedasticity in the error variance that is correlated with the values of the independent variable(s) in the regression
- Heteroskedasticity in the error variance that is not correlated with the values of the independent variable(s) in the regression
- Heteroskedasticity
- Homoskedasticity
- The property of having a nonconstant variance; refers to an error term with the property that its variance differs across observations
- The property of having a constant variance; refers to an error term whose variance is constant across observations
Serially correlated
With reference to regression errors, errors that are correlated across observations
Positive serial correlation
Serial correlation in which a positive error for one observation increases the chance of a positive error for another observation
First-order serial correlation
Correlation between adjacent errors; under positive first-order serial correlation, the sign of the error tends to persist from one period to the next
Multicollinearity
- A regression assumption violation that occurs when two or more independent variables (or combinations of independent variables) are highly but not perfectly correlated with each other
- In order to correct the regression, we need to remove one or more of the highly correlated independent variables
Classic symptoms of multicollinearity
- High R2
- Significant F-statistic when the t-statistics are not significant
Durbin and Watson test - utility
A test for first-order serial correlation of the regression errors
Durbin and Watson test - formula
DW = Σt=2..T (ε̂t - ε̂t-1)² / Σt=1..T ε̂t² ≈ 2(1 - r), where r is the sample correlation between adjacent residuals
Durbin and Watson regression residual for period t
ε̂t = Yt - Ŷt (actual minus fitted value for period t)
Durbin and Watson values
- No serial correlation: 2
- Serial correlation of 1: 0
- Serial correlation of -1: 4
- If > du, then we fail to reject the null hypothesis of no serial correlation
- If < dl, then we reject the hypothesis of no serial correlation
- Inconclusive between dl and du
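A minimal sketch of the statistic (resid stands for the regression residuals; the name is illustrative):

```python
import numpy as np

def durbin_watson(resid):
    """DW = sum((e_t - e_{t-1})^2) / sum(e_t^2); ~2 means no serial correlation."""
    e = np.asarray(resid, float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)
```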
Bias
- Data-mining
- Omitted variable bias
- Multicollinearity (F-test)
- Serial correlation (DW)
Qualitative dependent variable
Use a logit or probit model
Covariance-stationary
- The mean and variance are constant through time
- We cannot use standard regression analysis on a time series that is not covariance-stationary
Convergence of covariance stationary series
They converge to their mean-reverting level: xt = b0/(1 - b1)
Nonstationarity
Variables that contain trends
Unit root
An AR(1) time series whose lag coefficient equals 1 (b1 = 1) has a unit root; such a series is a random walk and is not covariance stationary
Mean reversion - formula and context
- xt = b0/(1 - b1)
- All covariance stationary time series have a finite mean-reverting level
Autocorrelation
Correlation of a time series with its own past values
/
Order of autocorrelation k = number of periods lagged
Method to correct autocorrelation
- The most prevalent method for adjusting standard errors was developed by Hansen (1982)
- An additional advantage of Hansen’s method is that it simultaneously corrects for conditional heteroskedasticity
kth order autocorrelation
ρk = Cov(xt, xt-k) / σ²x
kth order estimated autocorrelation
ρ̂k = Σt=k+1..T [(xt - x̄)(xt-k - x̄)] / Σt=1..T (xt - x̄)²
Autocorrelation of the error term
ρε,k = Cov(εt, εt-k) / σ²ε
Standard error of the residual correlation (for autocorrelation)
1 / T^(1/2)
T = number of observations
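A sketch of the estimated autocorrelation and its approximate standard error (names illustrative; assumes k ≥ 1):

```python
import numpy as np

def autocorr_with_se(x, k):
    """Return (rho_hat_k, 1/sqrt(T)); the t-statistic is their ratio."""
    x = np.asarray(x, float)
    xbar = x.mean()
    rho_k = np.sum((x[k:] - xbar) * (x[:-k] - xbar)) / np.sum((x - xbar) ** 2)
    return rho_k, 1 / np.sqrt(len(x))
```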
In-sample forecast errors - residuals from a fitted time series model
Out-of-sample forecast errors - differences between actual and predicted values outside the time period of the model
Root mean squared error (RMSE)
Square root of the average squared forecast error; used to compare the out-of-sample forecast accuracy of time-series models (smaller is better)
RMSE = [Σ(actual - forecast)² / n]^(1/2)
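The same calculation as a short helper (names illustrative):

```python
import numpy as np

def rmse(actual, forecast):
    """Root mean squared error of out-of-sample forecasts."""
    err = np.asarray(actual, float) - np.asarray(forecast, float)
    return np.sqrt(np.mean(err ** 2))
```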
Random walk - formula
xt = xt-1 + εt, with E(εt) = 0 and E(εt²) = σ²
Random walk covariance
E(εtεs) = 0 for t ≠ s (the error terms are uncorrelated across periods)
Random walk variance
(t - 1)·σ²; the variance grows with t, so a random walk is not covariance stationary
Dickey and Fuller test - formula
xt - xt-1 = b0 + g1·xt-1 + εt, where g1 = b1 - 1
Dickey and Fuller test - utility
- Test for the unit root using an AR(1) model
- The null hypothesis is: H0: g1 = 0
- The alternative hypothesis is: Ha: g1 < 0
- g1 = b1 - 1
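The test uses nonstandard critical values, so in practice a library routine is typical. A sketch using statsmodels' augmented Dickey-Fuller test (adfuller is a real statsmodels function; the simulated series is illustrative):

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(0)
x = np.cumsum(rng.standard_normal(500))  # simulated random walk: has a unit root
stat, pvalue, *_ = adfuller(x)
print(stat, pvalue)  # expect a large p-value: fail to reject H0 (unit root)
```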
Seasonality in time-series - formula
Include a seasonal lag in the AR model; e.g., for quarterly data: xt = b0 + b1·xt-1 + b2·xt-4 + εt
Autoregressive model (AR)
- A time series regressed on its own past values
- AR, MA & ARMA models
- Should be covariance stationary (Dickey and Fuller test)
Autoregressive conditional heteroskedasticity (ARCH) - ARCH(1) model distribution
εt ~ N(0, a0 + a1·εt-1²), i.e. normally distributed with variance a0 + a1·εt-1² conditional on the previous error
ARCH linear regression equation
ε̂t² = a0 + a1·ε̂t-1² + ut
If the estimate of a1 is statistically significantly different from zero, we conclude that the time series is ARCH(1)
ARCH variance of the error
σ̂t+1² = â0 + â1·ε̂t² (forecast of next period's error variance)
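A sketch of the ARCH(1) regression and variance forecast above (resid = residuals from the fitted time-series model; names illustrative):

```python
import numpy as np

def arch1(resid):
    """Fit e_t^2 = a0 + a1 * e_{t-1}^2 + u_t and forecast next period's variance."""
    e2 = np.asarray(resid, float) ** 2
    X = np.column_stack([np.ones(len(e2) - 1), e2[:-1]])
    (a0, a1), *_ = np.linalg.lstsq(X, e2[1:], rcond=None)
    return a0, a1, a0 + a1 * e2[-1]  # (a0, a1, forecast variance)
```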
Cointegrated
Two time-series are cointegrated if a long-term financial or economic relationship exists between them such that they do not diverge from each other without bound in the long run
Durbin and Watson for lagged value (autoregressive models)
The test cannot be used for a regression that has a lagged value of the dependent variable as one of the explanatory variables. Instead, test whether the residuals from the model are serially correlated
Multiple R
- The correlation between actual and predicted values of the dependent variable
- = (R²)^(1/2)
Nonlinear relation
An association or relationship between variables that cannot be graphed as a straight line
Interpretation of the p-value
- A small p-value (≤ 0.05) indicates strong evidence against the null hypothesis, so it is rejected
- A large p-value (> 0.05) indicates weak evidence against the null hypothesis (fail to reject)
- p-values very close to the cutoff (~ 0.05) are considered to be marginal (need attention)
p-value for the Beta function *as a reference
B(a, b) = ∫₀¹ t^(a-1)·(1 - t)^(b-1) dt
p-value for the Lower incomplete beta function *as a reference
B(x; a, b) = ∫₀ˣ t^(a-1)·(1 - t)^(b-1) dt
p-value for the Regularized lower incomplete beta function *as a reference
Ix(a, b) = B(x; a, b) / B(a, b)
where the numerator is the lower incomplete beta function, and the denominator is the beta function
p-value for the t-distribution cumulative distribution function (CDF) *as a reference
F(t) = 1 - ½·Ix(v/2, 1/2) with x = v/(v + t²), for t > 0; the two-sided p-value is 2·(1 - F(|t|))
where v is the degrees of freedom, t is the upper limit of integration, and I is the regularized lower incomplete beta function
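In practice this CDF comes from a library. A sketch of a two-sided p-value using scipy's Student-t distribution (scipy.stats.t.cdf is a real scipy call; the numbers are illustrative):

```python
from scipy.stats import t as t_dist

t_stat, df = 2.31, 28
p_value = 2 * (1 - t_dist.cdf(abs(t_stat), df))  # two-sided p-value
print(p_value)
```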
Heteroskedasticity, serial correlation and multicollinearity - table
Problem | Effect | Correction
Conditional heteroskedasticity | Biased standard errors (unreliable t- and F-tests) | Robust standard errors or generalized least squares
Serial correlation | Biased standard errors | Hansen's method (also corrects for conditional heteroskedasticity)
Multicollinearity | Inflated standard errors (insignificant t-statistics despite a significant F) | Remove one or more correlated independent variables
Errors in model specification
- Data mining
- Market timing
- Time-series misspecification
Moving-average model of order 1, MA(1)
xt = εt + θ·εt-1
Theta (θ) is the parameter of the MA(1) model
Moving-average model of order q, MA(q)
xt = εt + θ1·εt-1 + … + θq·εt-q
Autoregressive moving-average model (ARMA)
xt = b0 + b1·xt-1 + … + bp·xt-p + εt + θ1·εt-1 + … + θq·εt-q
b1, b2, …, bp are the autoregressive parameters and θ1, θ2, …, θq are the moving-average parameters
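A minimal simulation sketch of the MA(1) case (function name and parameters are illustrative):

```python
import numpy as np

def simulate_ma1(theta, n, sigma=1.0, seed=0):
    """Simulate x_t = eps_t + theta * eps_{t-1}, a zero-mean MA(1)."""
    eps = np.random.default_rng(seed).normal(0.0, sigma, n + 1)
    return eps[1:] + theta * eps[:-1]
```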
Multiple R versus r
Capital R² (as opposed to r²) generally denotes the coefficient of determination in a multiple regression model. In bivariate linear regression, there is no multiple R, and R² = r². So one difference is applicability: "multiple R" implies multiple regressors, whereas "R²" doesn't necessarily. Another difference is interpretation: in multiple regression, the multiple R is the coefficient of multiple correlation, whereas its square is the coefficient of determination.
Variables with a correlation close to 0 can nonetheless exhibit a strong relationship—just not a linear relationship
Correlation measures the linear association between two variables
If the p-value is greater than 0.05
Then the test is not significant at the 5% level
Significance F
- Represents the level at which the test is significant
- An entry of 0.01 for the significance of F means that the regression is significant at the 0.01 level
Parameter instability
The problem or issue of population regression parameters that have changed over time