Quantitative Methods Flashcards

1
Q

r

A

r = Cov(X, Y) / (s_X s_Y)
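
As a rough illustration of the formula above, the sketch below computes the sample correlation in plain Python. The function name `sample_correlation` and the data are made up for illustration, and it assumes sample (n - 1) denominators for both the covariance and the standard deviations.

```python
import math

def sample_correlation(xs, ys):
    """Sample correlation r = Cov(X, Y) / (s_X * s_Y), using n - 1 denominators."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    s_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    return cov_xy / (s_x * s_y)

# Illustrative data only (not taken from the flashcards)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
r = sample_correlation(x, y)
print(f"r = {r:.4f}")  # r = 0.7746
```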

2
Q

r (extended)

A
3
Q

spurious correlation

A
  • Correlation between two variables that reflects chance relationships in a particular data set
  • Correlation induced by a calculation that mixes each of two variables with a third
  • Correlation between two variables arising not from a direct relation between them but from their relation to a third variable
4
Q

CFO

A

NI + non-cash charges - working capital investment

5
Q

Assumptions of the linear regression model

A
  1. The relationship between the dependent variable, Y, and the independent variable, X, is linear in the parameters b0 and b1. This requirement means that b0 and b1 are raised to the first power only and that neither b0 nor b1 is multiplied or divided by another regression parameter (as in b0/b1, for example). The requirement does not exclude X from being raised to a power other than 1
  2. The independent variable, X, is not random
  3. The expected value of the error term is 0: E(ε) = 0
  4. The variance of the error term is the same for all observations: E(εi²) = σε², i = 1, …, n
  5. The error term, ε, is uncorrelated across observations. Consequently, E(εiεj) = 0 for all i not equal to j
  6. The error term, ε, is normally distributed
6
Q

b0

A
7
Q

b1

A
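
A minimal sketch, assuming the usual least-squares estimates for a regression with one independent variable: b1 = Cov(X, Y) / Var(X) and b0 = Ȳ - b1 X̄. The function name `ols_coefficients` and the data are hypothetical.

```python
def ols_coefficients(xs, ys):
    """Return (b0, b1) for the fitted line Y = b0 + b1 * X."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations; the (n - 1) factors cancel.
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    b1 = s_xy / s_xx             # slope
    b0 = mean_y - b1 * mean_x    # intercept: the line passes through the means
    return b0, b1

# Illustrative data only
b0, b1 = ols_coefficients([1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 4.0, 5.0, 4.0, 5.0])
print(f"b0 = {b0:.2f}, b1 = {b1:.2f}")  # b0 = 2.20, b1 = 0.60
```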
8
Q

FCFF

A

CFO + Interest expense (1 - t) - FCInv

/

NI + non cash charges - WCInv + Interest expense (1 - t) - FCInv
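
The two expressions on this card are equivalent because CFO = NI + non-cash charges - WCInv. The sketch below checks this with made-up numbers; the function and parameter names are hypothetical.

```python
def fcff_from_cfo(cfo, interest_expense, tax_rate, fc_inv):
    """Free cash flow to the firm starting from cash flow from operations."""
    return cfo + interest_expense * (1 - tax_rate) - fc_inv

def fcff_from_net_income(net_income, non_cash_charges, wc_inv,
                         interest_expense, tax_rate, fc_inv):
    """Same quantity starting from net income."""
    cfo = net_income + non_cash_charges - wc_inv
    return fcff_from_cfo(cfo, interest_expense, tax_rate, fc_inv)

# Illustrative figures only
via_ni = fcff_from_net_income(100.0, 20.0, 10.0, 30.0, 0.4, 50.0)
via_cfo = fcff_from_cfo(110.0, 30.0, 0.4, 50.0)
print(round(via_ni, 2), round(via_cfo, 2))  # 78.0 78.0
```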

9
Q

t-test for the correlation coefficient

A
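
A minimal sketch, assuming the standard test statistic for H0: ρ = 0, namely t = r √(n - 2) / √(1 - r²) with n - 2 degrees of freedom. The function name and the inputs are hypothetical.

```python
import math

def correlation_t_stat(r, n):
    """t-statistic for testing whether a population correlation is zero."""
    if n <= 2 or abs(r) >= 1:
        raise ValueError("need n > 2 and |r| < 1")
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

# Illustrative inputs: sample correlation 0.5 from 27 observations (25 df)
t = correlation_t_stat(0.5, 27)
print(f"t = {t:.4f}")  # t = 2.8868
```

The result is compared with a critical value from a t-table at n - 2 degrees of freedom.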
10
Q

Least square equation

A
11
Q

Coefficient of determination

A

r²

/

1 - (unexplained variation / total variation)
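
The second form can be sketched directly from actual and fitted values. The function name and the data below are illustrative assumptions; the fitted values come from a hypothetical line 2.2 + 0.6 X with X = 1, …, 5.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - unexplained variation / total variation."""
    mean_y = sum(actual) / len(actual)
    unexplained = sum((y - y_hat) ** 2 for y, y_hat in zip(actual, predicted))
    total = sum((y - mean_y) ** 2 for y in actual)
    return 1 - unexplained / total

actual = [2.0, 4.0, 5.0, 4.0, 5.0]
predicted = [2.8, 3.4, 4.0, 4.6, 5.2]
r2 = r_squared(actual, predicted)
print(f"R^2 = {r2:.2f}")  # R^2 = 0.60
```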

12
Q

t-test for linear regression

A
13
Q

t-test for linear regression - utility

A

For hypothesis tests concerning the population mean of a normally distributed population with unknown (known) variance, the theoretically correct test statistic is the t-statistic (z-statistic). In the unknown variance case, given large samples (generally, samples of 30 or more observations), the z-statistic may be used in place of the t-statistic because of the force of the central limit theorem

14
Q

t-test for linear regression - degrees of freedom

A

Number of observations - (number of independent variables + 1) =

n - (k + 1)

15
Q

t-test for linear regression - interval

A
16
Q

SEE

A

[SSE / (n - 2)]^(1/2)
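
A small sketch of this calculation for a regression with one independent variable. The function name and the residuals are hypothetical.

```python
import math

def standard_error_of_estimate(residuals):
    """SEE = sqrt(SSE / (n - 2)) for a regression with one independent variable."""
    n = len(residuals)
    sse = sum(e ** 2 for e in residuals)  # sum of squared errors
    return math.sqrt(sse / (n - 2))

# Illustrative residuals only
residuals = [-0.8, 0.6, 1.0, -0.6, -0.2]
see = standard_error_of_estimate(residuals)
print(f"SEE = {see:.4f}")  # SEE = 0.8944
```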

17
Q

SEE - relation to unexplained variation

A

Unexplained variation = SSE

18
Q

SEE - definition

A

The standard error of the estimate is a measure of the accuracy of predictions made with a regression line. (Also called the residual standard error)

19
Q

SE of the t-test for linear regression

A
20
Q

Standard error versus standard deviation

A

The standard error of the sample is an estimate of how far the sample mean is likely to be from the population mean, whereas the standard deviation of the sample is the degree to which individuals within the sample differ from the sample mean

21
Q

Type I error: rejecting a true null hypothesis

A

Type II error: failing to reject a false null hypothesis

22
Q

p-value definition

A

Smallest level of significance at which the null hypothesis can be rejected

23
Q

EV

A

Market value of equity and debt - value of cash and investments

24
Q

IC (Invested Capital)

A

Book value of debt and equity

25
Q
  • R²
  • SST
  • SSR
  • SSE (sometimes residual sum of squares, RSS)
A
26
Q

F-statistic definition

A

The F-statistic is the ratio of the average regression sum of squares to the average sum of the squared errors. It measures how well the regression equation explains the variation in the dependent variable

27
Q

Relation of the t-test and the F-test for regression with only one independent variable

A

In such regressions, the F-statistic is the square of the t-statistic for the regression coefficient

28
Q

F-test for the regression coefficient with one independent variable - formula

A

(SSR / 1) / (SSE / (n - 2))

/

Mean regression sum of squares/ Mean squared error
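
The sketch below computes this F-statistic for a regression with one independent variable and checks that it equals the square of the slope's t-statistic. It assumes the usual standard error of the slope, SEE / √Σ(X - X̄)²; the function name and the data are hypothetical.

```python
import math

def simple_regression_f_and_t(xs, ys):
    """Return (F-statistic, slope t-statistic) for Y regressed on one X."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = s_xy / s_xx
    b0 = mean_y - b1 * mean_x
    fitted = [b0 + b1 * x for x in xs]
    reg_ss = sum((f - mean_y) ** 2 for f in fitted)      # regression sum of squares
    sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))  # sum of squared errors
    mse = sse / (n - 2)                                  # mean squared error
    f_stat = (reg_ss / 1) / mse
    t_stat = b1 / math.sqrt(mse / s_xx)                  # b1 / standard error of b1
    return f_stat, t_stat

# Illustrative data only
f_stat, t_stat = simple_regression_f_and_t([1.0, 2.0, 3.0, 4.0, 5.0],
                                           [2.0, 4.0, 5.0, 4.0, 5.0])
print(f"F = {f_stat:.2f}, t^2 = {t_stat ** 2:.2f}")  # F = 4.50, t^2 = 4.50
```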

29
Q

F test for multiple regression coefficients

A
30
Q

F test for multiple regression coefficients - notation

A

F(k, n - (k + 1))

k = number of slope coefficients

n = number of observations

31
Q

s_f² - formula

  • s² being the squared standard error of estimate (SEE = s)
A
32
Q

ANOVA

A

Analysis of variance - Used to determine the sources of variance of a variable - Uses the F-test to verify whether all regression coefficients are equal to 0

33
Q

ANOVA degrees of freedom

A
  • SSR = # of slope coefficients = k
  • SSE = # of observations - (number of independent variables + 1) = n - (k + 1)
  • SST = # of observations - 1
34
Q

s_f² is the estimated variance of the prediction error. It is used to build a prediction interval around the predicted value Ŷ.

A
35
Q

Beta

A
36
Q

RANVA

A

Risk adjusted net value

37
Q

Assumptions of the multiple linear regression model

A
  1. The relationship between the dependent variable, Y, and the independent variables, X1, X2, …, Xk, is linear
  2. The independent variables (X1, X2, …, Xk) are not random. Also, no exact linear relation exists between two or more of the independent variables
  3. The expected value of the error term, conditioned on the independent variables, is 0: E(ε | X1,X2, …, Xk) = 0
  4. The variance of the error term is the same for all observations: E(εi²) = σε²
  5. The error term is uncorrelated across observations: E(εiεj) = 0, j ≠ i
  6. The error term is normally distributed
38
Q

Adjusted R² - definition

A
  • A measure of goodness-of-fit of a regression that is adjusted for degrees of freedom and hence does not automatically increase when another independent variable is added to a regression
  • If k ≥ 1, R² is greater than adjusted R² (the two coincide only when R² = 1)
39
Q

Adjusted R²

A
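
A minimal sketch, assuming the standard definition adjusted R² = 1 - [(n - 1) / (n - k - 1)] (1 - R²), where n is the number of observations and k the number of independent variables. The function name and the inputs are hypothetical.

```python
def adjusted_r_squared(r2, n, k):
    """Adjusted R² for a regression with n observations and k independent variables."""
    if n - k - 1 <= 0:
        raise ValueError("need n > k + 1")
    return 1 - (n - 1) / (n - k - 1) * (1 - r2)

# Illustrative inputs: R² = 0.50 from 30 observations and 3 regressors
adj = adjusted_r_squared(0.50, 30, 3)
print(f"adjusted R^2 = {adj:.4f}")  # adjusted R^2 = 0.4423
```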
40
Q

Residual standard error

A
  • [SSE / (n - (k + 1))]^(1/2)
  • MSE^(1/2), where MSE is the mean squared error
41
Q

Breusch-Pagan test

A
  • A test for conditional heteroskedasticity in the error term of a regression
  • Chi-squared with df = number of independent variables
  • Test statistic = n × R²
  • R² = coefficient of determination from the regression of the squared residuals on the independent variables from the original regression (not the R² from the original regression)
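
The steps above can be sketched for the simplest case of a single independent variable, where the R² of the auxiliary regression equals the squared correlation between X and the squared residuals. The function names and the data are hypothetical, and the residuals are assumed to come from an already fitted regression.

```python
def r_squared_simple(xs, ys):
    """R² from regressing ys on one regressor xs (the squared correlation)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    return s_xy ** 2 / (s_xx * s_yy)

def breusch_pagan_stat(xs, residuals):
    """n × R², with R² from regressing the squared residuals on xs."""
    squared_residuals = [e ** 2 for e in residuals]
    return len(xs) * r_squared_simple(xs, squared_residuals)

# Illustrative data: residuals whose size grows with X
bp = breusch_pagan_stat([1.0, 2.0, 3.0, 4.0], [1.0, -2.0, 3.0, -4.0])
print(f"n x R^2 = {bp:.3f}")
```

The statistic is compared with a chi-squared critical value with degrees of freedom equal to the number of independent variables (here 1).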
42
Q

Generalized least squares

A

Eliminates heteroskedasticity

43
Q

Robust standard errors

A

Accounts for conditional heteroskedasticity

44
Q
  • Conditional heteroskedasticity
  • Unconditional heteroskedasticity
A
  • Heteroskedasticity in the error variance that is correlated with the values of the independent variable(s) in the regression
  • Heteroskedasticity in the error variance that is not correlated with the values of the independent variable(s) in the regression
45
Q
  • Heteroskedasticity
  • Homoskedasticity
A
  • The property of having a nonconstant variance; refers to an error term with the property that its variance differs across observations
  • The property of having a constant variance; refers to an error term whose variance is constant across observations
46
Q

Serially correlated

A

With reference to regression errors, errors that are correlated across observations

47
Q

Positive serial correlation

A

A positive error for one observation increases the chance of a positive error for another observation (and a negative error increases the chance of a negative error)

48
Q

First-order serial correlation

A

When the sign of the error tends to persist from one period to another

49
Q

Multicollinearity

A
  • A regression assumption violation that occurs when two or more independent variables (or combinations of independent variables) are highly but not perfectly correlated with each other
  • To correct the regression, we need to remove one or more of the highly correlated independent variables
50
Q

Classic symptoms of multicollinearity

A
  • High R²
  • Significant F-statistic when the t-statistics are not significant
51
Q

Durbin and Watson test - utility

A

Test used for serial correlation

52
Q

Durbin and Watson test - formula

A
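
A minimal sketch, assuming the standard definition DW = Σ(ε̂t - ε̂t-1)² / Σ ε̂t², where the numerator runs over t = 2, …, T and the denominator over t = 1, …, T. The function name and the residuals are hypothetical.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic from a list of regression residuals."""
    numerator = sum((residuals[t] - residuals[t - 1]) ** 2
                    for t in range(1, len(residuals)))
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator

# Illustrative residuals only
print(durbin_watson([1.0, 1.0, 1.0, 1.0]))    # 0.0 (errors persist: positive serial correlation)
print(durbin_watson([1.0, -1.0, 1.0, -1.0]))  # 3.0 (errors alternate: negative serial correlation)
```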
53
Q

Durbin and Watson regression residual for period t

A
54
Q

Durbin and Watson values

A
  • No serial correlation: 2
  • Serial correlation of 1: 0
  • Serial correlation of -1: 4
  • If DW > du, we fail to reject the null hypothesis of no positive serial correlation
  • If DW < dl, we reject the null hypothesis of no positive serial correlation
  • The test is inconclusive if DW lies between dl and du
55
Q

Bias

A
  • Data-mining
  • Omitted variable bias
  • Multicollinearity (F-test)
  • Serial correlation (DW)
56
Q

Qualitative dependent variable

A

Use a logit or probit model

57
Q

Covariance-stationary

A
  • The mean and variance are constant through time
  • We cannot use standard regression analysis on a time series that is not covariance-stationary
58
Q

Convergence of covariance-stationary series

A

They converge to their mean-reverting level: xt = b0 / (1 - b1)

59
Q

Nonstationarity

A

Variables that contain trends

60
Q

Unit root

A

A time series that is not covariance stationary is said to have a unit root

61
Q

Mean reversion - formula and context

A
  • xt = b0/(1 - b1)
  • All covariance stationary time series have a finite mean-reverting level
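
As an illustration, the sketch below computes the mean-reverting level of an AR(1) model xt = b0 + b1 xt-1 + εt and shows that iterating the model without the error term drifts toward that level. The function names and parameter values are hypothetical, and |b1| < 1 is assumed.

```python
def mean_reverting_level(b0, b1):
    """Level at which an AR(1) series is expected to stay put: b0 / (1 - b1)."""
    if abs(b1) >= 1:
        raise ValueError("no finite mean-reverting level when |b1| >= 1")
    return b0 / (1 - b1)

def iterate_ar1(x0, b0, b1, steps):
    """Apply x -> b0 + b1 * x repeatedly, ignoring the error term."""
    x = x0
    for _ in range(steps):
        x = b0 + b1 * x
    return x

level = mean_reverting_level(1.0, 0.5)
far_future = iterate_ar1(10.0, 1.0, 0.5, 60)
print(level, round(far_future, 6))  # 2.0 2.0
```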
62
Q

Autocorrelation

A

Correlation of a time series with its own past values

/

Order of autocorrelation k = number of periods lagged

63
Q

Method to correct autocorrelation

A
  • The most prevalent method for adjusting standard errors was developed by Hansen (1982)
  • An additional advantage of Hansen’s method is that it simultaneously corrects for conditional heteroskedasticity
64
Q

kth order autocorrelation

A
65
Q

kth order estimated autocorrelation

A
66
Q

Autocorrelation of the error term

A
67
Q

Standard error of the residual correlation (for autocorrelation)

A

1 / T^(1/2)

T = number of observations
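
This standard error is used to form a t-statistic for each residual autocorrelation. The sketch below assumes the usual estimator, the sum of lag-k cross-products of deviations from the mean divided by the sum of squared deviations; the function names and the residuals are hypothetical.

```python
import math

def residual_autocorrelation(residuals, k):
    """Estimated k-th order autocorrelation of a residual series."""
    T = len(residuals)
    mean = sum(residuals) / T
    num = sum((residuals[t] - mean) * (residuals[t - k] - mean) for t in range(k, T))
    den = sum((e - mean) ** 2 for e in residuals)
    return num / den

def autocorrelation_t_stat(residuals, k):
    """Autocorrelation divided by its standard error, 1 / sqrt(T)."""
    standard_error = 1 / math.sqrt(len(residuals))
    return residual_autocorrelation(residuals, k) / standard_error

# Illustrative residuals only
rho1 = residual_autocorrelation([1.0, -1.0, 1.0, -1.0], 1)
t1 = autocorrelation_t_stat([1.0, -1.0, 1.0, -1.0], 1)
print(rho1, t1)  # -0.75 -1.5
```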

68
Q

In-sample forecast errors - residuals from a fitted time series model

A

Out-of-sample forecast errors - differences between actual and predicted values outside the time period of the model

69
Q

Root mean squared error (RMSE)

A

Square root of the average squared error
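
A short sketch of the calculation, with a hypothetical function name and made-up values. In practice it would typically be applied to out-of-sample forecast errors when comparing models.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between actual and predicted values."""
    errors = [a - p for a, p in zip(actual, predicted)]
    return math.sqrt(sum(e ** 2 for e in errors) / len(errors))

# Illustrative values only
value = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
print(f"RMSE = {value:.4f}")  # RMSE = 1.1547
```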

70
Q

Random walk - formula

A
71
Q

Random walk covariance

A
72
Q

Random walk variance

A

(t - 1)σ²

73
Q

Dickey and Fuller test - formula

A
74
Q

Dickey and Fuller test - utility

A
  • Test for the unit root using an AR(1) model
  • The null hypothesis is: H0: g1 = 0
  • The alternative hypothesis is: Ha: g1 < 0
  • g1 = b1 - 1
75
Q

Seasonality in time-series - formula

A
76
Q

Autoregressive model (AR)

A
  • A time series regressed on its own past values
  • AR, MA & ARMA models
  • Should be covariance stationary (Dickey and Fuller test)
77
Q

Autoregressive conditional heteroskedasticity (ARCH) - ARCH (1) model distribution

A
78
Q

ARCH linear regression equation

A

If the estimate of a1 is statistically significantly different from zero, we conclude that the time series is ARCH(1)

79
Q

ARCH variance of the error

A
80
Q

Cointegrated

A

Two time-series are cointegrated if a long-term financial or economic relationship exists between them such that they do not diverge from each other without bound in the long run

81
Q

Durbin and Watson for lagged value (autoregressive models)

A

The test cannot be used for a regression that has a lagged value of the dependent variable as one of the explanatory variables. Instead, test whether the residuals from the model are serially correlated

82
Q

Multiple R

A
  • The correlation between actual and predicted values of the dependent variable
  • = (R²)^(1/2)
83
Q

Nonlinear relation

A

An association or relationship between variables that cannot be graphed as a straight line

84
Q

Interpretation of the p-value

A
  • A small p-value (≤ 0.05) indicates strong evidence against the null hypothesis, so it is rejected
  • A large p-value (> 0.05) indicates weak evidence against the null hypothesis (fail to reject)
  • p-values very close to the cutoff (~ 0.05) are considered to be marginal (need attention)
85
Q

p-value for the Beta function *as a reference

A
86
Q

p-value for the Lower incomplete beta function *as a reference

A
87
Q

p-value for the Regularized lower incomplete beta function *as a reference

/

where the numerator is the lower incomplete beta function, and the denominator is the beta function

A
88
Q

p-value for the t-distribution cumulative distribution function (CDF) *as a reference

/

where v is the degrees of freedom, t is the upper limit of integration, and I is the regularized lower incomplete beta function

A
89
Q

Heteroskedasticity, serial correlation and multicollinearity - table

A
90
Q

Errors in model specification

A
  • Data mining
  • Market timing
  • Time-series misspecification
91
Q

Moving-average model of order 1, MA(1)

/

Theta (θ) is the parameter of the MA(1) model

A
92
Q

Moving-average model of order q, MA(q)

A
93
Q

Autoregressive moving-average model (ARMA)

/

b1, b2, …, bp are the autoregressive parameters and θ1, θ2, …, θq are the moving-average parameters

A
94
Q

Multiple R versus r

A

Capital R² (as opposed to r²) should generally be the multiple R² in a multiple regression model. In bivariate linear regression, there is no multiple R, and R² = r². So one difference is applicability: “multiple R” implies multiple regressors, whereas “R²” doesn’t necessarily. Another simple difference is interpretation. In multiple regression, the multiple R is the coefficient of multiple correlation, whereas its square is the coefficient of determination.

95
Q

p-value

A
  • A small p-value (≤ 0.05) indicates strong evidence against the null hypothesis, so it is rejected
  • A large p-value (> 0.05) indicates weak evidence against the null hypothesis (fail to reject)
  • p-values very close to the cutoff (~ 0.05) are considered to be marginal (need attention)
96
Q

Variables with a correlation close to 0 can nonetheless exhibit a strong relationship—just not a linear relationship

A

Correlation measures the linear association between two variables

97
Q

If the p-value is greater than 0.05

A

Then the test is not significant at the 5% level

98
Q

Significance F

A
  • Represents the level at which the test is significant
  • An entry of 0.01 for the significance of F means that the regression is significant at the 0.01 level
99
Q

Parameter instability

A

The problem or issue of population regression parameters that have changed over time