Quantitative Methods Flashcards

1
Q

Multiple Regression

A

A model that allows for consideration of multiple underlying influences (independent variables) on the dependent variable.

2
Q

What is multiple regression used for?

A
  1. Identify relationships between variables
  2. Forecast variables
  3. Test existing theories
3
Q

Multiple Regression model

A

The general multiple linear regression model is:

Yi = b0 + b1X1i + b2X2i + … + bkXki + εi

where:
Yi = ith observation of the dependent variable Y, i = 1, 2, …, n
Xj = independent variables, j = 1, 2, …, k
Xji = ith observation of the jth independent variable
b0 = intercept term
bj = slope coefficient for each of the independent variables
εi = error term for the ith observation
n = number of observations
k = number of independent variables

For Level II, we can alternatively interpret regression results by using the p-value to test the null hypothesis that a slope coefficient is equal to zero.

The p-value is the smallest level of significance for which the null hypothesis can be rejected. We test the significance of coefficients by comparing the p-value to the chosen significance level:

If the p-value is less than the significance level, the null hypothesis can be rejected.
If the p-value is greater than the significance level, the null hypothesis cannot be rejected.

4
Q

Formulating the Multiple Regression Equation

A

The authors formulated the following regression equation using annual data (46 observations):

EG10 = b0 + b1PR + b2YCS + ε

The results of this regression are shown in Coefficient and Standard Error Estimates for Regression of EG10 on PR and YCS.

Coefficient and Standard Error Estimates for Regression of EG10 on PR and YCS

Variable     Coefficient    Standard Error
Intercept    –11.6%         1.657%
PR           0.25           0.032
YCS          0.14           0.280
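
As a rough illustration of how these estimates could be tested, the sketch below divides each coefficient by its standard error to get a t-statistic for the null hypothesis that the coefficient is zero. The critical value of about 2.02 (two-tailed, 5%, 43 degrees of freedom) is an assumption read from a standard t-table, not a figure from the regression output.

```python
# Coefficient and standard error estimates copied from the table above.
estimates = {
    "Intercept": (-11.6, 1.657),
    "PR": (0.25, 0.032),
    "YCS": (0.14, 0.280),
}

n_obs, k = 46, 2        # 46 annual observations, 2 independent variables
df = n_obs - k - 1      # degrees of freedom for the t-test
t_critical = 2.02       # ASSUMPTION: approximate two-tailed 5% t-table value for df = 43


def t_statistic(coefficient, std_error):
    """t-statistic for the null hypothesis that the coefficient equals zero."""
    return coefficient / std_error


t_stats = {name: t_statistic(b, se) for name, (b, se) in estimates.items()}
significant = {name: abs(t) > t_critical for name, t in t_stats.items()}

for name in estimates:
    print(f"{name}: t = {t_stats[name]:.2f}, significant at 5%: {significant[name]}")
```

Under these assumptions the intercept and PR would be judged significant, while YCS would not.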

4
Q

Intercept Term

A

is the value of the dependent variable when the independent variables are all equal to zero.

Intercept term: If the dividend payout ratio is zero and the slope of the yield curve is zero, we would expect the subsequent 10-year real earnings growth rate to be –11.6%.

5
Q

partial slope coefficients

A

The slope coefficients in a multiple regression are sometimes called partial slope coefficients because each one is the estimated change in the dependent variable for a 1-unit change in that independent variable, holding the other independent variables constant.

PR coefficient: If the payout ratio increases by 1%, we would expect the subsequent 10-year earnings growth rate to increase by 0.25%, holding YCS constant.

6
Q

YCS coefficient

A

If the yield curve slope increases by 1%, we would expect the subsequent 10-year earnings growth rate to increase by 0.14%, holding PR constant.
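
The interpretation of the intercept and the two slope coefficients can be checked with a small sketch of the fitted equation EG10 = –11.6 + 0.25 PR + 0.14 YCS. The function name and the input values (PR = 50, YCS = 1 and 2) are hypothetical, chosen only to show the "holding PR constant" effect.

```python
def predict_eg10(pr, ycs, b0=-11.6, b1=0.25, b2=0.14):
    """Predicted 10-year real earnings growth (in %) from the fitted equation."""
    return b0 + b1 * pr + b2 * ycs


# Hypothetical inputs: payout ratio of 50%, yield curve slope of 1% and then 2%.
base = predict_eg10(pr=50.0, ycs=1.0)
bumped = predict_eg10(pr=50.0, ycs=2.0)   # YCS up by 1, PR held constant

print(f"base forecast: {base:.2f}%")
print(f"change from a 1-unit rise in YCS: {bumped - base:.2f}%")
```

The change between the two forecasts equals the YCS slope coefficient, which is what the partial-slope interpretation says.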

6
Q

Q-Q Plot

A

A normal Q-Q plot (usually called simply a Q-Q plot) is used to compare a variable’s distribution to that of a normal distribution. We can employ a Q-Q plot to evaluate the standardized residuals of a regression model: the residuals should lie along a diagonal if they follow a normal distribution. Recall that 5% of normally distributed observations should be below –1.65 standard deviations.
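
A minimal sketch of the points behind a Q-Q plot, using only the Python standard library. The residual values are made up, and the plotting positions (i - 0.5) / n are one common convention assumed here; other conventions exist.

```python
from statistics import NormalDist, mean, stdev


def qq_points(residuals):
    """Pair each theoretical normal quantile with the matching sorted standardized residual."""
    m, s = mean(residuals), stdev(residuals)
    standardized = sorted((r - m) / s for r in residuals)
    n = len(standardized)
    # ASSUMPTION: plotting positions (i - 0.5) / n, one of several conventions.
    theoretical = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, standardized))


residuals = [-1.2, 0.4, 0.1, -0.3, 0.9, -0.6, 1.5, -0.8, 0.2, -0.2]  # made-up values
points = qq_points(residuals)

# The 5th percentile of the standard normal, matching the -1.65 figure in the card.
lower_5pct = NormalDist().inv_cdf(0.05)
print(round(lower_5pct, 3))
```

If the residuals are roughly normal, the pairs in `points` fall close to the line y = x when plotted.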

7
Q

Coefficient of Determination, R2

A

R2 evaluates the overall effectiveness of the entire set of independent variables in explaining the dependent variable.

7
Q

ANOVA TABLE

A

The results of the ANOVA procedure are presented in an ANOVA table, which accompanies a multiple regression output.

7
Q

Analysis of variance (ANOVA)

A

Is a statistical test that compares the means of more than two groups and separates the variability into random and systematic factors.

8
Q

Heteroskedasticity

A

occurs when the variance of the residuals is not the same across all observations in the sample. This happens when there are subsamples that are more spread out than the rest of the sample.

8
Q

Overfitting

A

Is a concept in data science that occurs when a statistical model fits its training data too closely. When this happens, the model cannot perform accurately against unseen data, defeating its purpose. Broadly speaking, overfitting means the training has focused on the particular training set so much that it has missed the underlying pattern, so the model is not able to adapt to new data.

9
Q

Unconditional heteroskedasticity

A

occurs when the heteroskedasticity is not related to the level of the independent variables, which means that it doesn’t systematically increase or decrease with changes in the value of the independent variable(s). While this is a violation of the equal variance assumption, it usually causes no major problems with the regression.

10
Q

Nested Models

A

models such that one model, called the full model or unrestricted model, has a higher number of independent variables while another model, called the restricted model, has only a subset of the independent variables.

11
Q

Conditional heteroskedasticity

A

is heteroskedasticity that is related to (i.e., conditional on) the level of the independent variables. For example, conditional heteroskedasticity exists if the variance of the residual term increases as the value of the independent variable increases, as shown in Conditional Heteroskedasticity.

12
Q

Conditional Heteroskedasticity

A

In the figure Conditional Heteroskedasticity, the residual variance associated with the larger values of the independent variable, X, is larger than the residual variance associated with the smaller values of X. Conditional heteroskedasticity does create significant problems for statistical inference.

13
Q

Effect of Conditional Heteroskedasticity on Regression Analysis

A

There are two effects of conditional heteroskedasticity that you should be aware of:

  1. The standard errors are usually unreliable estimates. (For financial data, these standard errors are usually underestimated, resulting in Type I errors.)
  2. The F-test for the overall model is also unreliable.
14
Q

Breusch-Pagan (BP) test

A

Used to detect conditional heteroskedasticity. The BP test calls for the squared residuals (as the dependent variable) to be regressed on the original set of independent variables.
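
The auxiliary regression described above can be sketched for the single-regressor case with plain Python. A common form of the test statistic is n × R² from the regression of squared residuals on the independent variable, compared with a chi-square critical value with k degrees of freedom. The data and function names below are made up for illustration.

```python
def ols_simple(x, y):
    """Intercept and slope of a one-variable least-squares regression."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return my - b1 * mx, b1


def r_squared(x, y):
    """R-squared of the regression of y on x."""
    b0, b1 = ols_simple(x, y)
    my = sum(y) / len(y)
    sst = sum((yi - my) ** 2 for yi in y)
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    return 1 - sse / sst


def breusch_pagan_stat(x, y):
    """n * R-squared from regressing the squared residuals on x."""
    b0, b1 = ols_simple(x, y)
    resid_sq = [(yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y)]
    return len(x) * r_squared(x, resid_sq)


# Made-up data whose scatter around the line grows with x.
x = [float(i) for i in range(1, 11)]
y = [2 + 0.5 * xi + ((-1) ** i) * 0.3 * xi for i, xi in enumerate(x)]

bp = breusch_pagan_stat(x, y)
chi2_critical = 3.84   # 5% chi-square critical value with 1 degree of freedom
print(f"BP statistic: {bp:.2f}, reject homoskedasticity: {bp > chi2_critical}")
```

With more than one independent variable, the same idea applies but the auxiliary regression needs a multiple-regression routine.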

15
Q

Serial correlation

A

Also known as autocorrelation, refers to a situation in which regression residual terms are correlated with one another; that is, they are not independent. Serial correlation can pose a serious problem with regressions using time series data.

NOTE: Serial correlation observed in financial data (not residuals, which is our discussion here) indicates a pattern that can be modeled. This idea is covered in our reading on time series analysis.

16
Q

Positive serial correlation

A

exists when a positive residual in one time period increases the probability of observing a positive residual in the next time period.

17
Q

Negative serial correlation

A

occurs when a positive residual in one period increases the probability of observing a negative residual in the next period.

18
Q

Breusch-Godfrey (BG) test

A
18
Q

Durbin-Watson (DW) statistic

A

Residual serial correlation at a single lag can be detected using the Durbin-Watson (DW) statistic.
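
For reference, the DW statistic is the sum of squared differences between successive residuals divided by the sum of squared residuals. The sketch below computes it for two made-up residual series. Values near 2 suggest no first-order serial correlation, values well below 2 suggest positive serial correlation, and values well above 2 suggest negative serial correlation.

```python
def durbin_watson(residuals):
    """DW = sum((e_t - e_{t-1})^2) / sum(e_t^2)."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Made-up residuals: runs of the same sign (positive serial correlation)...
dw_positive = durbin_watson([1, 1, -1, -1])
# ...and residuals that flip sign every period (negative serial correlation).
dw_negative = durbin_watson([1, -1, 1, -1])

print(dw_positive, dw_negative)
```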

20
Q

Breusch-Godfrey (BG) test procedure

A

The BG test regresses the regression residuals against the original set of independent variables, plus one or more additional variables representing lagged residuals.

21
Q

robust standard errors

A

(also called Newey–West corrected standard errors or heteroskedasticity-consistent standard errors) are used to correct for serial correlation, and also conditional heteroskedasticity, in regression residuals.

22
Q

Multicollinearity

A

refers to the condition when two or more of the independent variables (or linear combinations of three or more independent variables) in a multiple regression are highly correlated with each other. This condition inflates standard errors and lowers t-stats.

23
Q

variance inflation factor (VIF)

A

We can quantify multicollinearity using the variance inflation factor (VIF) for each of the independent variables. We start by regressing one of the independent variables, “j”, against the remaining independent variables.
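
Once the R² of that auxiliary regression is known, the VIF follows from the standard formula VIF_j = 1 / (1 - R_j²). The sketch below applies it to hypothetical R² values. The thresholds in the comments (above 5 warrants a closer look, above 10 is a serious concern) are commonly cited rules of thumb, stated here as an assumption.

```python
def vif(r_squared_j):
    """Variance inflation factor from the R-squared of regressing X_j on the other X's."""
    if not 0.0 <= r_squared_j < 1.0:
        raise ValueError("R-squared must be in [0, 1)")
    return 1.0 / (1.0 - r_squared_j)


# Hypothetical auxiliary R-squared values for three independent variables.
aux_r2 = {"X1": 0.00, "X2": 0.80, "X3": 0.95}
vifs = {name: vif(r2) for name, r2 in aux_r2.items()}

for name, value in vifs.items():
    # ASSUMPTION: rule-of-thumb cutoffs of 5 and 10.
    flag = "serious" if value > 10 else "investigate" if value > 5 else "ok"
    print(f"{name}: VIF = {value:.1f} ({flag})")
```

A VIF of 1 means the variable is uncorrelated with the other independent variables.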

24
Q

high-leverage points

A

are the extreme observations of the independent (or ‘X’) variables.

25
Q

Outliers

A

are extreme observations of the dependent (or ‘Y’) variable

26
Q

Leverage

A

is a measure of the distance between the jth observation of independent variable i and its sample mean. Leverage takes a value between 0 and 1. The higher the leverage, the greater the distance and hence the higher the potential influence of the observation on the estimated regression parameters.
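
For a regression with a single independent variable, leverage has a simple closed form: h_i = 1/n + (x_i - x̄)² / Σ(x_j - x̄)². The sketch below uses made-up X values with one extreme observation. The cutoff of 3(k + 1)/n is one commonly cited rule of thumb and is an assumption here.

```python
def leverages(x):
    """Leverage of each observation in a one-variable regression."""
    n = len(x)
    mean_x = sum(x) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return [1 / n + (xi - mean_x) ** 2 / sxx for xi in x]


x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]    # made-up data; 30 is far from the mean
h = leverages(x)

k = 1                                   # one independent variable
cutoff = 3 * (k + 1) / len(x)           # ASSUMPTION: rule-of-thumb threshold
flagged = [i for i, hi in enumerate(h) if hi > cutoff]

print([round(hi, 3) for hi in h])
print("high-leverage observations (0-based index):", flagged)
```

The leverages sum to k + 1, so a single extreme X value takes up a large share of the total.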

27
Q

Influential data points

A

are extreme observations that, when excluded, cause a significant change to model coefficients.

27
Q

studentized residuals

A

Used to identify outliers

28
Q

Cook’s distance (Di)

A

is a composite metric (i.e., it takes into account both the leverage and outliers) for evaluating if a specific observation is influential.

29
Q

influence plot

A

visually shows the three metrics (leverage, studentized residuals, and Cook’s distance) for each observation.

30
Q

dummy variables

A

A dummy variable is a variable that takes values of 0 and 1, where the values indicate the presence or absence of something (e.g., a 0 may indicate a placebo and 1 may indicate a drug).

31
Q

intercept dummy

A

An intercept dummy variable is a dummy variable that shifts the constant term in a regression model. It allows the intercept to differ between groups.

32
Q

linear trend

A

is a time series pattern that can be graphed using a straight line. A downward sloping line indicates a negative trend, while an upward sloping line indicates a positive trend.

32
Q

Trend

A

A time series has a trend if a consistent pattern can be seen by plotting the data on a graph.

32
Q

Time Series

A

A set of observations for a variable over successive periods of time (e.g., monthly stock market returns for the past 10 years).

33
Q

slope dummy

A

A slope dummy variable is a dummy variable that changes the slope of the relationship between y and x.
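
One way to see the difference between an intercept dummy and a slope dummy is a model of the form y = b0 + d0·D + b1·x + d1·(D·x), where D is the 0/1 dummy. The coefficient values below are hypothetical and chosen only for illustration.

```python
def predict(x, d, b0=1.0, d0=0.5, b1=2.0, d1=0.3):
    """y = b0 + d0*D + b1*x + d1*(D*x); all coefficients are hypothetical."""
    return b0 + d0 * d + b1 * x + d1 * d * x


# Intercept dummy: shift in y at x = 0 when D switches from 0 to 1.
intercept_shift = predict(0, 1) - predict(0, 0)

# Slope dummy: the change in y per unit of x differs between the two groups.
slope_group0 = predict(1, 0) - predict(0, 0)   # equals b1
slope_group1 = predict(1, 1) - predict(0, 1)   # equals b1 + d1

print(intercept_shift, slope_group0, slope_group1)
```

Here d0 moves the whole line up or down for the D = 1 group, while d1 makes that group's line steeper or flatter.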

34
Q

qualitative dependent variable

A

a categorical variable, usually a binary variable, which takes on a value of either zero or one. An example of an application requiring the use of a qualitative dependent variable is a model that attempts to estimate the probability of default for a bond issuer. In this case, the dependent variable may take on a value of one in the event of default and zero in the event of no default.

35
Q

Linear vs. Log-Linear Trend Models

A

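The figure Linear vs. Log-Linear Trend Models shows a time series that is best modeled with a log-linear trend model rather than a linear trend model. A log-linear trend suits data that grow at a roughly constant rate, while a linear trend suits data that grow by a roughly constant amount.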

36
Q

When should you use Logistic Regression Models?

A

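If the dependent Y variable is discrete
If the dependent Y variable is qualitative (for example, a 0/1 default indicator); a qualitative independent X variable is handled with dummy variables instead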
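
A minimal sketch of why a logistic model suits a 0/1 dependent variable: the logistic transformation maps any linear combination of the independent variables into a probability between 0 and 1. The coefficients and inputs below are hypothetical.

```python
import math


def logistic_probability(x, b0, b1):
    """P(Y = 1) = 1 / (1 + exp(-(b0 + b1 * x)))."""
    z = b0 + b1 * x
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical coefficients for a default-probability model with one X variable.
b0, b1 = -3.0, 0.8
probs = [logistic_probability(x, b0, b1) for x in (0.0, 2.0, 4.0, 6.0)]

print([round(p, 3) for p in probs])
```

Unlike a linear model, the predicted values can never fall below 0 or above 1.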

37
Q

When should you use multiple regression models?

A

When the dependent variable is continuous (not discrete) and there is more than one explanatory variable (more than one independent variable).
When multiple independent variables determine the outcome of a single dependent variable.
* Dependent Y variable is continuous
* We have more than one independent X variable

38
Q

Assumption of Regression Models

A

L.I.I.N.H.
Linearity: The relationship between the dependent Y variable and each independent X variable is linear
Independence of errors: Regression residuals are uncorrelated across observations
Independence of independent variables: The independent X variables are not random, and there is no exact linear relationship between two or more independent variables
Normality: Regression residuals are normally distributed
Homoskedasticity: Constant variance of regression residuals

39
Q

How to determine a variable is significant?

A

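|t-stat| > critical t-value (roughly 2 at the 5% significance level with a large sample); equivalently, p-value < significance level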

40
Q

Degrees of Freedom for SSR

A

k (the number of independent variables)

41
Q

Degrees of Freedom for SST

A

N -1

42
Q

Degrees of Freedom for SSE

A

N - k - 1

43
Q

What will happen to adjusted R-Square if we add an insignificant variable?

A

Adjusted R-Square decreases (this happens when the added variable's |t-stat| is below 1)

44
Q

R-Square formula

A

SSR/SST = Explained Variation / Total Variation
= 1 - (Unexplained Variation / Total Variation)
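
The sums of squares from an ANOVA table are enough to compute R², adjusted R², and the F-statistic. The sketch below assumes the convention that SSR is the explained (regression) sum of squares with k degrees of freedom and SSE is the unexplained (error) sum of squares with n - k - 1 degrees of freedom. The numeric inputs are made up.

```python
def anova_summary(ssr, sse, n, k):
    """R-squared, adjusted R-squared and F-statistic from ANOVA sums of squares.

    ASSUMPTION: ssr = explained variation (df = k),
                sse = unexplained variation (df = n - k - 1).
    """
    sst = ssr + sse                                   # total variation, df = n - 1
    r2 = ssr / sst
    adj_r2 = 1 - ((n - 1) / (n - k - 1)) * (1 - r2)
    f_stat = (ssr / k) / (sse / (n - k - 1))          # MSR / MSE
    return r2, adj_r2, f_stat


# Made-up sums of squares for a regression with 46 observations and 2 X variables.
r2, adj_r2, f_stat = anova_summary(ssr=60.0, sse=40.0, n=46, k=2)
print(f"R2 = {r2:.3f}, adjusted R2 = {adj_r2:.3f}, F = {f_stat:.2f}")
```

Adjusted R² is never above R² because it penalizes each added independent variable.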

45
Q

What kind of test is this?
H0: bi = Bi
Ha: bi ≠ Bi

A

Two-tailed test

45
Q

What kind of test is this?

H0: bi <= Bi
Ha: bi > Bi

A

Right tail test

The alternative hypothesis (bi > Bi) points to the right tail.

46
Q

Model Misspecification - Omitted Variable

A

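If we omit a significant variable from our model, the error term will capture the effect of the missing variable. If the omitted variable is correlated with the included variables, the estimated coefficients will be biased.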

46
Q

Which of the following charts, when drawn on a grid, has the O column in alternation with the X column, but most likely does not have the column representing volume or time?
A. Candlestick Chart
B. Bar Chart
C. Point and Figure Chart

A

C. Point and Figure Chart
You need graph paper to draw a point and figure chart. The X column and O column alternate, but the chart does not have a volume or time representation.

46
Q

You are an analyst and you need to present some stocks to your supervisor after rating them as outperform, neutral, and underperform. What is the best scale to represent this data?
A. Interval Scale
B. Ordinal Scale
C. Ratio Scale

A

B. Ordinal Scale
According to the specifications, you need to rate the stocks based on their expected performance in the future, not the performance differences between the asset classes, hence an ordinal scale would be the best option.

46
Q

When you are analyzing mutually exclusive projects, why shouldn’t you choose the IRR rule over NPV?
A. When using the IRR ranking, you assume the possibility of reinvestment at the opportunity cost of capital, which is not relevant economically, hence less realistic.
B. Discount rates and interest rates from external factors influence NPV rankings
C. NPV uses more conservative reinvestment rates, making it a relevant option

A

B. Discount rates and interest rates from external factors influence NPV rankings
The NPV rule is hugely dependent on the external market forces to determine the discount rate. This is because of the expectation of reinvestment at the opportunity cost of capital. When using IRR, the assumption is that any cash flow will be reinvested in the project, and for that reason the rankings are not influenced by external discounts or interest rates.

46
Q

What kind of test is this?

H0: bi >= Bi
Ha: bi < Bi

A

Left tail test

The alternative hypothesis (bi < Bi) points to the left tail.

46
Q

In the last 24 months, you have obtained the following information concerning the return on an investment:
Mean Return = 15%
Standard Deviation of Returns = 9%
Assuming a 4% risk-free rate, what is the closest figure to the Sharpe ratio for this particular investment?
A. 1.02
B. 1.22
C. 0.33

A

B. 1.22
The Sharpe ratio is calculated as follows:
(0.15 - 0.04) / 0.09 = 1.22
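
The same arithmetic as a short sketch, using the figures given in the question. The function name is only illustrative.

```python
def sharpe_ratio(mean_return, risk_free_rate, std_dev):
    """Excess return per unit of total risk."""
    return (mean_return - risk_free_rate) / std_dev


sharpe = sharpe_ratio(mean_return=0.15, risk_free_rate=0.04, std_dev=0.09)
print(round(sharpe, 2))
```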

46
Q

For a given present value and interest rate, the future value:
A. Increases as the number of compounding periods per year increases.
B. Decreases as the number of compounding periods per year increases
C. Remains the same as the number of compounding periods per year increases
D. remains the same as the number of compounding periods per year decreases

A

A. Increases as the number of compounding periods per year increases.

46
Q

For a given future value and interest rate, the present value:

A
47
Q

Jim Wilson planning to purchas

A