CFA L2 Quant Flashcards

1
Q

True or false: With financial instruments, we can typically use a one-factor linear regression model?

A

False, typically we need a multiple regression model.

2
Q

Multiple regression model

A

Regression models that allow us to see the effects of multiple independent variables on one dependent variable.

Ex: Can the 10-year growth in the S&P 500 (dependent variable (Y)) be explained by the trailing dividend payout ratio of the index’s stocks (independent variable 1 (X1)) and the yield curve slope (independent variable 2 (X2))?

3
Q

What are the uses of multiple regression models?

A
  • Identify relationships between variables.
  • Forecast variables. (ex: forecast CFs or forecast probability of default)
  • Test existing theories.
4
Q

Standard error

A

A statistical measure that shows how well the sample represents the population

5
Q

Residual (ε)

A

The difference between the observed Y value and the predicted Y value (ŷ).

ε = Y - ŷ
OR
Y - (b0 + b1x1 + b2x2 … + bnxn)
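
A minimal numpy sketch of this calculation (hypothetical data, not from the curriculum): fit a two-variable regression and recover the residuals as Y − ŷ.

```python
import numpy as np

# Hypothetical sample: 5 observations, 2 independent variables
X = np.column_stack([np.ones(5),                  # intercept column
                     [1.0, 2.0, 3.0, 4.0, 5.0],   # x1
                     [2.0, 1.0, 4.0, 3.0, 5.0]])  # x2
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])

b, *_ = np.linalg.lstsq(X, y, rcond=None)  # OLS estimates of b0, b1, b2
y_hat = X @ b                              # predicted values, ŷ
residuals = y - y_hat                      # ε = Y - ŷ
print(residuals)
```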

6
Q

P-value

A

The smallest level of significance for which the null hypothesis can be rejected.

  • If the p-value is less than the significance level (α), the null hypothesis can be rejected; if it is greater, we fail to reject the null.
7
Q

If the significance level is 5% and the p-value is .06, do we reject the null hypothesis?

A

No, we fail to reject the null hypothesis.

8
Q

Assumptions underlying a multiple regression model:

A
  • A linear relationship exists between the dependent and independent variables.
  • The residuals are normally distributed.
  • The variance of the error terms is constant.
  • The residual of one observation ISN’T correlated w/ another.
  • The independent variables ARE NOT random
  • There is no linear relationship between any two or more independent variables.
9
Q

Q-Q plot

A

A plot used to compare a variable’s distribution to a normal distribution. The residual of the variable’s distribution should lie along a diagonal line if they follow a normal distribution.

10
Q

True or false: For a standard normal distribution, only 5% of the observations should be beyond -2 standard deviations of 0?

A

False, only 5% of the observations should be beyond -1.65 standard deviations.

11
Q

Analysis of variance (ANOVA)

A

A statistical test used to assess the difference between the means of more than two groups. At its core, ANOVA allows you to simultaneously compare arithmetic means across groups. You can determine whether the differences observed are due to random chance or if they reflect genuine, meaningful differences.

  • A one-way ANOVA uses one independent variable.
  • A two-way ANOVA uses two independent variables.
12
Q

Coefficient of determination (R^2)

A

The percentage of the total variation in the dependent variable explained by the independent variable(s).

R^2 = SSR/SST
OR
(SST - SSE) / SST

Ex: R^2 of 0.63 means that the model explains 63% of the variation in the dependent variable.

SSR (a.k.a. RSS)= regression sum of squares. It's the sum of the squared differences between the predicted values and the mean of the dependent variable, i.e., the total variation in the dependent variable explained by the independent variable(s).
SSE= sum of squared errors. It's the unexplained variation, so SST = SSR + SSE.
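
A small numpy sketch (hypothetical data) verifying that the two forms above, SSR/SST and (SST − SSE)/SST, agree for an OLS fit with an intercept:

```python
import numpy as np

def r_squared(y, y_hat):
    """R^2 = SSR/SST = 1 - SSE/SST (identical for an OLS fit with intercept)."""
    sst = np.sum((y - y.mean()) ** 2)      # total variation
    ssr = np.sum((y_hat - y.mean()) ** 2)  # explained variation (SSR, a.k.a. RSS)
    sse = np.sum((y - y_hat) ** 2)         # unexplained variation
    return ssr / sst, 1 - sse / sst

# Hypothetical data and an OLS fit with an intercept
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])
X = np.column_stack([np.ones(5), x])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
print(r_squared(y, X @ b))  # both forms give the same value
```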

13
Q

Adjusted R^2

A

Since R^2 almost always increases as more independent variables are added to the model, we must adjust it.

  • If adding an additional independent variable causes the adjusted R^2 to decrease, it’s not worth adding that variable.
14
Q

Overfitting

A

When R^2 is high because of a large # of independent variables rather than a strong explanatory relationship.

15
Q

Akaike’s information criterion (AIC)

A

Looks at multiple regression models and determines which has the best forecast.

Calculation: (n * ln(SSE/n)) + 2(k+1)

  • Lower values indicate a better model.
  • Higher k values result in higher values of the criteria.
16
Q

Schwarz’s Bayesian information criteria (BIC)

A

Looks at multiple regression models and determines which has a better goodness of fit.

Calculation: (n * ln(SSE/n)) + (ln(n)*(k+1))

  • Lower values indicate a better model.
  • Higher k values result in higher values of the criteria.
  • BIC imposes a higher penalty for overfitting than AIC.
  • AIC and BIC are alternatives to R^2 and adjusted R^2 to determine the quality of the regression model.
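
A sketch of the two criteria as defined on these cards (hypothetical SSE values), illustrating that BIC penalizes an extra variable more heavily than AIC once n ≥ 8 (since ln n > 2):

```python
import numpy as np

def aic(n, k, sse):
    return n * np.log(sse / n) + 2 * (k + 1)          # (n*ln(SSE/n)) + 2(k+1)

def bic(n, k, sse):
    return n * np.log(sse / n) + np.log(n) * (k + 1)  # (n*ln(SSE/n)) + (ln(n)*(k+1))

# Hypothetical comparison: adding a 4th variable barely lowers SSE
print(aic(n=60, k=3, sse=120.0), bic(n=60, k=3, sse=120.0))
print(aic(n=60, k=4, sse=118.5), bic(n=60, k=4, sse=118.5))  # lower is better
```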
17
Q

Nested models

A

When the independent variables of one model (the restricted model) are a subset of those in another (the full, or unrestricted, model); the restricted model is said to be nested within the full model.

18
Q

Full model vs restricted model

A

Full model= A linear regression model that uses all k independent variables

Restricted model= A linear regression model that only uses some of the k independent variables

19
Q

Joint F-Test

A

Measures how well a set of independent variables, as a group, explains the variation in the dependent variable. Put simply, it tests overall model significance.

Calculation: [ (SSErestricted - SSEunrestricted) / Q ] / [ (SSEunrestricted) / (n - k - 1) ]
* Q = # of excluded variables in the restricted model.
* Decision rule: reject the null hypothesis if F-stat > F critical value.
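
A minimal sketch of the joint F-test decision rule (hypothetical SSE values; scipy supplies the F critical value):

```python
import scipy.stats as st

def joint_f_test(sse_restricted, sse_unrestricted, q, n, k, alpha=0.05):
    """F = [(SSE_r - SSE_u)/q] / [SSE_u/(n - k - 1)], a one-tailed test."""
    f_stat = ((sse_restricted - sse_unrestricted) / q) / (
        sse_unrestricted / (n - k - 1))
    f_crit = st.f.ppf(1 - alpha, dfn=q, dfd=n - k - 1)
    return f_stat, f_crit, f_stat > f_crit  # True -> reject the null

# Hypothetical: full model with k=4 variables vs. restricted model dropping q=2
print(joint_f_test(sse_restricted=90.0, sse_unrestricted=75.0, q=2, n=60, k=4))
```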

20
Q

True or false: We could also use a t-test to evaluate the significance to see which variables are significant?

A

True, but the F-test provides a more meaningful evaluation since there is likely some amount of correlation among independent variables.

21
Q

True or false: The F-test will tell us if at least one of the slope coefficients in a multiple regression model is statistically different from 0?

A

TRUE

22
Q

True or false: When testing the hypothesis that all the regression coefficients are simultaneously equal to 0, the F-test is always a two tailed test?

A

False, when testing the hypothesis that all the regression coefficients are simultaneously equal to 0, the F-test is always a one tailed test.

23
Q

True or false: We can use the regression equation to make predictions about the dependent variable based on forecasted values of the independent variable?

A

True, we can make predictions.

24
Q

Predicting the dependent variable from forecasted values of the independent variable:

A

ŷ = estimated intercept + (forecasted X1 × estimated slope coefficient for X1) + (forecasted X2 × estimated slope coefficient for X2)…
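
A tiny worked example with hypothetical coefficients and forecasts:

```python
# Hypothetical fitted model: y_hat = 0.163 + 0.186*X1 - 0.002*X2
b0, b1, b2 = 0.163, 0.186, -0.002
x1_forecast, x2_forecast = 2.5, 8.0

y_hat = b0 + b1 * x1_forecast + b2 * x2_forecast
print(y_hat)  # 0.612
```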

25
Q

Functional form misspecifications (A regression suffers from misspecification of the functional form when the functional form of the estimated regression model differs from the functional form of the population regression function):

A
  • Omission of important independent variables: may lead to biased and inconsistent regression parameters OR serial correlation or heteroskedasticity in the residuals.
  • Inappropriate variable form (ex: you may need to take the natural log of a variable): may lead to heteroskedasticity in the residuals. This can happen if there is no linear relationship between the independent & dependent variables.
  • Inappropriate variable scaling (ex: common-size financial statements): May lead to heteroskedasticity in the residuals or multicollinearity.
  • Data improperly pooled: May lead to heteroskedasticity or serial correlation in the residuals.
26
Q

Heteroskedasticity

A

When the variance of the residuals is not constant across all observations in the sample. This happens when there are subsamples that are more spread out than the rest of the sample.

27
Q

Unconditional heteroskedasticity

A

When the heteroskedasticity is not related to the level of the independent variables, meaning the error variance does not systematically increase or decrease with the values of the independent variables.

  • Although it’s a violation of our assumptions, it is usually not a big problem.
28
Q

Conditional heteroskedasticity

A

Heteroskedasticity that is related to the level of the independent variables. Creates significant problems for statistical inference if not corrected properly.

  • Conditional heteroskedasticity DOES NOT affect the slope coefficients. It DOES affect the computed F-stat and computed t-stats
29
Q

Effects of conditional heteroskedasticity

A

If the pattern of heteroskedasticity is low (most observations on the plot have low values): the standard errors (SEE) of the regression coefficients are affected by conditional heteroskedasticity and usually become unreliable, underestimated estimates. This leads to t-stats that are too large too often, so the null is rejected too often, a.k.a. a type 1 error.

  • For the F-test (MSR/MSE), MSE is underestimated, so the F-stat is often too large, leading to the null being rejected too often, a.k.a. a type 1 error.
  • If the pattern of heteroskedasticity is high (most observations on the plot have high values): the same errors occur, but in the opposite direction.
30
Q

How to detect conditional heteroskedasticity

A

There are two methods of detection: examining scatter plots of the residuals and by using the Breusch-Pagan chi-square test.

31
Q

How to use scatterplots to detect heteroskedasticity?

A

Look at a scatterplot of the residuals vs the independent variables. If the variation is constant there is no heteroskedasticity. If it’s not constant, there is heteroskedasticity.

32
Q

Breusch-Pagan Chi-Square (BP) Test

A

A test used to detect heteroskedasticity. The BP test calls for the squared residuals (as the dependent variable) to be regressed on the original set of independent variables. If conditional heteroskedasticity is present, the independent variables will significantly contribute to the explanation of the variability in the squared residuals.

  • The BP test statistic is n × R² from this second regression; we want a small R².
  • This is a one-tailed test because we are only concerned w/ large values.
  • Use a chi-square dist. with k df
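
A numpy sketch of the BP procedure (hypothetical simulated data in which the error variance depends on x1; the n × R² form of the statistic is assumed):

```python
import numpy as np
import scipy.stats as st

def breusch_pagan(X, resid):
    """Regress squared residuals on X; BP stat = n * R^2 ~ chi-square(k)."""
    n = len(resid)
    Xc = np.column_stack([np.ones(n), X])   # add intercept
    e2 = resid ** 2                         # squared residuals = dependent var
    b, *_ = np.linalg.lstsq(Xc, e2, rcond=None)
    fitted = Xc @ b
    r2 = 1 - np.sum((e2 - fitted) ** 2) / np.sum((e2 - e2.mean()) ** 2)
    k = X.shape[1]                          # number of independent variables
    bp = n * r2
    p_value = 1 - st.chi2.cdf(bp, df=k)     # one-tailed
    return bp, p_value

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
resid = rng.normal(size=100) * (1 + np.abs(X[:, 0]))  # variance depends on x1
print(breusch_pagan(X, resid))  # small p-value -> conditional heteroskedasticity
```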
33
Q

How to correct heteroskedasticity?

A

We can use robust standard errors, a.k.a. White-corrected or heteroskedasticity-consistent standard errors.

34
Q

Serial correlation/autocorrelation

A

When residuals are correlated with each other.

  • Poses serious problems when using time series data.
35
Q

Positive serial correlation

A

When a positive residual in one time period increases the probability of observing a positive residual in the next time period.

  • This type of correlation typically results in coefficient standard errors that are too small, causing T-stats or F-stats to be too large, which will lead to type 1 errors.
36
Q

Effect of serial correlation on model parameters

A

If the dependent variable’s reaction to the independent variable has a lag in a regression model, serial correlation causes the estimates of the slope coefficients to be inconsistent. If there is no lag, then the estimates of the slope coefficient will be consistent.

37
Q

How to detect serial correlation?

A

First, we can use a scatter plot; this will reveal only very dramatic cases. We can also use the Durbin-Watson (DW) statistic or a Breusch-Godfrey (BG) test. The DW statistic is used to detect serial correlation at a single lag, whereas the BG test is used to detect serial correlation at multiple lags.

  • The lower limit for the DW table is 15 observations.
38
Q

Breusch-Godfrey (BG) Test

A

The BG Test regresses the residuals against the original set of independent variables, plus one or more additional variables representing lagged residuals.

Calculation: ε_t = a_0 + a_1x_1t + … + a_kx_kt + ρ_1ε_(t-1) + … + ρ_pε_(t-p) + u_t

  • The null under the BG test is that there is no serial correlation (i.e., ρ_1 = … = ρ_p = 0).
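
A numpy sketch of the BG auxiliary regression (hypothetical data; the common n × R² form of the test statistic, compared against a chi-square with p df, is assumed):

```python
import numpy as np

def breusch_godfrey(X, resid, p):
    """Regress residuals on X plus p lagged residuals; return n * R^2,
    which is approximately chi-square(p) under H0: no serial correlation."""
    n = len(resid)
    lags = np.column_stack(
        [np.r_[np.zeros(j), resid[:-j]] for j in range(1, p + 1)])
    Z = np.column_stack([np.ones(n), X, lags])  # intercept + X + lagged resids
    b, *_ = np.linalg.lstsq(Z, resid, rcond=None)
    fitted = Z @ b
    r2 = 1 - np.sum((resid - fitted) ** 2) / np.sum((resid - resid.mean()) ** 2)
    return n * r2  # compare to a chi-square(p) critical value

rng = np.random.default_rng(1)
X = rng.normal(size=(80, 2))
e = rng.normal(size=80)
e = e + 0.6 * np.r_[0.0, e[:-1]]  # build in serial correlation
print(breusch_godfrey(X, e, p=2))
```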
39
Q

How to correct for serial correlation?

A

We can calculate robust standard errors (Newey-West corrected standard errors), which are consistent in the presence of both serial correlation and heteroskedasticity.

40
Q

Multicollinearity

A

When independent variables in a multiple regression are correlated w/ each other

  • This inflates standard errors and lowers t-stats leading to the null failing to be rejected more often (type 2 error).
  • Also causes the model’s coefficients to become unreliable.
  • Multicollinearity has no effect on an F-stat
41
Q

Effect of multicollinearity on model parameters

A

Multicollinearity DOES NOT affect the consistency of slope coefficients. Multicollinearity DOES make those estimates imprecise and unpredictable.

42
Q

How to detect multicollinearity?

A

The most easily observable sign is when t-tests indicate that none of the individual coefficients is significantly different from zero, but the F-test indicates that at least one coefficient is statistically significant and the R^2 is high. This happens because the correlated variables jointly explain the variation in the dependent variable, but their shared variation washes out the individual effects. More formally, we use a variance inflation factor (VIF) for each of the independent variables.

43
Q

Variance inflation factor (VIF)

A

Estimates how much the variance of a regression coefficient is inflated due to multicollinearity. We start by regressing one of the independent variables (making it a dependent variable) against the remaining independent variables.

VIF= 1 / (1 - Rj^2)
* VIF values near 1 indicate the variable is not highly correlated with the other independent variables.
* VIF values >5 indicate further investigation.
* VIF values >10 indicate high correlation.

Rj^2 is the R^2 of j, where j is the independent variable being regressed on the others.
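
A minimal sketch of the VIF procedure (hypothetical data in which x2 is built to be nearly collinear with x1):

```python
import numpy as np

def vif(X):
    """VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing
    feature j on the remaining features (with an intercept)."""
    n, k = X.shape
    out = []
    for j in range(k):
        y = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        b, *_ = np.linalg.lstsq(others, y, rcond=None)
        fitted = others @ b
        r2 = 1 - np.sum((y - fitted) ** 2) / np.sum((y - y.mean()) ** 2)
        out.append(1 / (1 - r2))
    return out

rng = np.random.default_rng(2)
x1 = rng.normal(size=200)
x2 = 0.95 * x1 + rng.normal(scale=0.3, size=200)  # highly collinear with x1
x3 = rng.normal(size=200)
print(vif(np.column_stack([x1, x2, x3])))  # x1 and x2 show elevated VIFs
```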

44
Q

How to correct multicollinearity?

A

The most common method to correct for multicollinearity is to omit one or more of the highly correlated independent variables. You can also use a proxy for one of the variables or increase the sample size.

45
Q

True or false: The coefficient on a variable in a multiple regression is the amount of return attributable to the variable?

A

TRUE

46
Q

True or false: Using actual instead of expected inflation will improve model specification?

A

False, using actual instead of expected inflation is likely to result in model misspecification.

47
Q

Outliers vs high-leverage points

A

Outliers: Extreme observations in the dependent (Y) variable

High-leverage points: Extreme observations in the independent (X) variable

48
Q

Leverage (in statistics)

A

This is a way of identifying extreme observations in the independent variable: a measure of the distance between the ith observation of the independent variable and its sample mean. Leverage values lie between 0 and 1; the closer to 1, the greater the distance. If an observation's leverage is higher than three times the average leverage, i.e., 3 * ((k+1)/n), it is considered potentially influential.

49
Q

Studentized residuals

A

An alternative to leverage for identifying outliers. The studentized residual is the # of standard deviations a data point is from the regression line (for each data point, the residual ÷ its standard deviation). There are four main steps to this process:
1. Estimate the regression model using the original sample, then delete one observation and re-estimate the regression. Perform this sequentially, deleting a different observation each time.
2. Compare the actual Y value of each deleted observation to its predicted y-value: e_i = Y - ŷ
3. The studentized residual is the residual in #2 ÷ its standard deviation: t = e_i / s
4. Compare the studentized residuals to critical values in a t-table using n - k - 2 df. Points that fall in the rejection region are termed outliers and are potentially influential.

50
Q

Influential data points

A

Extreme observations that, when excluded, cause a significant change to model coefficients.

51
Q

True or false: All outliers and high-leverage points are influential on the regression?

A

FALSE

52
Q

Cook’s Distance

A

A composite metric for evaluating if a high leverage and/or outlier is influential. Cook’s distance measures how much the estimated values of the regression change if certain high leverage points or outliers are deleted from the sample.

Calculation:
D_i = [ e_i^2 / ((k+1) * MSE) ] * [ h_i / (1 - h_i)^2 ]
* h_i = leverage value for the ith observation
* e_i = residual for the ith observation

  • Values > √(k/n) indicate the observation is highly likely to be an influential data point.
  • Generally, values > 1 indicate highly influential, whereas values > 0.5 indicate the need for further investigation.
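
A numpy sketch of the formula above (hypothetical data with one planted outlier); leverage values come from the diagonal of the hat matrix:

```python
import numpy as np

def cooks_distance(X, y):
    """D_i = [e_i^2 / ((k+1)*MSE)] * [h_i / (1-h_i)^2] for each observation."""
    n, k = X.shape
    Xc = np.column_stack([np.ones(n), X])
    H = Xc @ np.linalg.inv(Xc.T @ Xc) @ Xc.T  # hat matrix
    h = np.diag(H)                            # leverage values
    e = y - H @ y                             # residuals
    mse = np.sum(e ** 2) / (n - k - 1)
    return (e ** 2 / ((k + 1) * mse)) * (h / (1 - h) ** 2)

rng = np.random.default_rng(3)
X = rng.normal(size=(50, 2))
y = 1 + X @ np.array([0.5, -0.3]) + rng.normal(scale=0.2, size=50)
y[0] += 5.0                                   # plant an outlier
d = cooks_distance(X, y)
print(d[:3], d.argmax())                      # observation 0 should stand out
```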
53
Q

Dummy variables

A

Binary variables with only two options

  • When assigning a numerical value, it can only be 0 and 1.
  • Always use (n-1) dummy variables to avoid multicollinearity (i.e., 3 dummy variables for 4 quarters in a year).
  • Ex: True/false. Ex 2: On/off
54
Q

Dummy variables example:

A

EPS for four quarters:
EPS = 1.25 + 0.75Q1 - 0.20Q2 + 0.10Q3

Question 1: What is the predicted EPS for Q4?

Answer 1: EPS = 1.25 + 0.75(0) - 0.20(0) + 0.10(0) = 1.25
* omitted quarter shows as the intercept

Question 2: What is the predicted value for Q1?

Answer 2: EPS = 1.25 + 0.75(1) - 0.20(0) + 0.10(0) = 2.00

Question 3: What is the predicted EPS for Q1 of next year?

Answer 3: EPS = 1.25 + 0.75(1) - 0.20(0) + 0.10(0) = 2.00
* This simple model uses average EPS for any specific quarter over the past ten years as a forecast of EPS in its respective quarter of the following year.

55
Q

Logistic regression (logit) model

A

Estimates the probability of a DISCRETE binary variable occurring.

Calculation: ln(p/(1-p)) = b0 + b1x1 + b2x2 … + ε
* The intercept is an estimate of the log odds when all independent variables equal zero.
* The change in log odds when an independent variable changes depends on the curvature of the function.
* Odds = e^ŷ
* Probability = odds / (1 + odds) = 1 / (1 + e^(-ŷ))

  • Logit models assume that residuals have a logistic distribution- similar to a normal distribution but with fatter tails.
  • Logit models are nonlinear
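
A tiny worked example with hypothetical fitted coefficients, showing the log odds, odds, and probability transformations from the card:

```python
import numpy as np

# Hypothetical fitted logit: ln(p/(1-p)) = b0 + b1*x1 + b2*x2
b0, b1, b2 = -2.0, 0.8, 1.5
x1, x2 = 1.2, 0.4

log_odds = b0 + b1 * x1 + b2 * x2    # y_hat on the log-odds scale
odds = np.exp(log_odds)              # odds = e^y_hat
prob = 1 / (1 + np.exp(-log_odds))   # probability = 1 / (1 + e^(-y_hat))
print(log_odds, odds, prob)          # prob also equals odds / (1 + odds)
```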
56
Q

Likelihood ratio (LR) test

A

Similar to joint F-test but for logit models. Measures the goodness of fit of a logit model.

Calculation= -2 * (log likelihood restricted model - log likelihood unrestricted model).

  • Recall, the restricted model has fewer independent variables.
  • Log likelihood values are always negative; values closer to 0 indicate a better-fitting model.
  • The LR test statistic follows a chi-square distribution.
57
Q

Time-series data

A

A set of observations taken periodically (most often at equal intervals) at different points in time.

  • A key feature of a time series is that new data can be added w/o affecting the existing data.
  • Trends can be found by plotting these observations on a graph.
58
Q

Linear trend

A

One of the two broad types of trend models. A time-series trend that can be graphed using a straight line, with time as the independent variable. A downward-sloping linear trend indicates a negative trend, and vice versa for a positive trend.

Simplest form: y_t = b0 + b1(t) + ε

59
Q

Log-linear trend model

A

One of the two broad types of trend models. This is used to model positive or negative exponential growth. Recall, exponential growth is some constant growth rate (positive or negative). Exponential growth will show a convex curve.

Simplest form: y_t = e^(b0 + b1(t))
* b1 is the constant rate of growth.
* Rather than trying to fit the nonlinear data with a linear (straight-line) regression, we take the natural log of both sides and transform it into a linear trend line called the log-linear model. This increases the predictive ability of the model.

Form: ln(y_t) = b0 + b1(t) + ε

  • Financial time series data is often modeled using log-linear trend models.
60
Q

How to determine if a linear or log-linear trend model should be used?

A

Plot the data. A linear trend model may be used if the data points are equally distributed above and below the regression line (ex: inflation data is usually modeled with a linear trend model). If, when plotted, the data plots with a curved shape, use a log-linear trend model (ex: financial data- stock indices and stock prices- are often modeled with log-linear trend models).

  • If there is serial correlation, we will use an autoregressive model.
61
Q

True or false: For a time series model without serial correlation, the DW statistic should be approximately equal to 0?

A

False, for a time series model without serial correlation, the DW statistic should be approximately equal to 2. A DW that significantly differs from 2 suggests that the residuals are correlated.

62
Q

Autoregressive (AR) model

A

A time-series model that regresses the dependent variable against one or more lagged values of itself.

Ex: A regression of the sales of a firm against the sales of the firm in the previous month. In this model, past values are used to predict the current value of the variable.

Simplest form: x_t = b0 + b1·x_(t-1) + … + bp·x_(t-p) + ε
* x_t = value of the time series at time t
* x_(t-1) = value of the time series at time t-1

  • DW test stat cannot be used to test for serial correlation in AR model.
63
Q

Covariance stationary

A

An AR model is covariance stationary if:
* There is a constant and finite expected value: the expected value is constant over time.
* Constant and finite variance: the volatility around the time series’ mean is constant over time.
* The covariance between any two observations w/ equal distance apart will be equal.

64
Q

True or false: A nonstationary time series can still produce meaningful results sometimes?

A

False, we need stationary covariance. A nonstationary time series will produce meaningless results.

65
Q

True or false: We can use a DW or BG test to test for serial correlation in AR models?

A

False, we must use a t-test

  • We can use a DW or BG test for a TREND model.
66
Q

T-stat for residual autocorrelations in AR model:

A

t = (correlation of the error term with the kth lagged error term) ÷ (1 ÷ √n)

Standard error = 1 ÷ √n
* n = # of observations; use n - 2 degrees of freedom.

  • If data is monthly, check for 12 lags to see if there’s serial correlation. If quarterly, check for 4 lags.
  • When there is statistically significant serial correlation in an AR model, it means that the model is incomplete. There’s still some pattern of data in the residuals that the model has failed to reveal.
67
Q

Mean reversion

A

When a time-series has a tendency to move towards its mean. In other words, the dependent variable has a tendency to decline when the current value is above the mean and rise when the current value is below the mean. If a time series is at its mean reverting level, the model predicts the next value of the time series will be the same as its current value.

68
Q

Mean reverting level calculation

A

Xt = b0 ÷ (1 - b1)

  • The model will not be covariance stationary if b1 = 1
  • If x_t is greater than the mean-reverting level, the model predicts that x_(t+1) will be lower than x_t, and vice versa.
  • All covariance stationary time series have a finite mean-reverting level.
  • As forecasts become more distant, the value of the forecast will be closer to the mean reverting level.
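
A one-line worked example with a hypothetical AR(1):

```python
# Hypothetical AR(1): x_t = 1.2 + 0.7 * x_(t-1) + e
b0, b1 = 1.2, 0.7
mean_reverting_level = b0 / (1 - b1)  # = 4.0
print(mean_reverting_level)

# If x_t is above 4.0 the model predicts a decline toward 4.0, and vice versa;
# with b1 = 1 the denominator is zero and the level is undefined (unit root).
```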
69
Q

In-sample forecasts

A

Forecasts that are within the range of data used to estimate the model. This is where we compare how accurate our model is in forecasting the actual data we used to develop the model.

70
Q

Out-of-sample forecasts

A

Forecasts that are made outside of the sample period. This is where we compare how accurate a model is in forecasting the y-variable value for a time period outside the period used to develop the model.

71
Q

Root mean squared error (RMSE)

A

Used to compare the accuracy of autoregressive models in forecasting out-of-sample values.

Ex: We have two AR models. To determine which model will more accurately forecast future values, we calculate the RMSE for the out-of-sample data.

  • The model with the lower RMSE for the out-of-sample data will have lower forecast error and will be expected to have better predictive power in the future.
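
A minimal sketch (hypothetical forecasts) comparing two models by out-of-sample RMSE:

```python
import numpy as np

def rmse(actual, forecast):
    """Root mean squared error: sqrt of the mean squared forecast error."""
    return np.sqrt(np.mean((np.asarray(actual) - np.asarray(forecast)) ** 2))

# Hypothetical out-of-sample comparison of two AR models
actual = [1.0, 1.4, 1.1, 1.6]
print(rmse(actual, [1.1, 1.3, 1.2, 1.5]))  # model A: RMSE = 0.10
print(rmse(actual, [0.8, 1.7, 0.9, 1.9]))  # model B: higher RMSE, worse
```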
72
Q

True or false: Financial and economic time series inherently exhibit some form of instability or nonstationarity.

A

True. Since financial/economic conditions are dynamic, the coefficients in one period may differ from those in another period. Models estimated over shorter time periods are usually more stable for this reason. When selecting a time series sample, analysts should understand regulatory changes, changes to the economic environment, etc. If there have been large changes, the model may not be accurate.

73
Q

True or false: There is a trade-off between statistical reliability in the long run and statistical stability in the short run?

A

True. A longer sample period increases statistical reliability, while a shorter period increases stability, because the coefficients are less likely to span changes in the underlying environment.

74
Q

Random walk

A

When, in an AR model, the value of the dependent variable in one period is equal to the value of the series in the previous period plus a random error term.

Form: Xt = X_t-1 + ε
* b0 = 0
* b1 = 1

75
Q

Random walk with a drift/Unit root

A

The same concept as a random walk but the intercept term is not equal to zero. Thus, the time series model is expected to increase/decrease by the intercept term and the error term.

Form: Xt = b0 + X_t-1 + ε
* b1 = 1

76
Q

True or false: A random walk with or w/o a drift is NOT covariance stationary?

A

True, random walks will always have a unit root which makes them not covariance stationary.

77
Q

Why are unit roots problematic?

A

A unit root is when b1 = 1. If this occurs, then the mean reverting level (b0 ÷ (1 - b1)) is undefined.

78
Q

How to determine whether a time series is covariance stationary:

A
  1. We can run an AR model and examine autocorrelations
  2. Perform a Dickey-Fuller test

  • We cannot use a T-test
79
Q

Dickey-Fuller Test

A

A test we use in an AR model to determine if there’s a unit root.

Calculation: x_t = b0 + b1·x_(t-1) + ε ↠ subtract x_(t-1) from both sides ↠ x_t - x_(t-1) = b0 + (b1 - 1)·x_(t-1) + ε ↠ then test whether the new coefficient g = (b1 - 1) equals 0 using a t-test (with Dickey-Fuller critical values).

  • The null hypothesis is that g = (b1 - 1) = 0, i.e., b1 = 1 (a unit root). If we fail to reject the null, the time series has a unit root and is nonstationary.
80
Q

True or false: The Dickey-Fuller test uses the standard T distribution to find the critical values?

A

False, it has its own distribution to calculate the critical values.

81
Q

First differencing

A

A procedure that transforms time series data w/ a random walk into a covariance stationary time series. The first differencing process involves subtracting the value of the time series (the dependent variable) in the immediately preceding period from the current value of the time series to define a new dependent variable, y.

82
Q

First differencing calculation

A
  1. If the original time series is a random walk (has a unit root), then x_t - x_(t-1) = ε.
  2. Then we create a new dependent variable: y_t = x_t - x_(t-1), i.e., y_t = ε.
  3. Then, stated in the form of an AR model: y_t = b0 + b1·y_(t-1) + ε, with b0 = b1 = 0.
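
A quick numpy illustration: simulate a random walk (a unit root series) and first-difference it into a stationary series.

```python
import numpy as np

rng = np.random.default_rng(4)
x = np.cumsum(rng.normal(size=200))  # random walk: x_t = x_(t-1) + e (unit root)

y = np.diff(x)                       # y_t = x_t - x_(t-1): covariance stationary
print(x[:5], y[:5])

# Fitting an AR(1) to y should give b0 ~ 0 and b1 ~ 0, as the card states.
```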
83
Q

Seasonality

A

A characteristic of a time series in which the data experiences regular and predictable changes that recur every calendar year.

  • If seasonality is present, we MUST adjust the AR model in order for it to be correctly specified.
84
Q

How to correct for seasonality?

A

We add an additional lag of the dependent variable to the original model as another independent variable. The lag will be X_t-4 in a quarterly model or X_t-12 in a monthly model.

Calculation: ln(Xt) = b0 + b1 * ln(X_t-1) + b2 * ln(X_t-4) + ε

85
Q

True or false: For a T-test with seasonality, the null hypothesis is that there is seasonality?

A

False. H0: the seasonal lag coefficient = 0 (no seasonality present); Ha: the coefficient ≠ 0 (seasonality present).

86
Q

Autoregressive conditional heteroskedasticity (ARCH)

A

When the variance of the residuals in one period is dependent on the variance of the residuals in a previous period in an AR model. When ARCH exists, the standard errors of the coefficients and the hypothesis tests are invalid.

87
Q

ARCH model

A

A model used to test for ARCH.

  • If a time-series model has been determined to contain ARCH errors, regression procedures that correct for heteroskedasticity, such as generalized least squares, must be used in order to develop a predictive model. Otherwise, the standard errors of the model’s coefficients will be incorrect, leading to invalid conclusions.
88
Q

How to predict future variance of errors in a time series model?

A

After we run an ARCH model, if we determine that a1 is significant (the time series has ARCH), future variance of errors can be predicted by using:

σ²_(t+1) = â0 + â1 · ε_t²

  • We cannot predict future variance if a1 is not significant.
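
A tiny worked example with hypothetical â0, â1, and latest residual:

```python
# Hypothetical fitted ARCH(1): var_(t+1) = a0_hat + a1_hat * e_t^2
a0_hat, a1_hat = 0.02, 0.35
e_t = 0.5                          # latest residual
var_next = a0_hat + a1_hat * e_t ** 2
print(var_next)                    # 0.1075
```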
89
Q

Multiple time series

A

When more than one time series is run at the same time.

Ex: Yt = b0 + b1 * Xt + εt ↠ Yt and Xt are two different time series.

  • Either or both of these time series could be subject to nonstationarity.
90
Q

How to test for nonstationarity in a multiple time series model?

A

Run separate DF tests for each time series.

  • If either of the time series’ are nonstationary, the coefficients will be unreliable.
91
Q

Cointegration

A

When two time series are economically linked to the same macro variables or follow the same trend, and that relationship is not expected to change. If two time series are cointegrated, the error term from regressing one on the other is covariance stationary and the t-tests are reliable.

92
Q

Cointegration note

A

When working with two time series in a regression:
1. If neither time series has a unit root, then the regression can be used
2. If only one series has a unit root, the regression results will be invalid
3. If both time series have a unit root and are cointegrated, then the regression can be used
4. If both time series have a unit root but are not cointegrated, the regression results will be invalid.

93
Q

How to test whether two times series are cointegrated?

A

Regress one variable on the other:
Yt = b0 + b1 * Xt + ε
* Yt= value of time series ‘Y’ at time t.
* Xt= value of time series ‘X’ at time t.
Then, the residuals are tested for a unit root using the Dickey-Fuller test with critical values calculated by Engle and Granger (the DF-EG test). If the DF-EG test rejects the null hypothesis (H0: no cointegration, i.e., a unit root in the residuals), then we conclude the error terms are covariance stationary and the two series are cointegrated.

94
Q

Structural change

A

A significant shift in the plotted data at a point in time that essentially divides the data into two or more distinct patterns.

  • If there is structural change present, you must run two different models- one incorporating data before the date and one after the date.
95
Q

Machine learning (ML)

A

Filters useful info from substantial amounts of data by learning from known examples to find a pattern in the data. Machine learning acts without human intervention.

96
Q

Target variable

A

The dependent variable.

  • Target variables can be continuous, categorical, or ordinal.
97
Q

Feature

A

The independent variable

98
Q

Training data

A

The sample used to fit the model

99
Q

Hyperparameter

A

A model input specified by the researcher

100
Q

Supervised learning

A

One of three types of ML. We teach the model on labeled data (data where the target variable is defined), then with that knowledge have it predict new instances. Supervised learning is used when the training data contains the ground truth, i.e., the target variable. Multiple regression is an example of supervised learning. Regression and classification are the two most common supervised learning tasks: if the target variable is continuous, use a regression model; if the target variable is categorical or ordinal, use a classification model, whose output groups observations into classes.

101
Q

Unsupervised learning

A

One of three types of ML. There is no labeled data; instead, the program seeks out patterns within the data.

102
Q

Deep learning networks (DLNs)

A

One of three types of ML, used for complex tasks such as image recognition and natural language processing. Deep learning is based on neural networks and is a self-teaching system: a DLN is a NN with many hidden layers (at least two, but often more than 20).

103
Q

Reinforced learning algorithms

A

Algorithms in which an agent seeks the maximum reward subject to constraints. RL does not rely on labeled data; rather, these programs learn from their own prediction errors.

104
Q

Neural networks

A

A group of ML algorithms applied to problems w/ significant nonlinearities.

105
Q

Generalization

A

The extent to which a ML program is able to make out-of-sample predictions.

106
Q

Overfitting for ML

A

When the model fits the training data too closely, typically because there is a large number of features (independent variables) in the data set. Overfitting decreases the accuracy of out-of-sample forecasts.

  • The training sample will have a high R^2 and the test sample will have a low R^2.
107
Q

True of false: Under supervised learning, a training sample is used to train a ML algorithm and a separate test sample is used to evaluate the model’s ability to accurately predict new data?

A

TRUE

108
Q

How to measure the ability an ML program generalizes?

A

Create three non-overlapping data sets:
Training sample: In-sample data. Used to train the ML algorithm.
Validation sample: Out-of-sample data. Used to tune the training model.
Test sample: Out-of-sample data.

  • A model that generalizes well should have a high R^2 for both in-sample and out-of-sample data.
109
Q

Bias errors

A

This is the in-sample error resulting from models with a poor fit.

  • Occurs when there is underfitting.
110
Q

Variance error

A

This is the out-of-sample error resulting from overfitted models that do not generalize well. This is the extent to which the ML model’s results change in response to test and validation sample data.

  • Associated with overfitting.
  • Increases with model complexity.
  • Nonlinear models tend to have high variance error.
111
Q

Base error

A

This is the out-of-sample error resulting from residual errors due to random noise. Just randomness in the data.

  • Decreases with model complexity.
  • Linear models tend to have high base error.
112
Q

Learning curve

A

Plots the accuracy rate in the test sample versus the size of the training sample. A ML model that generalizes well will show an improving accuracy rate as the sample size increases. The in-sample and out-of-sample error rates should converge toward the desired level as the sample size increases.

113
Q

In-sample accuracy rate calculation vs out-of-sample accuracy rate calculation vs base accuracy rate calculation

A

In-sample accuracy rate= 1 - bias error rate
Out-of-sample accuracy rate= 1 - variance error rate.
Base accuracy rate= 1 - base error rate.

114
Q

True or false: ML models with high bias error will not see the accuracy rates converge?

A

False, the accuracy rates will converge just far below the desired level.

115
Q

True or false: Models with high variance errors will see the accuracy rates of the in-sample data and out-of-sample data converge below the desired level?

A

False, only the in-sample accuracy rate will converge towards the desired level.

116
Q

How to minimize the effects of overfitting with an ML program?

A

Complexity reduction and cross validation.

117
Q

Cross validation

A

An estimate of out-of-sample error rates directly from the validation sample.

118
Q

Complexity reduction

A

A penalty imposed to exclude features that do not meaningfully contribute to out-of-sample prediction accuracy.

119
Q

Underfitting

A

When the ML algorithm fails to identify an actual relationship. This occurs when there is an oversimplified model.

  • R^2 will be low for in-sample and out-of-sample data.
  • High bias error
  • Linear functions are susceptible to underfitting.
120
Q

K-fold cross validation

A

A method for alleviating the holdout sample problem: when the training set is reduced too much. This process also helps reduce sampling bias. There are four steps in this process:
1. Shuffle the data randomly.
2. Divide the data into k equal sub-samples.
3. K-1 samples will be training samples with the remaining sample a validation sample.
4. This process is then repeated k times. The average of the k validation errors is then taken as a reasonable estimate of the model’s out-of-sample error.
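
A minimal numpy sketch of those four steps (the fit/predict pair here is a plain OLS regression; all data is simulated):

```python
import numpy as np

def k_fold_mse(X, y, k, fit, predict):
    """Shuffle, split into k folds, train on k-1, validate on 1, average errors."""
    n = len(y)
    idx = np.random.default_rng(0).permutation(n)  # step 1: shuffle
    folds = np.array_split(idx, k)                 # step 2: k equal sub-samples
    errors = []
    for i in range(k):                             # steps 3-4: rotate the fold
        val = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        model = fit(X[train], y[train])
        pred = predict(model, X[val])
        errors.append(np.mean((y[val] - pred) ** 2))
    return np.mean(errors)                         # estimated out-of-sample error

# Usage with a plain OLS fit/predict pair:
fit = lambda X, y: np.linalg.lstsq(
    np.column_stack([np.ones(len(y)), X]), y, rcond=None)[0]
predict = lambda b, X: np.column_stack([np.ones(len(X)), X]) @ b

rng = np.random.default_rng(5)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -0.5, 0.2]) + rng.normal(scale=0.1, size=100)
print(k_fold_mse(X, y, k=5, fit=fit, predict=predict))
```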

121
Q

Penalized regression models

A

Models that reduce the problem of overfitting by imposing a penalty based on the # of features in the model. The penalty increases w/ the # of features used. This will exclude features that do not meaningfully contribute to out-of-sample prediction accuracy. Penalized regression models seek to minimize the SSE.

  • These models are used to forecast returns.
122
Q

Least absolute shrinkage and selection operator (LASSO)

A

This is a popular penalized regression model. LASSO attempts to minimize SSE and the sum of the absolute values of the slope coefficients of the regression. The penalty increases with number of features. There is a tradeoff in reducing SSE (increasing independent variables) and the penalty imposed. Investment analysts use LASSO to build parsimonious (few predictor variables) models.

123
Q

Regularization

A

A type of penalized regression. Forces the beta coefficients of nonperforming features towards zero. Regularization can be applied to non-linear models.

124
Q

Support Vector Machine (SVM)

A

A common supervised ML algorithm often used for textual, categorical data. The model assumes the data is linearly separable; An SVM is a linear classification algorithm. An SVM attempts to find the optimal hyperplane that separates two sets of data (classes) by the max amount using n features.

  • Applications of SVM in investment management include classifying debt issuers into likely-to-default versus not-likely-to-default issuers, stocks-to-short versus not-to-short, and even classifying text (from news articles or company press releases) as positive or negative.
125
Q

Soft margin classification

A

Handles misclassified observations in the training data in an SVM.

126
Q

K-nearest neighbor (KNN)

A

A common supervised ML algorithm. A new observation is classified by finding the k pieces of data nearest to it in the current data set and taking the most common classification among those neighbors. This is used for categorical data.

  • Investment applications of KNN include predicting bankruptcy, assigning a bond to a ratings class, predicting stock prices, and creating customized indices.
127
Q

Classification and regression trees (CART)

A

A common supervised ML algorithm. Classification trees are used when the target variable is categorical and typically when the target is binary. Regression trees are used when the target is continuous. Classification trees assign observations to one of two possible classifications at each node starting w/ the root node at the top, then moving to the decision nodes in the middle, and then the terminal nodes at the bottom.

  • To avoid overfitting, regularization criteria such as maximum tree depth, maximum number of decision nodes, and so on are specified by the researcher. Alternatively, sections of tree with minimal explanatory power are pruned.
  • Investment applications of CART include detecting fraudulent financial statements and selecting stocks and bonds.
  • With Classification and Regression Trees (CART), one way that regularization can be implemented is via pruning which will reduce the size of the regression tree—sections that provide little explanatory power are pruned (i.e., removed).
128
Q

Ensemble learning

A

A common supervised ML algorithm that combines the predictions from multiple models rather than a single model. The different models cancel out noise and result in a lower average error rate. There are two types of ensemble methods: aggregation of heterogeneous learners and aggregation of homogeneous learners. Ensemble learning typically produces more stable and accurate results than single models. It aims to decrease variance (bagging), decrease bias (boosting), and improve predictions (stacking).

129
Q

Aggregation of heterogeneous learners

A

Different algorithms are combined through a voting classifier, and each algorithm gets a vote. The prediction with the most votes is the one we go with.

130
Q

Aggregations of homogeneous learners.

A

The same algorithm is used but on different training data. The different training data used by the same model can be derived through bootstrap resampling (a.k.a bagging).

131
Q

Random Forest

A

A common supervised ML algorithm. This is a variation of a classification tree in which a large # of classification trees are trained using bagged data from the same data set. A random subset of features is used in creating each tree, so every tree is different. This process mitigates overfitting and reduces noise from errors. A drawback of random forests is that the transparency of CART is lost.

  • Random forests can INCREASE the signal-to-noise ratio.
  • Investment applications of random forest include factor-based asset allocation, and prediction models for the success of an IPO.
132
Q

Principal component analysis (PCA)

A

A common unsupervised ML algorithm. Problems w/ too much noise arise when there are excessive amounts of features (high dimensionality). PCA seeks to reduce this excess noise by discarding the excess features. A PCA transforms the features' covariance matrix in order to reduce highly correlated features into a smaller # of uncorrelated features, called eigenvectors, which are linear combinations of the original features. Each eigenvector has an eigenvalue: the proportion of total variance in the data set explained by the eigenvector. The end product is an algorithm with lower dimensionality, which makes the model easier to train and interpret.

  • The process of reducing noise is called dimension reduction. Dimension reduction seeks to reduce this noise by discarding those attributes that contain little information.
133
Q

Scree plot

A

A plot that shows the proportion of total variance explained by each of the principal components.

134
Q

Clustering

A

A common unsupervised ML algorithm. Clustering is the process of grouping observations into categories based on similar attributes (a.k.a cohesion). The two most common types of clustering are: K-means clustering and hierarchical clustering.

135
Q

Cohesion

A

Grouping observations into categories based on the observations’ similarities.

136
Q

K-means clustering

A

One of the two main types of clustering. Observations are put into k nonoverlapping clusters, where k is a hyperparameter. Each cluster has a centroid (the center of the cluster), and each new observation is assigned to a cluster based on its proximity to the centroid. As a new observation is assigned to a cluster, its centroid is recalculated, which may result in reassignment of some observations, thus resulting in a new centroid, and so forth, until all observations are assigned and no new reassignment is made.

  • One limitation of this type of algorithm is that the hyperparameter k is chosen before clustering starts, meaning that one has to have some idea about the nature of the data set.
  • K-means clustering is used in investment management to classify thousands of securities based on patterns in high dimensional data.
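
A short sketch using scikit-learn's KMeans on simulated two-group data; n_clusters is the hyperparameter k chosen up front:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(6)
# Hypothetical security features: two fuzzy groups in 2-D
X = np.vstack([rng.normal(0, 0.5, size=(50, 2)),
               rng.normal(3, 0.5, size=(50, 2))])

k = 2  # the hyperparameter, chosen before clustering starts
model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
print(model.cluster_centers_)                  # the centroids
print(model.labels_[:5], model.labels_[-5:])   # cluster assignments
```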
137
Q

Hierarchical clustering

A

One of the two main types of clustering. Builds a hierarchy of clusters without any predefined # of clusters.

138
Q

Agglomerative clustering/ Bottom-up clustering

A

One of the two types of hierarchical clustering. This starts with one observation as its own cluster and then adds other similar observations to that group, thus forming another nonoverlapping cluster. In the end, all observations are merged into a single cluster.

139
Q

Neural networks (NNs)

A

Made up of layers of neurons. The first layer is the input layer (node layer), which receives the input (the independent variables). The final layer is the output layer. In between exists hidden layers. Neurons of each layer are connected to neurons of the next layer through channels. There may be multiple hidden layers. The multiple layers allow the NN to model complex nonlinear functions. NNs are an adaptive system that computers use to learn from their mistakes and improve continuously. A group of ML algorithms applied to problems with significant nonlinearity.

140
Q

Divisive clusters/ top-down clustering

A

One of the two types of hierarchical clustering. The algorithm starts with one giant cluster, and then it partitions that cluster into smaller and smaller clusters. In the end, each cluster contains only one observation.

141
Q

Summation operator

A

Each neuron comprises a summation operator, which gathers the info from the previous layer's neurons, takes a weighted average, and passes it on to the activation function. The activation function then generates an output value from the inputs. That value is passed forward to other neurons in subsequent hidden layers; this process is called forward propagation.

142
Q

Backwards propagation

A

This is how the machine learns from its errors: the weights used by the summation operators are adjusted as the algorithm learns from its prediction errors.

143
Q

Steps in a supervised/ traditional ML model:

A
  1. Conceptualization of the problem
  2. Data collection
  3. Data preparation and wrangling: cleaning the data set and preparing it for the model.
  4. Data exploration: Feature selection and performing data analysis. Evaluating the data set and determining the most appropriate way to configure it for model training.
  5. Model training: Determining which ML algorithm to use, using a training data set, and tuning the model.
144
Q

Steps in a unsupervised/ textual ML model:

A
  1. Text problem formulation
  2. Text curation: ensuring the quality of data, for example by adjusting for bad or missing data.
  3. Text preparation and wrangling
  4. Text exploration
  5. Model training
145
Q

Big data

A

Immense amounts of data

146
Q

The 4 Vs of big data

A
  • Volume= The amt of data
  • Variety= The sources of data
  • Velocity= The speed w/ which the data is created and collected
  • Veracity= The quality of the data

  • Big data often suffers from low veracity
147
Q

Data cleansing

A

Reducing errors in raw data. Common errors include:
* Missing values
* Invalid values
* Inaccurate values
* Non-uniform values
* Duplicate observations

  • Removing HTML tags is part of the data cleansing step.
148
Q

Data wrangling

A

Prepping data for model use. This includes transforming and scaling. Data transformations include:
* Extraction
* Aggregation: consolidating two variables into one (using appropriate weighting)
* Filtration: removing irrelevant observations.
* Selection: removing features not needed for processing.
* Conversion of data of diverse types

149
Q

README Files

A

Contain info about how, what, and where the data is stored. Helps ensure validity.

150
Q

Application programming interface (API)

A

An interface through which data is obtained from 3rd-party sources.

151
Q

Metadata

A

Data that describes other data by providing info about one or more aspects of the data. Essentially a summary.

152
Q

Winsorization

A

A way researchers deal with outliers. Instead of entirely excluding outliers, they substitute reasonable values for them (for example, replacing every value above the 95th percentile with the 95th-percentile value).

153
Q

Trimming

A

One way researchers exclude outliers. A trimmed mean excludes a certain portion of the highest and lowest values; for example, excluding the lowest 1% and the highest 1% of all values.

154
Q

Normalization

A

One of the two common types of scaling. Scales variable values to lie between 0 and 1.

Calculation: (Xi - Xminimum) ÷ (Xmaximum - Xminimum)

  • Sensitive to outliers.
  • Use this when trying to understand where the variables lie within the data set.
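
A one-function sketch of min-max normalization; note how the single extreme value (50) compresses the rest of the range, which is the outlier sensitivity mentioned above:

```python
import numpy as np

def min_max_normalize(x):
    """Scale values to [0, 1]: (x_i - x_min) / (x_max - x_min)."""
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())

print(min_max_normalize([10, 20, 25, 50]))  # [0.   0.25 0.375 1.  ]
```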
155
Q

Cleansed text is normalized using these steps:

A
  • Lowercasing: Ex: Dog ↠ dog
  • Removal of stop words: super common unimportant words Ex: the, is, and, etc.
  • Stemming: Reduce inflected forms of a word to a common stem. Ex: integrate, integration, integrating ↠ integrat
  • Lemmatization: Return the base (dictionary) form of the word. Ex: saw ↠ see
  • Bag-of-words (BOW): A BOW is the result of steps #1-#4: all the collected words, or tokens, gathered without regard to word order or position. If order doesn't matter, we can stop here.
  • N-gram: If ordering is important, we can create a two-gram to look for two specific words that go together, a three-gram for three words that go together, and so on.
  • Organizing the BOW and N-grams into a document term matrix (DTM), where each row is a document, each column is a token, and each cell counts the token's occurrences in that document.

  • Lemmatization, which takes places during the text wrangling/preprocessing process for unstructured data, is the process of converting inflected forms of a word into its morphological root (known as lemma). Lemmatization reduces the repetition of words occurring in various forms while maintaining the semantic structure of the text data, thereby aiding in training less complex ML models.
156
Q

Token (in text wrangling)

A

A word

157
Q

Black box approach to ML

A

ML models that give you a result without explaining how they get to their decision.

158
Q

Steps in data exploration

A
  1. Exploratory data analysis (EDA)
  2. Feature selection
  3. Feature engineering
159
Q

Exploratory data analysis

A

Involves looking at data descriptors (stats, heat maps, word clouds, etc.) w/ the objective of understanding the data’s properties, finding patterns/relationships, and planning modeling in future steps.

160
Q

Feature selection

A

A process to select only the needed attributes of the data for ML model training

161
Q

True or false: In feature selection, we try to ONLY include the features that contribute to the model’s out-of-sample predictive power.

A

TRUE

162
Q

Feature extraction

A

When a feature is created from the data set.

Ex: Creating a value for age using date of birth data.

163
Q

Feature engineering (FE)

A

Involves optimizing and improving the selected features.

164
Q

One-hot encoding (OHE)

A

A type of feature engineering. The process is used to convert a categorical feature into a dummy variable.

165
Q

Techniques of feature selection:

A
  • Term frequency= The # of times the token appears in the dataset
  • Document frequency= The # of documents that a token appears in ÷ the # of documents.
  • Chi-square Test= Ranks tokens by their usefulness to a certain class of info. Tokens with higher chi-square test-stat occur more frequently.
  • Mutual information Test= A numerical value indicating the contribution of a token to a specific class. A token that appears mostly in one class has a value close to 1, whereas a token that appears frequently in all classes has a value close to 0.
166
Q

Techniques of feature engineering:

A
  • Numbers= Numbers of a standard length are converted into new tokens. Ex: 4-digit numbers (often years) converted into '#4'.
  • N-Grams
  • Name entity recognition (NER)= Assign tokens a NER tag based on their context. Ex: Europe-place ; Google-website.
  • Parts of Speech= Assign tokens a POS tag based on their language structure. For example: Google- PPN (proper noun) ; 2000 - CDN (cardinal #).
167
Q

Procedures before model training:

A

The researcher must define the objective(s) of the data analysis, identify useful data points, and conceptualize the model. Once a ML algorithm/method is selected, the researcher should specify the hyperparameters.

168
Q

Common model fitting errors:

A
  • Small training samples
  • Low # of features in the model. This can lead to an underfitting problem because the model doesn’t have enough info to find patterns.

  • Feature selection is important to mitigate underfitting and overfitting.
  • Feature engineering can reduce underfitting.
169
Q

Three tasks of model training:

A
  1. Method selection= choosing the right ML algorithm considering supervised/unsupervised learning, type of data, and size of data.
  2. Performance evaluation
  3. Tuning
170
Q

What type of ML algorithm do we use for text, numerical, and image data:

A
  • Text= SVMs and Generalized linear models (GLMs)
  • Numerical= Regression trees, CART methods, and classification methods.
  • Image= Neural networks and deep learning networks.
171
Q

Techniques to measure model performance:

A
  • Error analysis: Errors in classification problems can be false positives (type 1 errors) or false negatives (type 2 errors). We build confusion matrices to tally these errors.
  • Receiver operating characteristic (ROC)
  • Root mean squared error (RMSE)
172
Q

Precision metric

A

A way to evaluate the fit of an ML algorithm. It’s the ratio of true positives (not false positives (type 1 errors)) to predicted positives. Use the precision metric when the cost of a type 1 error is large.

Calculation: True positives ÷ (True positives + false positives)

173
Q

Recall metric/ true positive rate

A

A way to evaluate the fit of an ML algorithm. It’s the ratio of true positives (not false positives (type 1 errors)) too all actual positives. Use when the cost of a type 2 error is large.

Calculation: True positives ÷ (True positives + false negatives)

174
Q

F1 score

A

A way to evaluate the fit of an ML algorithm. It’s the harmonic mean of precision and recall. The higher the better.

Calculation: (2 * precision * recall) ÷ (Precision + recall)

  • More appropriate than the model accuracy metric when there are class imbalances.
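
A small sketch computing all three metrics from hypothetical confusion-matrix counts:

```python
def precision_recall_f1(tp, fp, fn):
    """Compute the three fit metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)   # penalizes false positives (type 1 errors)
    recall = tp / (tp + fn)      # penalizes false negatives (type 2 errors)
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1

# Hypothetical counts from a classifier
print(precision_recall_f1(tp=80, fp=20, fn=40))  # (0.8, 0.667, 0.727)
```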
175
Q

Receiver operating characteristic (ROC)

A

A curve that plots the tradeoff between false positives and true positives. The true positive rate (the recall metric) is plotted on the y-axis, whereas the false positive rate is plotted on the x-axis. The area under the curve (AUC) is a value from 0 to 1; the closer the value is to 1, the higher the predictive accuracy of the model. An AUC of 0 means the model is never right, and an AUC of 0.5 means it is right only 50% of the time (just guessing). The more convex the curve, the higher its AUC.

176
Q

True or false: There is a tradeoff between bias error and variance error to where the model is overfitting and underfitting?

A

TRUE

177
Q

Fitting curve

A

A graph that plots error on the y-axis (both in-sample error, i.e., training sample error, and out-of-sample error, i.e., cross-validation sample error) against model complexity on the x-axis. The graph shows two curves: a curve for training error and a curve for cross-validation prediction error.

178
Q

Ceiling analysis

A

An evaluation and tuning of each component of the model.

  • Applied to complex models.
179
Q

What is the primary limitation of trend models?

A

The primary limitation of trend models is that they are not useful if the residuals exhibit serial correlation.

180
Q

True or false: The KNN is a parametric test?

A

False, it’s non-parametric: it makes no assumptions regrading the distribution of the data.

181
Q

What are LASSO models and regularization used for?

A

LASSO models are used to build parsimonious models and regularization is used for nonlinear models.

182
Q

What are SVMs used for?

A

Generates binary classifications, such as: classifying debt issuers into likely-to-default versus not-likely-to-default issuers, stocks-to-short versus not-to-short, and even classifying text (from news articles or company press releases) as positive or negative.

183
Q

What are KNNs used for?

A

Predicting bankruptcy, assigning a bond to a ratings class, predicting stock prices, and creating customized indices.

184
Q

What are CARTs used for?

A

Fraud detection in financial statements and selecting stocks/bonds.

185
Q

What are random forests used for?

A

Factor-based asset allocation and prediction models for the success of an IPO.

186
Q

True or false: NNs have an input layer node that consists of a summation operator and an activation function?

A

False, the hidden layer nodes (not the input layer nodes) each consist of a summation operator and an activation function; these nodes are where learning takes place.

187
Q

True or false: The coefficients on each dummy tells us about the difference in earnings per share between the respective quarter and the one left out?

A

TRUE

188
Q

True or false: The F-statistic enables us to make conclusions about how several independent variables affect a dependent variable?

A

False, it only allows us to reject the hypothesis that all regression coefficients are zero and accept the hypothesis that at least one isn't.

189
Q

True or false: Serial correlation affects the consistency of regression coefficients?

A

FALSE

190
Q

Difference between RSS and R^2?

A

The RSS is just the absolute amount of explained variation; the R^2 (RSS/SST) is the explained variation as a proportion of total variation. It's like saying NI is an absolute figure, whereas ROE is NI as a proportion of equity.

191
Q

Sum of squared errors (SSE)

A

The absolute amount of unexplained variation

192
Q

T-test

A

A statistical test to determine if there is a significant difference between the means of two groups and how they’re related. The t-stat tells us if we need to reject the null or not.

193
Q

True or false: The F-statistic does not enable us to conclude on both independent variables. It only allows us to reject the hypothesis that all regression coefficients are zero and accept the hypothesis that at least one isn't?

A

True

194
Q

True or false: If it is determined that conditional heteroskedasticity is present in model one, both the regression coefficients and the standard errors will be biased?

A

False, regression coefficients will be unbiased but standard errors will be biased.

195
Q

True or false: In the presence of serial correlation, if the independent variable is a lagged value of the dependent variable, then regression coefficient estimates are invalid and coefficients’ standard errors are deflated, so t-statistics are inflated?

A

True

196
Q

Deep learning nets

A

Neural networks with many hidden layers—at least 3, but often more than 20 hidden layers—are known as deep learning nets.