CFA Level 2 Flashcards - 1
Adjusted R2
Goodness-of-fit measure that adjusts the coefficient of determination, R2, for the number of independent variables in the model.
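A worked sketch of the adjustment (the function name is illustrative; the formula is the standard one for n observations and k independent variables):

```python
def adjusted_r2(r2, n, k):
    # Adjusted R2 = 1 - (1 - R2) * (n - 1) / (n - k - 1)
    # Unlike R2, it can fall when an added variable explains little.
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# With R2 = 0.60, 50 observations, and 3 independent variables:
print(round(adjusted_r2(0.60, 50, 3), 4))  # 0.5739
```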
Akaike’s information criterion (AIC)
A statistic used to compare sets of independent variables for explaining a dependent variable. It is preferred for finding the model that is best suited for prediction.
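A minimal sketch of the regression form of AIC (lower is better; `sse`, `n`, and `k` stand for the sum of squared errors, sample size, and number of independent variables — names are illustrative):

```python
import math

def aic(sse, n, k):
    # AIC = n * ln(SSE / n) + 2 * (k + 1); lower values indicate a better model
    return n * math.log(sse / n) + 2 * (k + 1)

# Adding a variable that barely reduces SSE can worsen (raise) AIC:
base = aic(1200, 50, 3)
expanded = aic(1190, 50, 4)
print(expanded > base)  # True
```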
Analysis of variance (ANOVA)
The analysis that breaks the total variability of a dataset (such as observations on the dependent variable in a regression) into components representing different sources of variation.
Breusch–Godfrey (BG) test
A test used to detect autocorrelated residuals up to a predesignated order of the lagged residuals.
Breusch–Pagan (BP) test
A test for the presence of heteroskedasticity in a regression.
Coefficient of determination
The percentage of the variation of the dependent variable that is explained by the independent variables. Also referred to as the R-squared or R2.
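The same idea as a small sketch (function name illustrative): R2 is one minus the ratio of unexplained variation to total variation.

```python
def r_squared(actual, predicted):
    mean_y = sum(actual) / len(actual)
    sse = sum((a - p) ** 2 for a, p in zip(actual, predicted))  # unexplained
    sst = sum((a - mean_y) ** 2 for a in actual)                # total
    return 1 - sse / sst

print(r_squared([1.0, 2.0, 3.0], [1.1, 1.9, 3.0]))
```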
Conditional heteroskedasticity
A condition in which the variance of the residuals of a regression is correlated with the values of the independent variables.
Cook’s distance
A metric for identifying influential data points. Also known as Cook’s D (Di).
Dummy variable
An independent variable that takes on a value of either 1 or 0, depending on a specified condition. Also known as an indicator variable.
Durbin–Watson (DW) test
A test for the presence of first-order serial correlation.
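A sketch of the statistic itself (illustrative function; residuals come from a fitted regression). Values near 2 suggest no first-order serial correlation, values near 0 positive correlation, and values near 4 negative correlation:

```python
def durbin_watson(residuals):
    # DW = sum((e_t - e_(t-1))^2) / sum(e_t^2)
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den

print(durbin_watson([1.0, 1.0, 1.0, 1.0]))    # 0.0 (strong positive correlation)
print(durbin_watson([1.0, -1.0, 1.0, -1.0]))  # 3.0 (negative correlation)
```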
First-order serial correlation
The correlation of residuals with residuals adjacent in time.
General linear F-test
A test statistic used to assess the goodness of fit of an entire regression model; it jointly tests all of the independent variables in the model.
Heteroskedastic
When the variance of the residuals differs across observations in a regression.
High-leverage point
An observation of an independent variable that has an extreme value and is potentially influential.
Influence plot
A visual that shows, for all observations, studentized residuals on the y-axis, leverage on the x-axis, and Cook’s D as circles whose size is proportional to the degree of influence of the given observation.
Influential observation
An observation in a statistical analysis whose inclusion may significantly alter regression results.
Interaction term
A term that combines two or more variables and represents their joint influence on the dependent variable.
Intercept dummy
An indicator variable that allows a single regression model to estimate two lines of best fit, each with differing intercepts, depending on whether the dummy takes a value of 1 or 0.
Joint test of hypotheses
A test of hypotheses that specify values for two or more regression coefficients jointly.
Leverage
A measure for identifying a potentially influential high-leverage point.
Likelihood ratio (LR) test
A method for assessing the fit of logistic regression models, based on the log-likelihood metric that describes the model's fit to the data.
Log odds
The natural log of the odds of an event or characteristic happening. Also known as the logit function.
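A small sketch of the transformation and its inverse (the logistic, or sigmoid, function); function names are illustrative:

```python
import math

def log_odds(p):
    # logit(p) = ln(p / (1 - p))
    return math.log(p / (1 - p))

def logistic(x):
    # inverse of the logit: maps any real number back into (0, 1)
    return 1 / (1 + math.exp(-x))

print(log_odds(0.5))  # 0.0 (even odds)
```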
Logistic regression (logit)
A regression in which the dependent variable uses a logistic transformation of the event probability.
Logistic transformation
The log of the probability of an occurrence of an event or characteristic divided by the probability of the event or characteristic not occurring.
Maximum likelihood estimation
A method that estimates values for the intercept and slope coefficients in a logistic regression that make the data in the regression sample most likely.
Model specification
The set of independent variables included in a model and the model’s functional form.
Multicollinearity
When two or more independent variables are highly correlated with one another or are approximately linearly related.
Multiple linear regression
Modeling and estimation method that uses two or more independent variables to describe the variation of the dependent variable. Also referred to as multiple regression.
Negative serial correlation
A situation in which residuals are negatively related to other residuals.
Nested models
Models in which one regression model has a subset of the independent variables of another regression model.
Normal Q-Q plot
A visual used to compare the distribution of the residuals from a regression to a theoretical normal distribution.
Omitted variable bias
Bias resulting from the omission of an important independent variable from a regression model.
Outlier
An observation that has an extreme value of the dependent variable and is potentially influential.
Overfitting
Situation in which the model has too many independent variables relative to the number of observations in the sample, such that the coefficients on the independent variables represent noise rather than relationships with the dependent variable.
Partial regression coefficient
Coefficient that describes the effect of a one-unit change in the independent variable on the dependent variable, holding all other independent variables constant. Also known as partial slope coefficient.
Positive serial correlation
A situation in which residuals are positively related to other residuals.
Qualitative dependent variable
A dependent variable that is discrete (binary). Also known as a categorical dependent variable.
Restricted model
A regression model with a subset of the complete set of independent variables.
Robust standard errors
A method for computing standard errors that corrects for conditional heteroskedasticity. Also known as heteroskedasticity-consistent standard errors or White-corrected standard errors.
Scatterplot matrix
A visualization technique that shows the scatterplots between different sets of variables, often with the histogram for each variable on the diagonal. Also referred to as a pairs plot.
Schwarz’s Bayesian information criterion (BIC or SBC)
A statistic used to compare sets of independent variables for explaining a dependent variable. It is preferred for finding the model with the best goodness of fit.
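BIC's regression form differs from AIC only in the penalty: ln(n) per parameter instead of 2, so it penalizes extra variables more heavily once n exceeds about 7 (a sketch; names illustrative):

```python
import math

def bic(sse, n, k):
    # BIC = n * ln(SSE / n) + ln(n) * (k + 1); lower is better
    return n * math.log(sse / n) + math.log(n) * (k + 1)

print(round(bic(1200, 50, 3), 2))  # 174.55
```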
Serial correlation
A condition found most often in time series in which residuals are correlated across observations. Also known as autocorrelation.
Serial-correlation consistent standard errors
A method for computing standard errors that corrects for serial correlation (and heteroskedasticity). Also known as serial correlation and heteroskedasticity adjusted standard errors, Newey–West standard errors, and robust standard errors.
Slope dummy
An indicator variable that allows a single regression model to estimate two lines of best fit, each with differing slopes, depending on whether the dummy takes a value of 1 or 0.
Studentized residual
A t-distributed statistic that is used to detect outliers.
Unconditional heteroskedasticity
When heteroskedasticity of the error variance is not correlated with the regression’s independent variables.
Unrestricted model
A regression model with the complete set of independent variables.
Variance inflation factor (VIF)
A statistic that quantifies the degree of multicollinearity in a model.
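In sketch form: the VIF for variable j is 1 / (1 − R2j), where R2j comes from regressing Xj on the other independent variables. Common rules of thumb flag values above 5 (concern) or 10 (serious multicollinearity):

```python
def vif(r2_j):
    # r2_j: R-squared from regressing independent variable j on the others
    return 1 / (1 - r2_j)

print(vif(0.0))            # 1.0 (no multicollinearity)
print(round(vif(0.9), 4))  # 10.0 (serious multicollinearity)
```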
Autocorrelations
The correlations of a time series with its own past values.
Autoregressive model (AR)
A time series regressed on its own past values in which the independent variable is a lagged value of the dependent variable.
Chain rule of forecasting
A forecasting process in which the next period’s value as predicted by the forecasting equation is substituted into the right-hand side of the equation to give a predicted value two periods ahead.
Cointegrated
Describes two time series that have a long-term financial or economic relationship such that they do not diverge from each other without bound in the long run.
Covariance stationary
Describes a time series when its expected value and variance are constant and finite in all periods and when its covariance with itself for a fixed number of periods in the past or future is constant and finite in all periods.
Error autocorrelations
The autocorrelations of the error term.
First-differencing
A transformation that subtracts the value of the time series in period t – 1 from its value in period t.
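A one-line sketch (illustrative name); note that the differenced series is one observation shorter than the original:

```python
def first_difference(series):
    # y_t minus y_(t-1), for t = 1 .. T-1
    return [series[t] - series[t - 1] for t in range(1, len(series))]

print(first_difference([100, 103, 101, 106]))  # [3, -2, 5]
```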
Heteroskedasticity
The property of having a nonconstant variance; refers to an error term with the property that its variance differs across observations.
Homoskedasticity
The property of having a constant variance; refers to an error term whose variance is constant across observations.
In-sample forecast errors
The residuals from a fitted time-series model within the sample period used to fit the model.
kth-order autocorrelation
The correlation between observations in a time series separated by k periods.
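A sketch of the sample version (illustrative name): the covariance of the series with its own k-period lag, scaled by the variance of the series:

```python
def autocorrelation(series, k):
    n = len(series)
    mean = sum(series) / n
    num = sum((series[t] - mean) * (series[t - k] - mean) for t in range(k, n))
    den = sum((x - mean) ** 2 for x in series)
    return num / den

# A steadily trending series is positively autocorrelated at lag 1:
print(autocorrelation([1.0, 2.0, 3.0, 4.0], 1))  # 0.25
```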
Linear trend
A trend in which the dependent variable changes at a constant rate with time.
Log-linear model
With reference to time-series models, a model in which the growth rate of the time series as a function of time is constant.
Mean reversion
The tendency of a time series to fall when its level is above its mean and rise when its level is below its mean; a mean-reverting time series tends to return to its long-term mean.
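For an AR(1) model x_t = b0 + b1·x_(t−1) with |b1| < 1, the mean-reverting level is b0 / (1 − b1); a sketch with illustrative names and example values:

```python
def long_run_mean(b0, b1):
    # solve x = b0 + b1 * x  =>  x = b0 / (1 - b1), valid when |b1| < 1
    return b0 / (1 - b1)

level = long_run_mean(2.0, 0.6)   # 5.0
# A value above the level is forecast to fall back toward it:
forecast = 2.0 + 0.6 * 8.0        # 6.8, down from 8.0 toward 5.0
print(level, forecast)
```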
n-Period moving average
The average of the current and immediately prior n – 1 values of a time series.
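A compact sketch (illustrative name): each output averages the current value and the prior n − 1 values, so the first n − 1 periods have no entry:

```python
def moving_average(series, n):
    return [sum(series[t - n + 1 : t + 1]) / n
            for t in range(n - 1, len(series))]

print(moving_average([1, 2, 3, 4, 5], 3))  # [2.0, 3.0, 4.0]
```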
Out-of-sample forecast errors
The differences between actual and predicted values of time series outside the sample period used to fit the model.
Random walk
A time series in which the value of the series in one period is the value of the series in the previous period plus an unpredictable random error.
Regime
With reference to a time series, the underlying model generating the time series.
Residual autocorrelations
The sample autocorrelations of the residuals.
Root mean squared error (RMSE)
The square root of the average squared forecast error; used to compare the out-of-sample forecasting performance of forecasting models.
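As a sketch (illustrative name), with `actual` and `predicted` standing in for out-of-sample values and model forecasts:

```python
import math

def rmse(actual, predicted):
    # square root of the mean of squared forecast errors
    sq_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(sq_errors) / len(sq_errors))

print(round(rmse([3.0, 5.0, 7.0], [2.0, 5.0, 9.0]), 4))  # 1.291
```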
Seasonality
A characteristic of a time series in which the data experience regular and predictable periodic changes; for example, fan sales are highest during the summer months.
Time series
A set of observations on a variable’s outcomes in different time periods.
Trend
A long-term pattern of movement in a particular direction.
Unit root
A property of a time series whose autoregressive lag coefficient equals 1; a time series with a unit root is not covariance stationary.
Activation function
A functional part of a neural network’s node that transforms the total net input received into the final output of the node. The activation function operates like a light dimmer switch that decreases or increases the strength of the input.
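Two common activation functions as a sketch: the sigmoid squashes net input into (0, 1), matching the dimmer analogy, while ReLU passes positive input through unchanged:

```python
import math

def sigmoid(x):
    # S-shaped curve: large negative input -> near 0, large positive -> near 1
    return 1 / (1 + math.exp(-x))

def relu(x):
    # rectified linear unit: max(0, x)
    return max(0.0, x)

print(sigmoid(0.0))           # 0.5
print(relu(-3.0), relu(2.0))  # 0.0 2.0
```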
Agglomerative clustering
A bottom-up hierarchical clustering method that begins with each observation being treated as its own cluster. The algorithm finds the two closest clusters, based on some measure of distance (similarity), and combines them into one new larger cluster. This process is repeated iteratively until all observations are clumped into a single large cluster.
Backward propagation
The process of adjusting weights in a neural network, to reduce total error of the network, by moving backward through the network’s layers.
Base error
Model error due to randomness in the data.
Bias error
Describes the degree to which a model fits the training data. Algorithms with erroneous assumptions produce high bias error with poor approximation, causing underfitting and high in-sample error.
Bootstrap aggregating (or bagging)
A technique whereby the original training dataset is used to generate n new training datasets or bags of data. Each new bag of data is generated by random sampling with replacement from the initial training set.
Centroid
The center of a cluster formed using the k-means clustering algorithm.
Classification and regression tree
A supervised machine learning technique that can be applied to predict either a categorical target variable, producing a classification tree, or a continuous target variable, producing a regression tree. CART is commonly applied to binary classification or regression.