CFA Level 2 Flashcards - 1
Adjusted R2
Goodness-of-fit measure that adjusts the coefficient of determination, R2, for the number of independent variables in the model.
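The standard adjustment can be sketched as a one-line formula. The function name and sample numbers below are illustrative, not from the curriculum:

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 for a regression with n observations and k independent variables."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Adding variables always raises R^2, but the adjustment penalizes extra regressors:
print(round(adjusted_r2(0.60, 30, 3), 4))  # 0.5538
```

Unlike R², adjusted R² can fall when a newly added variable explains too little variation to justify the lost degree of freedom.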
Akaike’s information criterion (AIC)
A statistic used to compare sets of independent variables for explaining a dependent variable. It is preferred for finding the model that is best suited for prediction.
Analysis of variance (ANOVA)
The analysis that breaks the total variability of a dataset (such as observations on the dependent variable in a regression) into components representing different sources of variation.
Breusch–Godfrey (BG) test
A test used to detect autocorrelated residuals up to a predesignated order of the lagged residuals.
Breusch–Pagan (BP) test
A test for the presence of heteroskedasticity in a regression.
Coefficient of determination
The percentage of the variation of the dependent variable that is explained by the independent variables. Also referred to as the R-squared or R2.
Conditional heteroskedasticity
A condition in which the variance of the residuals of a regression is correlated with the values of the independent variables.
Cook’s distance
A metric for identifying influential data points. Also known as Cook’s D (Di).
Dummy variable
An independent variable that takes on a value of either 1 or 0, depending on a specified condition. Also known as an indicator variable.
Durbin–Watson (DW) test
A test for the presence of first-order serial correlation.
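The DW statistic itself is simple to compute from the residuals; values near 2 suggest no first-order serial correlation, below 2 positive, above 2 negative. A minimal sketch (function name and residuals are hypothetical):

```python
def durbin_watson(residuals):
    """DW statistic: sum of squared changes in residuals over sum of squared residuals."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den

# Alternating residuals are a textbook case of negative serial correlation:
print(durbin_watson([1, -1, 1, -1]))  # 3.0
```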
First-order serial correlation
The correlation of residuals with residuals adjacent in time.
General linear F-test
A test statistic used to assess the goodness of fit of an entire regression model; it jointly tests all independent variables in the model.
Heteroskedastic
When the variance of the residuals differs across observations in a regression.
High-leverage point
An observation of an independent variable that has an extreme value and is potentially influential.
Influence plot
A visual that shows, for all observations, studentized residuals on the y-axis, leverage on the x-axis, and Cook’s D as circles whose size is proportional to the degree of influence of the given observation.
Influential observation
An observation in a statistical analysis whose inclusion may significantly alter regression results.
Interaction term
A term that combines two or more variables and represents their joint influence on the dependent variable.
Intercept dummy
An indicator variable that allows a single regression model to estimate two lines of best fit, each with differing intercepts, depending on whether the dummy takes a value of 1 or 0.
Joint test of hypotheses
The test of hypotheses that specify values for two or more independent variables in the hypotheses.
Leverage
A measure for identifying a potentially influential high-leverage point.
Likelihood ratio (LR) test
A method to assess the fit of logistic regression models, based on the log-likelihood metric that describes the model's fit to the data.
Log odds
The natural log of the odds of an event or characteristic happening. Also known as the logit function.
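The logit and its inverse (the logistic function) can be sketched in a few lines; the names and the 0.75 probability below are illustrative:

```python
import math

def log_odds(p):
    """Logit: natural log of the odds p/(1-p)."""
    return math.log(p / (1 - p))

def logistic(x):
    """Inverse transformation: maps log odds back to a probability in (0, 1)."""
    return 1 / (1 + math.exp(-x))

print(round(log_odds(0.75), 4))            # odds = 3, ln(3) = 1.0986
print(round(logistic(log_odds(0.75)), 2))  # round trip back to 0.75
```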
Logistic regression (logit)
A regression in which the dependent variable uses a logistic transformation of the event probability.
Logistic transformation
The log of the probability of an occurrence of an event or characteristic divided by the probability of the event or characteristic not occurring.
Maximum likelihood estimation
A method that estimates values for the intercept and slope coefficients in a logistic regression that make the data in the regression sample most likely.
Model specification
The set of independent variables included in a model and the model’s functional form.
Multicollinearity
When two or more independent variables are highly correlated with one another or are approximately linearly related.
Multiple linear regression
Modeling and estimation method that uses two or more independent variables to describe the variation of the dependent variable. Also referred to as multiple regression.
Negative serial correlation
A situation in which residuals are negatively related to other residuals.
Nested models
Models in which one regression model has a subset of the independent variables of another regression model.
Normal Q-Q plot
A visual used to compare the distribution of the residuals from a regression to a theoretical normal distribution.
Omitted variable bias
Bias resulting from the omission of an important independent variable from a regression model.
Outlier
An observation that has an extreme value of the dependent variable and is potentially influential.
Overfitting
Situation in which the model has too many independent variables relative to the number of observations in the sample, such that the coefficients on the independent variables represent noise rather than relationships with the dependent variable.
Partial regression coefficient
Coefficient that describes the effect of a one-unit change in the independent variable on the dependent variable, holding all other independent variables constant. Also known as partial slope coefficient.
Positive serial correlation
A situation in which residuals are positively related to other residuals.
Qualitative dependent variable
A dependent variable that is discrete (binary). Also known as a categorical dependent variable.
Restricted model
A regression model with a subset of the complete set of independent variables.
Robust standard errors
Method for adjusting the standard errors of regression coefficients for conditional heteroskedasticity. Also known as heteroskedasticity-consistent standard errors or White-corrected standard errors.
Scatterplot matrix
A visualization technique that shows the scatterplots between different sets of variables, often with the histogram for each variable on the diagonal. Also referred to as a pairs plot.
Schwarz’s Bayesian information criterion (BIC or SBC)
A statistic used to compare sets of independent variables for explaining a dependent variable. It is preferred for finding the model with the best goodness of fit.
Serial correlation
A condition found most often in time series in which residuals are correlated across observations. Also known as autocorrelation.
Serial-correlation consistent standard errors
Method for adjusting the standard errors of a regression for serial correlation. Also known as serial correlation and heteroskedasticity adjusted standard errors, Newey–West standard errors, and robust standard errors.
Slope dummy
An indicator variable that allows a single regression model to estimate two lines of best fit, each with differing slopes, depending on whether the dummy takes a value of 1 or 0.
Studentized residual
A t-distributed statistic that is used to detect outliers.
Unconditional heteroskedasticity
When heteroskedasticity of the error variance is not correlated with the regression’s independent variables.
Unrestricted model
A regression model with the complete set of independent variables.
Variance inflation factor (VIF)
A statistic that quantifies the degree of multicollinearity in a model.
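Given R²_j from regressing independent variable X_j on the remaining regressors, the VIF is 1/(1 − R²_j). A sketch with an illustrative function name:

```python
def vif(r2_j):
    """Variance inflation factor from the R^2 of regressing X_j on the other regressors."""
    return 1 / (1 - r2_j)

# An R^2 of 0.90 among the regressors gives a VIF of 10, a common warning threshold:
print(round(vif(0.90), 1))  # 10.0
```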
Autocorrelations
The correlations of a time series with its own past values.
Autoregressive model (AR)
A time series regressed on its own past values in which the independent variable is a lagged value of the dependent variable.
Chain rule of forecasting
A forecasting process in which the next period’s value as predicted by the forecasting equation is substituted into the right-hand side of the equation to give a predicted value two periods ahead.
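For an AR(1) model x_t = b0 + b1·x_(t−1), the chain rule feeds each forecast back into the equation to reach further horizons. A minimal sketch with hypothetical coefficients:

```python
def chain_forecast(b0, b1, last_value, periods):
    """AR(1) chain rule: each forecast is substituted back in to forecast the next period."""
    forecasts = []
    x = last_value
    for _ in range(periods):
        x = b0 + b1 * x
        forecasts.append(x)
    return forecasts

# x_t = 1.0 + 0.5 x_(t-1), last observed value 4.0, two periods ahead:
print(chain_forecast(1.0, 0.5, 4.0, 2))  # [3.0, 2.5]
```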
Cointegrated
Describes two time series that have a long-term financial or economic relationship such that they do not diverge from each other without bound in the long run.
Covariance stationary
Describes a time series when its expected value and variance are constant and finite in all periods and when its covariance with itself for a fixed number of periods in the past or future is constant and finite in all periods.
Error autocorrelations
The autocorrelations of the error term.
First-differencing
A transformation that subtracts the value of the time series in period t – 1 from its value in period t.
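The transformation can be sketched directly; first-differencing is the standard way to convert a random walk into a covariance-stationary series. Names and data are illustrative:

```python
def first_difference(series):
    """x_t minus x_(t-1) for each period t."""
    return [x - prev for prev, x in zip(series, series[1:])]

print(first_difference([100, 103, 101, 106]))  # [3, -2, 5]
```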
Heteroskedasticity
The property of having a nonconstant variance; refers to an error term with the property that its variance differs across observations.
Homoskedasticity
The property of having a constant variance; refers to an error term whose variance is constant across observations.
In-sample forecast errors
The residuals from a fitted time-series model within the sample period used to fit the model.
kth-order autocorrelation
The correlation between observations in a time series separated by k periods.
Linear trend
A trend in which the dependent variable changes at a constant rate with time.
Log-linear model
With reference to time-series models, a model in which the growth rate of the time series as a function of time is constant.
Mean reversion
The tendency of a time series to fall when its level is above its mean and rise when its level is below its mean; a mean-reverting time series tends to return to its long-term mean.
n-Period moving average
The average of the current and immediately prior n – 1 values of a time series.
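A rolling window over the series makes the definition concrete; the function name and data are illustrative:

```python
def moving_average(series, n):
    """Average of the current and immediately prior n-1 values, for each full window."""
    return [sum(series[i - n + 1 : i + 1]) / n for i in range(n - 1, len(series))]

print(moving_average([2, 4, 6, 8], 2))  # [3.0, 5.0, 7.0]
```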
Out-of-sample forecast errors
The differences between actual and predicted values of time series outside the sample period used to fit the model.
Random walk
A time series in which the value of the series in one period is the value of the series in the previous period plus an unpredictable random error.
Regime
With reference to a time series, the underlying model generating the time series.
Residual autocorrelations
The sample autocorrelations of the residuals.
Root mean squared error (RMSE)
The square root of the average squared forecast error; used to compare the out-of-sample forecasting performance of forecasting models.
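RMSE can be sketched directly from the definition; actual and predicted values below are hypothetical:

```python
import math

def rmse(actual, predicted):
    """Square root of the average squared forecast error."""
    errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(errors) / len(errors))

print(round(rmse([3, -1, 2], [2, 1, 2]), 4))  # 1.291
```

The model with the smaller out-of-sample RMSE is judged the better forecaster.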
Seasonality
A characteristic of a time series in which the data experience regular and predictable periodic changes; for example, fan sales are highest during the summer months.
Time series
A set of observations on a variable’s outcomes in different time periods.
Trend
A long-term pattern of movement in a particular direction.
Unit root
A time series that is not covariance stationary is said to have a unit root.
Activation function
A functional part of a neural network’s node that transforms the total net input received into the final output of the node. The activation function operates like a light dimmer switch that decreases or increases the strength of the input.
Agglomerative clustering
A bottom-up hierarchical clustering method that begins with each observation being treated as its own cluster. The algorithm finds the two closest clusters, based on some measure of distance (similarity), and combines them into one new larger cluster. This process is repeated iteratively until all observations are clumped into a single large cluster.
Backward propagation
The process of adjusting weights in a neural network, to reduce total error of the network, by moving backward through the network’s layers.
Base error
Model error due to randomness in the data.
Bias error
Describes the degree to which a model fits the training data. Algorithms with erroneous assumptions produce high bias error with poor approximation, causing underfitting and high in-sample error.
Bootstrap aggregating (or bagging)
A technique whereby the original training dataset is used to generate n new training datasets or bags of data. Each new bag of data is generated by random sampling with replacement from the initial training set.
Centroid
The center of a cluster formed using the k-means clustering algorithm.
Classification and regression tree
A supervised machine learning technique that can be applied to predict either a categorical target variable, producing a classification tree, or a continuous target variable, producing a regression tree. CART is commonly applied to binary classification or regression.
Cluster
A subset of observations from a dataset such that all the observations within the same cluster are deemed “similar.”
Clustering
The sorting of observations into groups (clusters) such that observations in the same cluster are more similar to each other than they are to observations in other clusters.
Complexity
A term referring to the number of features, parameters, or branches in a model and to whether the model is linear or non-linear (non-linear is more complex).
Composite variable
A variable that combines two or more variables that are statistically strongly related to each other.
Cross-validation
A technique for estimating out-of-sample error directly by determining the error in validation samples.
Deep learning
Machine learning using neural networks with many hidden layers.
Deep neural networks
Neural networks with many hidden layers—at least 2 but potentially more than 20—that have proven successful across a wide range of artificial intelligence applications.
Dendrogram
A type of tree diagram used for visualizing a hierarchical cluster analysis; it highlights the hierarchical relationships among the clusters.
Dimension reduction
A set of techniques for reducing the number of features in a dataset while retaining variation across observations to preserve the information contained in that variation.
Divisive clustering
A top-down hierarchical clustering method that starts with all observations belonging to a single large cluster. The observations are then divided into two clusters based on some measure of distance (similarity). The algorithm then progressively partitions the intermediate clusters into smaller ones until each cluster contains only one observation.
Eigenvalue
A measure that gives the proportion of total variance in the initial dataset that is explained by each eigenvector.
Eigenvector
A vector that defines new mutually uncorrelated composite variables that are linear combinations of the original features.
Ensemble learning
A technique of combining the predictions from a collection of models to achieve a more accurate prediction.
Ensemble method
The method of combining multiple learning algorithms, as in ensemble learning.
Features
The independent variables (X’s) in a labeled dataset.
Fitting curve
A curve that shows in-sample and out-of-sample error rates (E_in and E_out) on the y-axis plotted against model complexity on the x-axis.
Forward propagation
The process of passing input values forward through a neural network's layers, from the input layer through any hidden layers to the output layer, to compute the network's output.
Generalize
When a model retains its explanatory power when predicting out-of-sample (i.e., using new data).
Hierarchical clustering
An iterative unsupervised learning procedure used for building a hierarchy of clusters.
Holdout samples
Data samples that are not used to train a model.
Hyperparameter
A parameter whose value must be set by the researcher before learning begins.
Information gain
A metric which quantifies the amount of information that the feature holds about the response. Information gain can be regarded as a form of non-linear correlation between Y and X.
K-fold cross-validation
A technique in which data (excluding test sample and fresh data) are shuffled randomly and then are divided into k equal sub-samples, with k – 1 samples used as training samples and one sample, the kth, used as a validation sample.
K-means
A clustering algorithm that repeatedly partitions observations into a fixed number, k, of non-overlapping clusters.
K-nearest neighbor
A supervised learning technique that classifies a new observation by finding similarities (“nearness”) between this new observation and the existing data.
LASSO
Least absolute shrinkage and selection operator is a type of penalized regression which involves minimizing the sum of the absolute values of the regression coefficients. LASSO can also be used for regularization in neural networks.
Labeled dataset
A dataset that contains matched sets of observed inputs or features (X’s) and the associated output or target (Y).
Learning curve
A curve that plots the accuracy rate (= 1 – error rate) in the validation or test samples (i.e., out-of-sample) against the amount of data in the training sample, which is thus useful for describing under- and overfitting as a function of bias and variance errors.
Learning rate
A parameter that affects the magnitude of adjustments in the weights in a neural network.
Linear classifier
A binary classifier that makes its classification decision based on a linear combination of the features of each data point.
Majority-vote classifier
A classifier that assigns to a new data point the predicted label with the most votes (i.e., occurrences).
Neural networks
Computer programs based on how our own brains learn and process information.
Penalized regression
A regression that includes a constraint such that the regression coefficients are chosen to minimize the sum of squared residuals plus a penalty term that increases in size with the number of included features.
Principal components analysis (PCA)
An unsupervised ML technique used to transform highly correlated features of data into a few main, uncorrelated composite variables.
Projection error
The vertical (perpendicular) distance between a data point and a given principal component.
Pruning
A regularization technique used in CART to reduce the size of the classification or regression tree—by pruning, or removing, sections of the tree that provide little classifying power.
Random forest classifier
A collection of a large number of decision trees trained via a bagging method.
Regularization
A term that describes methods for reducing statistical variability in high-dimensional data estimation problems.
Reinforcement learning
Machine learning in which a computer learns from interacting with itself or data generated by the same algorithm.
Scree plots
A plot that shows the proportion of total variance in the data explained by each principal component.
Soft margin classification
An adaptation in the support vector machine algorithm which adds a penalty to the objective function for observations in the training set that are misclassified.
Summation operator
A functional part of a neural network’s node that multiplies each input value received by a weight and sums the weighted values to form the total net input, which is then passed to the activation function.
Supervised learning
A machine learning approach that makes use of labeled training data.
Support vector machine
A linear classifier that determines the hyperplane that optimally separates the observations into two sets of data points.
Target
In machine learning, the dependent variable (Y) in a labeled dataset; the company in a merger or acquisition that is being acquired.
Test sample
A data sample that is used to test a model’s ability to predict well on new data.
Training sample
A data sample that is used to train a model.
Unsupervised learning
A machine learning approach that does not make use of labeled training data.
Validation sample
A data sample that is used to validate and tune a model.
Variance error
Describes how much a model’s results change in response to new data from validation and test samples. Unstable models pick up noise and produce high variance error, causing overfitting and high out-of-sample error.
Accuracy
The percentage of correctly predicted classes out of total predictions. It is an overall performance metric in classification problems.
Application programming interface (API)
A set of well-defined methods of communication between various software components and typically used for accessing external data.
Bag-of-words (BOW)
A collection of a distinct set of tokens from all the texts in a sample dataset. BOW does not capture the position or sequence of words present in the text.
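A BOW reduces texts to token counts, as this minimal sketch shows (tokenizing by lowercasing and splitting on whitespace is a simplifying assumption):

```python
from collections import Counter

def bag_of_words(texts):
    """Counts of each distinct token across all texts; word order is discarded."""
    tokens = [word for text in texts for word in text.lower().split()]
    return Counter(tokens)

bow = bag_of_words(["rates rose", "rates fell sharply"])
print(bow["rates"])  # 2 -- position and sequence are not captured
```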
Ceiling analysis
A systematic process of evaluating different components in the pipeline of model building. It helps to understand what part of the pipeline can potentially improve in performance by further tuning.
Collection frequency (CF)
The number of times a given word appears in the whole corpus (i.e., collection of sentences) divided by the total number of words in the corpus.
Confusion matrix
A grid used for error analysis in classification problems, it presents values for four evaluation metrics including true positive (TP), false positive (FP), true negative (TN), and false negative (FN).
Corpus
A collection of text data in any form, including list, matrix, or data table forms.
Data preparation (cleansing)
The process of examining, identifying, and mitigating (i.e., cleansing) errors in raw data.
Data wrangling (preprocessing)
This task performs transformations and critical processing steps on cleansed data to make the data ready for ML model training (i.e., preprocessing), and includes dealing with outliers, extracting useful variables from existing data points, and scaling the data.
Document frequency (DF)
The number of documents (texts) that contain a particular token divided by the total number of documents. It is the simplest feature selection method and often performs well when many thousands of tokens are present.
Document term matrix (DTM)
A matrix where each row belongs to a document (or text file), and each column represents a token (or term). The number of rows is equal to the number of documents (or text files) in a sample text dataset. The number of columns is equal to the number of tokens from the BOW built using all the documents in the sample dataset. The cells typically contain the counts of the number of times a token is present in each document.
Exploratory data analysis (EDA)
The preliminary step in data exploration, where graphs, charts, and other visualizations (heat maps and word clouds) as well as quantitative methods (descriptive statistics and central tendency measures) are used to observe and summarize data.
F1 score
The harmonic mean of precision and recall. F1 score is a more appropriate overall performance metric (than accuracy) when there is unequal class distribution in the dataset and it is necessary to measure the equilibrium of precision and recall.
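The harmonic mean penalizes imbalance between the two metrics, as a quick sketch shows (the precision and recall values are hypothetical):

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# High precision cannot compensate for low recall:
print(round(f1_score(0.8, 0.5), 4))  # 0.6154, below the arithmetic mean of 0.65
```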
Feature engineering
A process of creating new features by changing or transforming existing features.
Feature selection
A process whereby only pertinent features from the dataset are selected for model training. Selecting fewer features decreases model complexity and training time.
Frequency analysis
The process of quantifying how important tokens are in a sentence and in the corpus as a whole. It helps in filtering unnecessary tokens (or features).
Grid search
A method of systematically training a model by using various combinations of hyperparameter values, cross validating each model, and determining which combination of hyperparameter values ensures the best model performance.
Ground truth
The known outcome (i.e., target variable) of each observation in a labelled dataset.
Metadata
Data that describes and gives information about other data.
Mutual information
Measures how much information is contributed by a token to a class of texts. MI will be 0 if the token’s distribution in all text classes is the same. MI approaches 1 as the token in any one class tends to occur more often in only that particular class of text.
N-grams
A representation of word sequences. The length of a sequence varies from 1 to n. A one-word sequence is a unigram, a two-word sequence is a bigram, a three-word sequence is a trigram, and so on.
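Generating n-grams from a text is a sliding window over its words; the function name and sentence are illustrative:

```python
def ngrams(text, n):
    """All n-length word sequences in the text."""
    words = text.split()
    return [" ".join(words[i : i + n]) for i in range(len(words) - n + 1)]

print(ngrams("stocks rose sharply today", 2))
# ['stocks rose', 'rose sharply', 'sharply today']
```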
Named entity recognition
An algorithm that analyzes individual tokens and their surrounding semantics while referring to its dictionary to tag an object class to the token.
One hot encoding
The process by which categorical variables are converted into binary form (0 or 1) for machine reading. It is one of the most common methods for handling categorical features in text data.
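Each category becomes its own 0/1 column; a minimal sketch with hypothetical category labels:

```python
def one_hot(value, categories):
    """Binary vector with a 1 in the position of the matching category."""
    return [1 if c == value else 0 for c in categories]

print(one_hot("bond", ["stock", "bond", "cash"]))  # [0, 1, 0]
```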
Parts of speech
An algorithm that uses language structure and dictionaries to tag every token in the text with a corresponding part of speech (i.e., noun, verb, adjective, proper noun, etc.).
Precision
In error analysis for classification problems, it is the ratio of correctly predicted positive classes to all predicted positive classes. Precision is useful in situations where the cost of false positives (FP), or Type I error, is high.
Readme files
Text files provided with raw data that contain information related to a data file. They are useful for understanding the data and how they can be interpreted correctly.
Recall
Also known as sensitivity, in error analysis for classification problems it is the ratio of correctly predicted positive classes to all actual positive classes. Recall is useful in situations where the cost of false negatives (FN), or Type II error, is high.
Regular expression (regex)
A sequence of characters that defines a search pattern. Regex is used to search for patterns of interest in a given text.
Scaling
The process of adjusting the range of a feature by shifting and changing the scale of the data. Two of the most common ways of scaling are normalization and standardization.
Sentence length
The number of characters, including spaces, in a sentence.
Term frequency (TF)
Ratio of the number of times a given token occurs in all the texts in the dataset to the total number of tokens in the dataset.
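The ratio can be computed directly from a token list; names and data are illustrative:

```python
def term_frequency(token, all_tokens):
    """Occurrences of the token divided by the total number of tokens in the dataset."""
    return all_tokens.count(token) / len(all_tokens)

tokens = ["rates", "rose", "rates", "fell"]
print(term_frequency("rates", tokens))  # 0.5
```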
Token
The equivalent of a word (or sometimes a character).
Tokenization
The process of splitting a given text into smaller units (tokens), such as words.
Trimming
Also called truncation, it is the process of removing extreme values and outliers from a dataset.
Web spidering (scraping or crawling) programs
Programs that extract raw content from a source, typically web pages.
Winsorization
The process of replacing extreme values and outliers in a dataset with the maximum (for large value outliers) and minimum (for small value outliers) values of data points that are not outliers.
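Unlike trimming, winsorization keeps every observation but clamps the extremes. A sketch where the bounds are chosen by hand for illustration (in practice they often come from percentiles of the non-outlier data):

```python
def winsorize(values, lower, upper):
    """Replace values below/above the chosen bounds with those bounds."""
    return [min(max(v, lower), upper) for v in values]

print(winsorize([-50, 2, 3, 4, 90], lower=0, upper=10))  # [0, 2, 3, 4, 10]
```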
Absolute version of PPP
An extension of the law of one price whereby the prices of goods and services will not differ internationally once exchange rates are considered.
Covered interest rate parity
The relationship among the spot exchange rate, the forward exchange rate, and the interest rates in two currencies that ensures that the return on a hedged (i.e., covered) foreign risk-free investment is the same as the return on a domestic risk-free investment. Also called interest rate parity.
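Over one period, the no-arbitrage forward rate implied by covered interest rate parity is F = S × (1 + i_domestic) / (1 + i_foreign), with the spot and forward quoted as domestic currency per unit of foreign currency. A sketch with hypothetical rates:

```python
def forward_rate(spot, i_domestic, i_foreign):
    """One-period forward rate (domestic per foreign) implied by covered interest parity."""
    return spot * (1 + i_domestic) / (1 + i_foreign)

# Spot 1.20 domestic/foreign, 5% domestic rate, 2% foreign rate:
print(round(forward_rate(1.20, 0.05, 0.02), 4))  # 1.2353
```

The higher-rate (domestic) currency trades at a forward discount, so the hedged foreign investment earns exactly the domestic risk-free rate.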
Ex ante version of PPP
The hypothesis that expected changes in the spot exchange rate are equal to expected differences in national inflation rates. An extension of relative purchasing power parity to expected future changes in the exchange rate.
FX carry trade
An investment strategy that involves taking long positions in high-yield currencies and short positions in low-yield currencies.
Forward rate parity
The proposition that the forward exchange rate is an unbiased predictor of the future spot exchange rate.
International Fisher effect
The proposition that nominal interest rate differentials across currencies are determined by expected inflation differentials.
Portfolio balance approach
A theory of exchange rate determination that emphasizes the portfolio investment decisions of global investors and the requirement that global investors willingly hold all outstanding securities denominated in each currency at prevailing prices and exchange rates.
Real interest rate parity
The proposition that real interest rates will converge to the same level across different markets.
Relative version of PPP
The hypothesis that changes in (nominal) exchange rates over time are equal to national inflation rate differentials.
Triangular arbitrage
An arbitrage transaction involving three currencies that attempts to exploit inconsistencies among pairwise exchange rates.
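Absent arbitrage, the cross rate implied by two quotes must match the third quoted rate (ignoring bid–ask spreads). A sketch with hypothetical quotes:

```python
def implied_cross_rate(a_per_b, b_per_c):
    """A/C rate implied by the A/B and B/C quotes; a materially different quoted
    A/C rate signals a triangular arbitrage opportunity (before transaction costs)."""
    return a_per_b * b_per_c

# USD/EUR = 1.10 and EUR/GBP = 1.20 imply USD/GBP:
print(round(implied_cross_rate(1.10, 1.20), 2))  # 1.32
```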
Uncovered interest rate parity
The proposition that the expected return on an uncovered (i.e., unhedged) foreign currency (risk-free) investment should equal the return on a comparable domestic currency investment.
Absolute convergence
The idea that developing countries, regardless of their particular characteristics, will eventually catch up with the developed countries and match them in per capita output.
Capital deepening
An increase in the capital-to-labor ratio.
Club convergence
The idea that only rich and middle-income countries sharing a set of favorable attributes (i.e., are members of the “club”) will converge to the income level of the richest countries.
Cobb–Douglas production function
A function of the form Y = K^α L^(1−α) relating output (Y) to labor (L) and capital (K) inputs.
Conditional convergence
The idea that convergence of per capita income is conditional on the countries having the same savings rate, population growth rate, and production function.
Constant returns to scale
The condition that if all inputs into the production process are increased by a given percentage, then output rises by that same percentage.
Diminishing marginal productivity
When each additional unit of an input, keeping the other inputs unchanged, increases output by a smaller increment.
Dutch disease
A situation in which currency appreciation driven by strong export demand for resources makes other segments of the economy (particularly manufacturing) globally uncompetitive.
Growth accounting equation
The production function written in the form of growth rates. For the basic Cobb–Douglas production function, it states that the growth rate of output equals the rate of technological change plus α multiplied by the growth rate of capital plus (1 – α) multiplied by the growth rate of labor.
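In growth-rate form, g_Y = g_T + α·g_K + (1 − α)·g_L. A sketch with hypothetical inputs:

```python
def output_growth(tech_growth, capital_growth, labor_growth, alpha):
    """Growth accounting for Cobb-Douglas: g_Y = g_T + alpha*g_K + (1-alpha)*g_L."""
    return tech_growth + alpha * capital_growth + (1 - alpha) * labor_growth

# 1% technology growth, 3% capital growth, 1% labor growth, capital share 0.3:
print(round(output_growth(0.01, 0.03, 0.01, 0.3), 4))  # 0.026, i.e., 2.6%
```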
Labor force
Everyone of working age (ages 16 to 64) who either is employed or is available for work but not working.
Labor force participation rate
The percentage of the working age population that is in the labor force.
Labor productivity
The quantity of goods and services (real GDP) that a worker can produce in one hour of work.
Labor productivity growth accounting equation
States that potential GDP growth equals the growth rate of the labor input plus the growth rate of labor productivity.
Network externalities
The impact that users of a good, a service, or a technology have on other users of that product; it can be positive (e.g., a critical mass of users makes a product more useful) or negative (e.g., congestion makes the product less useful).
Non-convergence trap
A situation in which a country remains relatively poor, or even falls further behind, because it fails to implement necessary institutional reforms and/or adopt leading technologies.
Non-renewable resources
Finite resources that are depleted once they are consumed; oil and coal are examples.
Potential GDP
The maximum amount of output an economy can sustainably produce without inducing an increase in the inflation rate. The output level that corresponds to full employment with consistent wage and price expectations.
Purchasing power parity (PPP)
The idea that exchange rates move to equalize the purchasing power of different currencies.
Renewable resources
Resources that can be replenished, such as a forest.
Rental price of capital
The cost per unit of time to rent a unit of capital.
Steady-state rate of growth
The constant growth rate of output (or output per capita) that can or will be sustained indefinitely once it is reached. Key ratios, such as the capital–output ratio, are constant on the steady-state growth path.
Total factor productivity (TFP)
A multiplicative scale factor that reflects the general level of productivity or technology in the economy. Changes in total factor productivity generate proportional changes in output for any input combination.
Administrative regulations or administrative law
Rules issued by government agencies or other regulators.