CFA Level 2 Flashcards - 1

1
Q

Adjusted R2

A

Goodness-of-fit measure that adjusts the coefficient of determination, R2, for the number of independent variables in the model.
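The adjustment can be sketched in a few lines (function name and all numbers are illustrative, not from the curriculum):

```python
def adjusted_r2(r2, n, k):
    # Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1),
    # where n is the number of observations and k the number of
    # independent variables.
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# R^2 never falls when variables are added; adjusted R^2 can fall
# if the added variables contribute little.
print(round(adjusted_r2(0.60, 50, 3), 4))
print(round(adjusted_r2(0.61, 50, 8), 4))
```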

2
Q

Akaike’s information criterion (AIC)

A

A statistic used to compare sets of independent variables for explaining a dependent variable. It is preferred for finding the model that is best suited for prediction.

3
Q

Analysis of variance (ANOVA)

A

The analysis that breaks the total variability of a dataset (such as observations on the dependent variable in a regression) into components representing different sources of variation.

4
Q

Breusch–Godfrey (BG) test

A

A test used to detect autocorrelated residuals up to a predesignated order of the lagged residuals.

5
Q

Breusch–Pagan (BP) test

A

A test for the presence of heteroskedasticity in a regression.

6
Q

Coefficient of determination

A

The percentage of the variation of the dependent variable that is explained by the independent variables. Also referred to as the R-squared or R2.

7
Q

Conditional heteroskedasticity

A

A condition in which the variance of the residuals of a regression is correlated with the values of the independent variables.

8
Q

Cook’s distance

A

A metric for identifying influential data points. Also known as Cook’s D (Di).

9
Q

Dummy variable

A

An independent variable that takes on a value of either 1 or 0, depending on a specified condition. Also known as an indicator variable.

10
Q

Durbin–Watson (DW) test

A

A test for the presence of first-order serial correlation.

11
Q

First-order serial correlation

A

The correlation of residuals with residuals adjacent in time.

12
Q

General linear F-test

A

A test statistic used to assess the goodness of fit of an entire regression model; it tests all independent variables in the model jointly.

13
Q

Heteroskedastic

A

When the variance of the residuals differs across observations in a regression.

14
Q

High-leverage point

A

An observation of an independent variable that has an extreme value and is potentially influential.

15
Q

Influence plot

A

A visual that shows, for all observations, studentized residuals on the y-axis, leverage on the x-axis, and Cook’s D as circles whose size is proportional to the degree of influence of the given observation.

16
Q

Influential observation

A

An observation in a statistical analysis whose inclusion may significantly alter regression results.

17
Q

Interaction term

A

A term that combines two or more variables and represents their joint influence on the dependent variable.

18
Q

Intercept dummy

A

An indicator variable that allows a single regression model to estimate two lines of best fit, each with differing intercepts, depending on whether the dummy takes a value of 1 or 0.

19
Q

Joint test of hypotheses

A

A test of hypotheses that jointly specify values for the coefficients of two or more independent variables.

20
Q

Leverage

A

A measure for identifying a potentially influential high-leverage point.

21
Q

Likelihood ratio (LR) test

A

A method to assess the fit of logistic regression models, based on the log-likelihood metric that describes the model’s fit to the data.

22
Q

Log odds

A

The natural log of the odds of an event or characteristic happening. Also known as the logit function.
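A minimal sketch of the logit function (the probability values are hypothetical):

```python
import math

def log_odds(p):
    # Logit: natural log of the odds, p / (1 - p).
    return math.log(p / (1 - p))

# p = 0.75 gives odds of 3-to-1, so log odds = ln(3)
print(round(log_odds(0.75), 4))  # 1.0986
print(log_odds(0.5))             # 0.0 -- even odds
```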

23
Q

Logistic regression (logit)

A

A regression in which the dependent variable uses a logistic transformation of the event probability.

24
Q

Logistic transformation

A

The log of the probability of an occurrence of an event or characteristic divided by the probability of the event or characteristic not occurring.

25
Q

Maximum likelihood estimation

A

A method that estimates values for the intercept and slope coefficients in a logistic regression that make the data in the regression sample most likely.

26
Q

Model specification

A

The set of independent variables included in a model and the model’s functional form.

27
Q

Multicollinearity

A

When two or more independent variables are highly correlated with one another or are approximately linearly related.

28
Q

Multiple linear regression

A

Modeling and estimation method that uses two or more independent variables to describe the variation of the dependent variable. Also referred to as multiple regression.

29
Q

Negative serial correlation

A

A situation in which residuals are negatively related to other residuals.

30
Q

Nested models

A

Models in which one regression model has a subset of the independent variables of another regression model.

31
Q

Normal Q-Q plot

A

A visual used to compare the distribution of the residuals from a regression to a theoretical normal distribution.

32
Q

Omitted variable bias

A

Bias resulting from the omission of an important independent variable from a regression model.

33
Q

Outlier

A

An observation that has an extreme value of the dependent variable and is potentially influential.

34
Q

Overfitting

A

Situation in which the model has too many independent variables relative to the number of observations in the sample, such that the coefficients on the independent variables represent noise rather than relationships with the dependent variable.

35
Q

Partial regression coefficient

A

Coefficient that describes the effect of a one-unit change in the independent variable on the dependent variable, holding all other independent variables constant. Also known as partial slope coefficient.

36
Q

Positive serial correlation

A

A situation in which residuals are positively related to other residuals.

37
Q

Qualitative dependent variable

A

A dependent variable that is discrete (binary). Also known as a categorical dependent variable.

38
Q

Restricted model

A

A regression model with a subset of the complete set of independent variables.

39
Q

Robust standard errors

A

Method for correcting residuals for conditional heteroskedasticity. Also known as heteroskedasticity-consistent standard errors or White-corrected standard errors.

40
Q

Scatterplot matrix

A

A visualization technique that shows the scatterplots between different sets of variables, often with the histogram for each variable on the diagonal. Also referred to as a pairs plot.

41
Q

Schwarz’s Bayesian information criterion (BIC or SBC)

A

A statistic used to compare sets of independent variables for explaining a dependent variable. It is preferred for finding the model with the best goodness of fit.

42
Q

Serial correlation

A

A condition found most often in time series in which residuals are correlated across observations. Also known as autocorrelation.

43
Q

Serial-correlation consistent standard errors

A

Method for correcting serial correlation. Also known as serial correlation and heteroskedasticity adjusted standard errors, Newey–West standard errors, and robust standard errors.

44
Q

Slope dummy

A

An indicator variable that allows a single regression model to estimate two lines of best fit, each with differing slopes, depending on whether the dummy takes a value of 1 or 0.

45
Q

Studentized residual

A

A t-distributed statistic that is used to detect outliers.

46
Q

Unconditional heteroskedasticity

A

When heteroskedasticity of the error variance is not correlated with the regression’s independent variables.

47
Q

Unrestricted model

A

A regression model with the complete set of independent variables.

48
Q

Variance inflation factor (VIF)

A

A statistic that quantifies the degree of multicollinearity in a model.
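The VIF for variable j is computed from the R² of regressing that variable on the remaining independent variables; a minimal sketch with hypothetical R² values (function name is illustrative):

```python
def vif(r2_j):
    # VIF_j = 1 / (1 - R^2_j), where R^2_j comes from regressing
    # independent variable j on the other independent variables.
    return 1 / (1 - r2_j)

# A common rule of thumb flags VIF above 5 (i.e., R^2_j above 0.8)
print(round(vif(0.80), 2))
print(round(vif(0.90), 2))
```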

49
Q

Autocorrelations

A

The correlations of a time series with its own past values.

50
Q

Autoregressive model (AR)

A

A time series regressed on its own past values in which the independent variable is a lagged value of the dependent variable.

51
Q

Chain rule of forecasting

A

A forecasting process in which the next period’s value as predicted by the forecasting equation is substituted into the right-hand side of the equation to give a predicted value two periods ahead.

52
Q

Cointegrated

A

Describes two time series that have a long-term financial or economic relationship such that they do not diverge from each other without bound in the long run.

53
Q

Covariance stationary

A

Describes a time series when its expected value and variance are constant and finite in all periods and when its covariance with itself for a fixed number of periods in the past or future is constant and finite in all periods.

54
Q

Error autocorrelations

A

The autocorrelations of the error term.

55
Q

First-differencing

A

A transformation that subtracts the value of the time series in period t – 1 from its value in period t.
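A minimal sketch of the transformation (hypothetical price series):

```python
def first_difference(series):
    # x_t - x_(t-1); the transformed series is one observation shorter.
    return [curr - prev for prev, curr in zip(series, series[1:])]

prices = [100, 103, 101, 106]
print(first_difference(prices))  # [3, -2, 5]
```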

56
Q

Heteroskedasticity

A

The property of having a nonconstant variance; refers to an error term with the property that its variance differs across observations.

57
Q

Homoskedasticity

A

The property of having a constant variance; refers to an error term whose variance is constant across observations.

58
Q

In-sample forecast errors

A

The residuals from a fitted time-series model within the sample period used to fit the model.

59
Q

kth-order autocorrelation

A

The correlation between observations in a time series separated by k periods.
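A minimal sample autocorrelation sketch (the series and function name are illustrative):

```python
def autocorrelation(series, k):
    # Sample autocorrelation at lag k: covariance of x_t with x_(t-k),
    # scaled by the series variance (deviations from the full-sample mean).
    mean = sum(series) / len(series)
    dev = [x - mean for x in series]
    num = sum(dev[t] * dev[t - k] for t in range(k, len(series)))
    den = sum(d * d for d in dev)
    return num / den

# A steadily rising series is positively autocorrelated at lag 1
print(autocorrelation([1, 2, 3, 4, 5], 1))  # 0.4
```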

60
Q

Linear trend

A

A trend in which the dependent variable changes at a constant rate with time.

61
Q

Log-linear model

A

With reference to time-series models, a model in which the growth rate of the time series as a function of time is constant.

62
Q

Mean reversion

A

The tendency of a time series to fall when its level is above its mean and rise when its level is below its mean; a mean-reverting time series tends to return to its long-term mean.

63
Q

n-Period moving average

A

The average of the current and immediately prior n – 1 values of a time series.
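A minimal sketch (hypothetical values):

```python
def moving_average(series, n):
    # Average of the current value and the prior n - 1 values;
    # the first average is available at observation n.
    return [sum(series[i - n + 1:i + 1]) / n
            for i in range(n - 1, len(series))]

print(moving_average([2, 4, 6, 8, 10], 3))  # [4.0, 6.0, 8.0]
```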

64
Q

Out-of-sample forecast errors

A

The differences between actual and predicted values of time series outside the sample period used to fit the model.

65
Q

Random walk

A

A time series in which the value of the series in one period is the value of the series in the previous period plus an unpredictable random error.

66
Q

Regime

A

With reference to a time series, the underlying model generating the time series.

67
Q

Residual autocorrelations

A

The sample autocorrelations of the residuals.

68
Q

Root mean squared error (RMSE)

A

The square root of the average squared forecast error; used to compare the out-of-sample forecasting performance of forecasting models.
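A minimal sketch (hypothetical actual and forecast values):

```python
import math

def rmse(actual, predicted):
    # Square root of the mean squared forecast error; lower out-of-sample
    # RMSE indicates better forecasting performance.
    errors = [a - p for a, p in zip(actual, predicted)]
    return math.sqrt(sum(e * e for e in errors) / len(errors))

print(round(rmse([3.0, 5.0, 7.0], [2.0, 5.0, 9.0]), 4))  # errors: 1, 0, -2
```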

69
Q

Seasonality

A

A characteristic of a time series in which the data experience regular and predictable periodic changes; for example, fan sales are highest during the summer months.

70
Q

Time series

A

A set of observations on a variable’s outcomes in different time periods.

71
Q

Trend

A

A long-term pattern of movement in a particular direction.

72
Q

Unit root

A

A time series that is not covariance stationary is said to have a unit root.

73
Q

Activation function

A

A functional part of a neural network’s node that transforms the total net input received into the final output of the node. The activation function operates like a light dimmer switch that decreases or increases the strength of the input.

74
Q

Agglomerative clustering

A

A bottom-up hierarchical clustering method that begins with each observation being treated as its own cluster. The algorithm finds the two closest clusters, based on some measure of distance (similarity), and combines them into one new larger cluster. This process is repeated iteratively until all observations are clumped into a single large cluster.

75
Q

Backward propagation

A

The process of adjusting weights in a neural network, to reduce total error of the network, by moving backward through the network’s layers.

76
Q

Base error

A

Model error due to randomness in the data.

77
Q

Bias error

A

Describes the degree to which a model fits the training data. Algorithms with erroneous assumptions produce high bias error with poor approximation, causing underfitting and high in-sample error.

78
Q

Bootstrap aggregating (or bagging)

A

A technique whereby the original training dataset is used to generate n new training datasets or bags of data. Each new bag of data is generated by random sampling with replacement from the initial training set.

79
Q

Centroid

A

The center of a cluster formed using the k-means clustering algorithm.

80
Q

Classification and regression tree

A

A supervised machine learning technique that can be applied to predict either a categorical target variable, producing a classification tree, or a continuous target variable, producing a regression tree. CART is commonly applied to binary classification or regression.

81
Q

Cluster

A

A subset of observations from a dataset such that all the observations within the same cluster are deemed “similar.”

82
Q

Clustering

A

The sorting of observations into groups (clusters) such that observations in the same cluster are more similar to each other than they are to observations in other clusters.

83
Q

Complexity

A

A term referring to the number of features, parameters, or branches in a model and to whether the model is linear or non-linear (non-linear is more complex).

84
Q

Composite variable

A

A variable that combines two or more variables that are statistically strongly related to each other.

85
Q

Cross-validation

A

A technique for estimating out-of-sample error directly by determining the error in validation samples.

86
Q

Deep learning

A

Machine learning using neural networks with many hidden layers.

87
Q

Deep neural networks

A

Neural networks with many hidden layers—at least 2 but potentially more than 20—that have proven successful across a wide range of artificial intelligence applications.

88
Q

Dendrogram

A

A type of tree diagram used for visualizing a hierarchical cluster analysis; it highlights the hierarchical relationships among the clusters.

89
Q

Dimension reduction

A

A set of techniques for reducing the number of features in a dataset while retaining variation across observations to preserve the information contained in that variation.

90
Q

Divisive clustering

A

A top-down hierarchical clustering method that starts with all observations belonging to a single large cluster. The observations are then divided into two clusters based on some measure of distance (similarity). The algorithm then progressively partitions the intermediate clusters into smaller ones until each cluster contains only one observation.

91
Q

Eigenvalue

A

A measure that gives the proportion of total variance in the initial dataset that is explained by each eigenvector.

92
Q

Eigenvector

A

A vector that defines new mutually uncorrelated composite variables that are linear combinations of the original features.

93
Q

Ensemble learning

A

A technique of combining the predictions from a collection of models to achieve a more accurate prediction.

94
Q

Ensemble method

A

The method of combining multiple learning algorithms, as in ensemble learning.

95
Q

Features

A

The independent variables (X’s) in a labeled dataset.

96
Q

Fitting curve

A

A curve that shows in- and out-of-sample error rates (Ein and Eout) on the y-axis plotted against model complexity on the x-axis.

97
Q

Forward propagation

A

The process by which input values are passed forward through a neural network’s layers, node by node, to produce the network’s output.

98
Q

Generalize

A

When a model retains its explanatory power when predicting out-of-sample (i.e., using new data).

99
Q

Hierarchical clustering

A

An iterative unsupervised learning procedure used for building a hierarchy of clusters.

100
Q

Holdout samples

A

Data samples that are not used to train a model.

101
Q

Hyperparameter

A

A parameter whose value must be set by the researcher before learning begins.

102
Q

Information gain

A

A metric which quantifies the amount of information that the feature holds about the response. Information gain can be regarded as a form of non-linear correlation between Y and X.

103
Q

K-fold cross-validation

A

A technique in which data (excluding test sample and fresh data) are shuffled randomly and then are divided into k equal sub-samples, with k – 1 samples used as training samples and one sample, the kth, used as a validation sample.

104
Q

K-means

A

A clustering algorithm that repeatedly partitions observations into a fixed number, k, of non-overlapping clusters.

105
Q

K-nearest neighbor

A

A supervised learning technique that classifies a new observation by finding similarities (“nearness”) between this new observation and the existing data.

106
Q

LASSO

A

Least absolute shrinkage and selection operator is a type of penalized regression which involves minimizing the sum of the absolute values of the regression coefficients. LASSO can also be used for regularization in neural networks.

107
Q

Labeled dataset

A

A dataset that contains matched sets of observed inputs or features (X’s) and the associated output or target (Y).

108
Q

Learning curve

A

A curve that plots the accuracy rate (= 1 – error rate) in the validation or test samples (i.e., out-of-sample) against the amount of data in the training sample, which is thus useful for describing under- and overfitting as a function of bias and variance errors.

109
Q

Learning rate

A

A parameter that affects the magnitude of adjustments in the weights in a neural network.

110
Q

Linear classifier

A

A binary classifier that makes its classification decision based on a linear combination of the features of each data point.

111
Q

Majority-vote classifier

A

A classifier that assigns to a new data point the predicted label with the most votes (i.e., occurrences).

112
Q

Neural networks

A

Computer programs based on how our own brains learn and process information.

113
Q

Penalized regression

A

A regression that includes a constraint such that the regression coefficients are chosen to minimize the sum of squared residuals plus a penalty term that increases in size with the number of included features.

114
Q

Principal components analysis (PCA)

A

An unsupervised ML technique used to transform highly correlated features of data into a few main, uncorrelated composite variables.

115
Q

Projection error

A

The perpendicular distance between a data point and a given principal component.

116
Q

Pruning

A

A regularization technique used in CART to reduce the size of the classification or regression tree—by pruning, or removing, sections of the tree that provide little classifying power.

117
Q

Random forest classifier

A

A collection of a large number of decision trees trained via a bagging method.

118
Q

Regularization

A

A term that describes methods for reducing statistical variability in high-dimensional data estimation problems.

119
Q

Reinforcement learning

A

Machine learning in which a computer learns from interacting with itself or data generated by the same algorithm.

120
Q

Scree plots

A

A plot that shows the proportion of total variance in the data explained by each principal component.

121
Q

Soft margin classification

A

An adaptation in the support vector machine algorithm which adds a penalty to the objective function for observations in the training set that are misclassified.

122
Q

Summation operator

A

A functional part of a neural network’s node that multiplies each input value received by a weight and sums the weighted values to form the total net input, which is then passed to the activation function.

123
Q

Supervised learning

A

A machine learning approach that makes use of labeled training data.

124
Q

Support vector machine

A

A linear classifier that determines the hyperplane that optimally separates the observations into two sets of data points.

125
Q

Target

A

In machine learning, the dependent variable (Y) in a labeled dataset; the company in a merger or acquisition that is being acquired.

126
Q

Test sample

A

A data sample that is used to test a model’s ability to predict well on new data.

127
Q

Training sample

A

A data sample that is used to train a model.

128
Q

Unsupervised learning

A

A machine learning approach that does not make use of labeled training data.

129
Q

Validation sample

A

A data sample that is used to validate and tune a model.

130
Q

Variance error

A

Describes how much a model’s results change in response to new data from validation and test samples. Unstable models pick up noise and produce high variance error, causing overfitting and high out-of-sample error.

131
Q

Accuracy

A

The percentage of correctly predicted classes out of total predictions. It is an overall performance metric in classification problems.

132
Q

Application programming interface (API)

A

A set of well-defined methods of communication between various software components and typically used for accessing external data.

133
Q

Bag-of-words (BOW)

A

A collection of a distinct set of tokens from all the texts in a sample dataset. BOW does not capture the position or sequence of words present in the text.
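A minimal sketch (hypothetical sentences; here tokens are simply lowercased words):

```python
def bag_of_words(texts):
    # Distinct set of lowercase tokens across all texts; word order
    # and position are discarded.
    return sorted({token for text in texts for token in text.lower().split()})

print(bag_of_words(["The stock rose", "the stock fell"]))
# ['fell', 'rose', 'stock', 'the']
```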

134
Q

Ceiling analysis

A

A systematic process of evaluating different components in the pipeline of model building. It helps to understand what part of the pipeline can potentially improve in performance by further tuning.

135
Q

Collection frequency (CF)

A

The number of times a given word appears in the whole corpus (i.e., collection of sentences) divided by the total number of words in the corpus.

136
Q

Confusion matrix

A

A grid used for error analysis in classification problems; it presents the four classification outcome counts: true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN).
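The four counts feed the overall performance metrics defined elsewhere in this deck (accuracy, precision, recall, F1); a minimal sketch with hypothetical counts:

```python
def classification_metrics(tp, fp, tn, fn):
    accuracy = (tp + tn) / (tp + fp + tn + fn)   # correct / total
    precision = tp / (tp + fp)                   # true positives / predicted positives
    recall = tp / (tp + fn)                      # true positives / actual positives
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return accuracy, precision, recall, f1

acc, prec, rec, f1 = classification_metrics(tp=40, fp=10, tn=45, fn=5)
print(round(acc, 3), round(prec, 3), round(rec, 3), round(f1, 3))
```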

137
Q

Corpus

A

A collection of text data in any form, including list, matrix, or data table forms.

138
Q

Data preparation (cleansing)

A

The process of examining, identifying, and mitigating (i.e., cleansing) errors in raw data.

139
Q

Data wrangling (preprocessing)

A

This task performs transformations and critical processing steps on cleansed data to make the data ready for ML model training (i.e., preprocessing), and includes dealing with outliers, extracting useful variables from existing data points, and scaling the data.

140
Q

Document frequency (DF)

A

The number of documents (texts) that contain a particular token divided by the total number of documents. It is the simplest feature selection method and often performs well when many thousands of tokens are present.

141
Q

Document term matrix (DTM)

A

A matrix where each row belongs to a document (or text file), and each column represents a token (or term). The number of rows is equal to the number of documents (or text files) in a sample text dataset. The number of columns is equal to the number of tokens from the BOW built using all the documents in the sample dataset. The cells typically contain the counts of the number of times a token is present in each document.

142
Q

Exploratory data analysis (EDA)

A

The preliminary step in data exploration, where graphs, charts, and other visualizations (heat maps and word clouds) as well as quantitative methods (descriptive statistics and central tendency measures) are used to observe and summarize data.

143
Q

F1 score

A

The harmonic mean of precision and recall. F1 score is a more appropriate overall performance metric (than accuracy) when there is unequal class distribution in the dataset and it is necessary to measure the equilibrium of precision and recall.

144
Q

Feature engineering

A

A process of creating new features by changing or transforming existing features.

145
Q

Feature selection

A

A process whereby only pertinent features from the dataset are selected for model training. Selecting fewer features decreases model complexity and training time.

146
Q

Frequency analysis

A

The process of quantifying how important tokens are in a sentence and in the corpus as a whole. It helps in filtering unnecessary tokens (or features).

147
Q

Grid search

A

A method of systematically training a model by using various combinations of hyperparameter values, cross validating each model, and determining which combination of hyperparameter values ensures the best model performance.

148
Q

Ground truth

A

The known outcome (i.e., target variable) of each observation in a labeled dataset.

149
Q

Metadata

A

Data that describes and gives information about other data.

150
Q

Mutual information

A

Measures how much information is contributed by a token to a class of texts. MI will be 0 if the token’s distribution in all text classes is the same. MI approaches 1 as the token increasingly occurs in only one particular class of text.

151
Q

N-grams

A

A representation of word sequences. The length of a sequence varies from 1 to n. A one-word sequence is a unigram; a two-word sequence is a bigram; a three-word sequence is a trigram; and so on.
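A minimal sketch (hypothetical sentence):

```python
def ngrams(tokens, n):
    # All contiguous sequences of n tokens.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "rates rose sharply today".split()
print(ngrams(tokens, 1))  # unigrams
print(ngrams(tokens, 2))  # bigrams
```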

152
Q

Named entity recognition

A

An algorithm that analyzes individual tokens and their surrounding semantics while referring to its dictionary to tag an object class to the token.

153
Q

One hot encoding

A

The process by which categorical variables are converted into binary form (0 or 1) for machine reading. It is one of the most common methods for handling categorical features in text data.
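A minimal sketch (hypothetical category labels):

```python
def one_hot(category, categories):
    # Binary vector: 1 in the slot of the observed category, 0 elsewhere.
    return [1 if c == category else 0 for c in categories]

sectors = ["energy", "financials", "technology"]
print(one_hot("financials", sectors))  # [0, 1, 0]
```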

154
Q

Parts of speech

A

An algorithm that uses language structure and dictionaries to tag every token in the text with a corresponding part of speech (i.e., noun, verb, adjective, proper noun, etc.).

155
Q

Precision

A

In error analysis for classification problems, it is the ratio of correctly predicted positive classes to all predicted positive classes. Precision is useful in situations where the cost of false positives (FP), or Type I error, is high.

156
Q

Readme files

A

Text files provided with raw data that contain information related to a data file. They are useful for understanding the data and how they can be interpreted correctly.

157
Q

Recall

A

Also known as sensitivity, in error analysis for classification problems it is the ratio of correctly predicted positive classes to all actual positive classes. Recall is useful in situations where the cost of false negatives (FN), or Type II error, is high.

158
Q

Regular expression (regex)

A

A sequence of characters that defines a search pattern. Regex is used to search for patterns of interest in a given text.

159
Q

Scaling

A

The process of adjusting the range of a feature by shifting and changing the scale of the data. Two of the most common ways of scaling are normalization and standardization.

160
Q

Sentence length

A

The number of characters, including spaces, in a sentence.

161
Q

Term frequency (TF)

A

Ratio of the number of times a given token occurs in all the texts in the dataset to the total number of tokens in the dataset.
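A minimal sketch (hypothetical sentences; tokens are lowercased words):

```python
def term_frequency(token, texts):
    # Occurrences of the token across all texts divided by the
    # total number of tokens in the dataset.
    all_tokens = [t for text in texts for t in text.lower().split()]
    return all_tokens.count(token) / len(all_tokens)

print(term_frequency("the", ["The stock rose", "the stock fell"]))  # 2 of 6 tokens
```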

162
Q

Token

A

The equivalent of a word (or sometimes a character).

163
Q

Tokenization

A

The process of splitting a given text into smaller units, or tokens, such as words or characters. (In a fintech context, the same term refers to representing ownership rights to physical assets on a blockchain or distributed ledger.)

164
Q

Trimming

A

Also called truncation, it is the process of removing extreme values and outliers from a dataset.

165
Q

Web spidering (scraping or crawling) programs

A

Programs that extract raw content from a source, typically web pages.

166
Q

Winsorization

A

The process of replacing extreme values and outliers in a dataset with the maximum (for large value outliers) and minimum (for small value outliers) values of data points that are not outliers.
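A minimal sketch, assuming the bounds have already been chosen (in practice they would come from percentiles of the non-outlier data):

```python
def winsorize(data, lower, upper):
    """Replace values below `lower` with `lower` and above `upper` with `upper`."""
    return [min(max(x, lower), upper) for x in data]

returns = [-0.40, -0.02, 0.01, 0.03, 0.55]  # hypothetical data with two outliers
clipped = winsorize(returns, lower=-0.05, upper=0.05)
# [-0.05, -0.02, 0.01, 0.03, 0.05]
```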

167
Q

Absolute version of PPP

A

An extension of the law of one price whereby the prices of goods and services will not differ internationally once exchange rates are considered.

168
Q

Covered interest rate parity

A

The relationship among the spot exchange rate, the forward exchange rate, and the interest rates in two currencies that ensures that the return on a hedged (i.e., covered) foreign risk-free investment is the same as the return on a domestic risk-free investment. Also called interest rate parity.
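The parity can be sketched numerically; the spot rate and interest rates below are hypothetical, with the spot quoted as price currency per unit of base currency:

```python
# Hypothetical sketch: forward rate implied by covered interest rate parity.
spot = 1.2000     # price currency per base currency (e.g., USD per EUR)
i_price = 0.03    # price-currency risk-free rate
i_base = 0.01     # base-currency risk-free rate

forward = spot * (1 + i_price) / (1 + i_base)
# At this forward rate, a hedged (covered) foreign risk-free investment
# earns exactly the domestic risk-free rate.
```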

169
Q

Ex ante version of PPP

A

The hypothesis that expected changes in the spot exchange rate are equal to expected differences in national inflation rates. An extension of relative purchasing power parity to expected future changes in the exchange rate.

170
Q

FX carry trade

A

An investment strategy that involves taking long positions in high-yield currencies and short positions in low-yield currencies.

171
Q

Forward rate parity

A

The proposition that the forward exchange rate is an unbiased predictor of the future spot exchange rate.

172
Q

International Fisher effect

A

The proposition that nominal interest rate differentials across currencies are determined by expected inflation differentials.

173
Q

Portfolio balance approach

A

A theory of exchange rate determination that emphasizes the portfolio investment decisions of global investors and the requirement that global investors willingly hold all outstanding securities denominated in each currency at prevailing prices and exchange rates.

174
Q

Real interest rate parity

A

The proposition that real interest rates will converge to the same level across different markets.

175
Q

Relative version of PPP

A

The hypothesis that changes in (nominal) exchange rates over time are equal to national inflation rate differentials.
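A numerical sketch of the hypothesis, with hypothetical inflation rates and the spot quoted as price currency per base currency:

```python
# Hypothetical sketch: relative PPP ties the expected spot-rate change
# to the inflation differential between the two countries.
spot = 1.2000     # price currency per base currency
pi_price = 0.04   # inflation in the price-currency country
pi_base = 0.02    # inflation in the base-currency country

expected_spot = spot * (1 + pi_price) / (1 + pi_base)
pct_change = expected_spot / spot - 1  # ≈ pi_price - pi_base for small rates
```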

176
Q

Triangular arbitrage

A

An arbitrage transaction involving three currencies that attempts to exploit inconsistencies among pairwise exchange rates.
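A sketch of the consistency check, using hypothetical quotes for three currencies:

```python
# Hypothetical pairwise quotes
usd_per_eur = 1.20
gbp_per_usd = 0.80
eur_per_gbp = 1.10  # quoted cross rate

# Cross rate implied by the two USD legs:
implied_eur_per_gbp = 1 / (usd_per_eur * gbp_per_usd)  # ≈ 1.0417

# Round trip EUR -> GBP -> USD -> EUR; a product different from 1
# signals an arbitrage opportunity in one direction or the other.
round_trip = (1 / eur_per_gbp) * (1 / gbp_per_usd) * (1 / usd_per_eur)
```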

177
Q

Uncovered interest rate parity

A

The proposition that the expected return on an uncovered (i.e., unhedged) foreign currency (risk-free) investment should equal the return on a comparable domestic currency investment.

178
Q

Absolute convergence

A

The idea that developing countries, regardless of their particular characteristics, will eventually catch up with the developed countries and match them in per capita output.

179
Q

Capital deepening

A

An increase in the capital-to-labor ratio.

180
Q

Club convergence

A

The idea that only rich and middle-income countries sharing a set of favorable attributes (i.e., are members of the “club”) will converge to the income level of the richest countries.

181
Q

Cobb–Douglas production function

A

A function of the form Y = K^α L^(1–α) relating output (Y) to labor (L) and capital (K) inputs.
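A minimal sketch with hypothetical input levels, also illustrating the constant-returns-to-scale property of this functional form:

```python
# Hypothetical sketch of Cobb-Douglas output, Y = K**alpha * L**(1 - alpha).
def output(K, L, alpha=0.3):
    return K**alpha * L**(1 - alpha)

y = output(100.0, 50.0)  # hypothetical capital and labor inputs
# Doubling both inputs doubles output (constant returns to scale):
assert abs(output(200.0, 100.0) - 2 * y) < 1e-9
```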

182
Q

Conditional convergence

A

The idea that convergence of per capita income is conditional on the countries having the same savings rate, population growth rate, and production function.

183
Q

Constant returns to scale

A

The condition that if all inputs into the production process are increased by a given percentage, then output rises by that same percentage.

184
Q

Diminishing marginal productivity

A

When each additional unit of an input, keeping the other inputs unchanged, increases output by a smaller increment.

185
Q

Dutch disease

A

A situation in which currency appreciation driven by strong export demand for resources makes other segments of the economy (particularly manufacturing) globally uncompetitive.

186
Q

Growth accounting equation

A

The production function written in the form of growth rates. For the basic Cobb–Douglas production function, it states that the growth rate of output equals the rate of technological change plus α multiplied by the growth rate of capital plus (1 – α) multiplied by the growth rate of labor.
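The decomposition can be sketched with hypothetical growth rates:

```python
# Hypothetical sketch of the growth accounting equation:
# output growth = TFP growth + alpha * capital growth + (1 - alpha) * labor growth
alpha = 0.3        # capital's share of income (hypothetical)
g_tfp = 0.010      # rate of technological change
g_capital = 0.040  # growth rate of capital
g_labor = 0.015    # growth rate of labor

g_output = g_tfp + alpha * g_capital + (1 - alpha) * g_labor
# 0.010 + 0.012 + 0.0105 = 0.0325, i.e., 3.25% output growth
```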

187
Q

Labor force

A

Everyone of working age (ages 16 to 64) who either is employed or is available for work but not working.

188
Q

Labor force participation rate

A

The percentage of the working age population that is in the labor force.

189
Q

Labor productivity

A

The quantity of goods and services (real GDP) that a worker can produce in one hour of work.

190
Q

Labor productivity growth accounting equation

A

States that potential GDP growth equals the growth rate of the labor input plus the growth rate of labor productivity.
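With hypothetical growth rates, the relationship is a simple sum:

```python
# Hypothetical sketch of the labor productivity growth accounting equation.
g_labor = 0.010               # growth rate of the labor input
g_labor_productivity = 0.020  # growth rate of labor productivity

g_potential_gdp = g_labor + g_labor_productivity  # 0.030, i.e., 3.0%
```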

191
Q

Network externalities

A

The impact that users of a good, a service, or a technology have on other users of that product; it can be positive (e.g., a critical mass of users makes a product more useful) or negative (e.g., congestion makes the product less useful).

192
Q

Non-convergence trap

A

A situation in which a country remains relatively poor, or even falls further behind, because it fails to implement necessary institutional reforms and/or adopt leading technologies.

193
Q

Non-renewable resources

A

Finite resources that are depleted once they are consumed; oil and coal are examples.

194
Q

Potential GDP

A

The maximum amount of output an economy can sustainably produce without inducing an increase in the inflation rate. The output level that corresponds to full employment with consistent wage and price expectations.

195
Q

Purchasing power parity (PPP)

A

The idea that exchange rates move to equalize the purchasing power of different currencies.

196
Q

Renewable resources

A

Resources that can be replenished, such as a forest.

197
Q

Rental price of capital

A

The cost per unit of time to rent a unit of capital.

198
Q

Steady-state rate of growth

A

The constant growth rate of output (or output per capita) that can or will be sustained indefinitely once it is reached. Key ratios, such as the capital–output ratio, are constant on the steady-state growth path.

199
Q

Total factor productivity (TFP)

A

A multiplicative scale factor that reflects the general level of productivity or technology in the economy. Changes in total factor productivity generate proportional changes in output for any input combination.

200
Q

Administrative regulations or administrative law

A

Rules issued by government agencies or other regulators.