Quantitative Methods Flashcards

Question

Negative Serial Correlation

Answer 1

A negative residual is most likely followed by a positive residual, and a positive residual is most likely followed by a negative residual.

Answer 2

Two or more independent variables (IVs) are highly correlated, or there is an approximate linear relationship among the IVs. Coefficient estimates remain consistent but are imprecise and unreliable: standard errors are inflated and t-statistics are insignificant, while the F-statistic may still be significant.

Answer 3

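Variance inflation factor: VIF_j = 1 / (1 - R²_j), where R²_j comes from regressing X_j on the other independent variables. We want VIF as low as possible: VIF > 5 is concerning; VIF > 10 indicates serious multicollinearity.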
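A minimal sketch of the VIF calculation with NumPy, assuming the independent variables are the columns of a matrix `X`. The helper name `vif` and the simulated data are hypothetical; each column is regressed on an intercept plus the other columns to obtain R²_j.

```python
import numpy as np

def vif(X: np.ndarray) -> np.ndarray:
    """Return VIF_j = 1 / (1 - R_j^2) for every column j of X (n x k)."""
    n, k = X.shape
    out = np.empty(k)
    for j in range(k):
        target = X[:, j]
        # Regress column j on an intercept plus all other columns
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(A, target, rcond=None)
        resid = target - A @ beta
        centered = target - target.mean()
        r2 = 1.0 - (resid @ resid) / (centered @ centered)
        out[j] = 1.0 / (1.0 - r2)
    return out

# Simulated data: x3 is almost a copy of x1, x2 is unrelated
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)
x3 = x1 + 0.05 * rng.normal(size=200)
X = np.column_stack([x1, x2, x3])
print(vif(X).round(1))  # x1 and x3 far above 10; x2 close to 1
```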

Answer 4

* Increase the sample size.
* Exclude one or more of the regression variables.
* Use a different proxy for one of the variables.

Answer 5

AIC = n * ln(SSE/n) + 2(k + 1). AIC is preferred for forecasting purposes.

Answer 6

BIC = n * ln(SSE/n) + ln(n)(k + 1). BIC is preferred for evaluating goodness of fit.
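A small sketch of the two criteria as plain functions, where `k` is the number of independent variables; the SSE values for models A and B are hypothetical. For both criteria, the lower value is preferred.

```python
import math

def aic(sse: float, n: int, k: int) -> float:
    """AIC = n * ln(SSE / n) + 2(k + 1)."""
    return n * math.log(sse / n) + 2 * (k + 1)

def bic(sse: float, n: int, k: int) -> float:
    """BIC = n * ln(SSE / n) + ln(n)(k + 1)."""
    return n * math.log(sse / n) + math.log(n) * (k + 1)

# Hypothetical models fitted on the same n = 60 observations
n = 60
models = {"A (k=2)": (120.0, 2), "B (k=4)": (115.0, 4)}
for name, (sse, k) in models.items():
    print(name, round(aic(sse, n, k), 2), round(bic(sse, n, k), 2))
```

BIC charges ln(n) per parameter instead of 2, so it penalizes extra variables more heavily whenever n ≥ 8.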

Answer 7

F-Stat = [(SSE restricted - SSE unrestricted) / q] / [SSE unrestricted / (n - k - 1)]
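The joint F-test formula as a one-line function, a sketch assuming `q` is the number of restrictions (variables dropped) and `k` the number of independent variables in the unrestricted model. The SSE figures are hypothetical; the result is compared with the critical F value at (q, n - k - 1) degrees of freedom.

```python
def joint_f_stat(sse_restricted: float, sse_unrestricted: float,
                 q: int, n: int, k: int) -> float:
    """F = [(SSE_R - SSE_U) / q] / [SSE_U / (n - k - 1)]."""
    numerator = (sse_restricted - sse_unrestricted) / q
    denominator = sse_unrestricted / (n - k - 1)
    return numerator / denominator

# Hypothetical: dropping 2 of 4 variables raises SSE from 120 to 150 (n = 60)
f = joint_f_stat(150.0, 120.0, q=2, n=60, k=4)
print(round(f, 3))  # reject H0 (dropped coefficients = 0) if f > critical F(2, 55)
```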

Answer 8

An extreme value of an independent variable: an observation that is outside the range of the independent variables (x-axis).

Answer 9

An extreme value of the dependent variable: an observation that is outside the range of the dependent variable (vertical y-axis).

Answer 10

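Calculate the leverage measure h_ii. With one independent variable, h_i = 1/n + (X_i - X̄)² / Σ(X_j - X̄)², i.e., 1/n plus the squared deviation of observation i divided by the sum of all squared deviations. **An observation is potentially influential if h_ii > 3(k + 1)/n.**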
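A sketch of the leverage calculation using the diagonal of the hat matrix H = A(A'A)⁻¹A', which reduces to the one-variable formula above; the helper name `leverage` and the data are hypothetical.

```python
import numpy as np

def leverage(X) -> np.ndarray:
    """Diagonal of the hat matrix H = A (A'A)^(-1) A', where A = [1, X]."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:                      # one independent variable
        X = X[:, None]
    A = np.column_stack([np.ones(len(X)), X])
    H = A @ np.linalg.inv(A.T @ A) @ A.T
    return np.diag(H)

# Simulated x values: 1..20 plus one far-away point at 60
x = np.append(np.arange(1.0, 21.0), 60.0)
h = leverage(x)
n, k = len(x), 1
flagged = np.flatnonzero(h > 3 * (k + 1) / n)
print(flagged)  # index of the x = 60 observation
```

The leverages always sum to k + 1, which is why 3(k + 1)/n (three times the average leverage) works as a cutoff.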

Answer 11

**Externally studentized residuals**
- Delete each case i.
- Re-estimate the regression without it.
- Add the deleted observation back in and calculate its residual e*_i from the new regression.
- Calculate the studentized residual t*_i = e*_i / s_e*.
**Potentially influential if** |t*_i| > critical t (small samples), or |t*_i| > 3 (large samples).

Answer 12

By calculating Cook's distance (aka Cook's D).
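A sketch of Cook's D using the standard closed form D_i = [e_i² / ((k + 1) × MSE)] × [h_ii / (1 - h_ii)²], which equals the scaled change in all fitted values when observation i is deleted. The function name and simulated data are hypothetical.

```python
import numpy as np

def cooks_distance(X, y) -> np.ndarray:
    """D_i = [e_i^2 / ((k + 1) * MSE)] * [h_ii / (1 - h_ii)^2]."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    n, k = X.shape
    A = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    e = y - A @ beta                                   # residuals
    h = np.diag(A @ np.linalg.inv(A.T @ A) @ A.T)      # leverage
    mse = (e @ e) / (n - k - 1)
    return (e ** 2 / ((k + 1) * mse)) * h / (1 - h) ** 2

# Simulated data with one corrupted observation at the largest x
rng = np.random.default_rng(42)
x = np.arange(20.0)
y = 1.0 + 2.0 * x + rng.normal(size=20)
y[19] += 15.0
d = cooks_distance(x, y)
print(d.argmax(), d.max().round(2))
```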

Answer 13

could be influential

Answer 14

Likely to be influential

Answer 15

Influential

Answer 16

No interaction term: Y_i = b0 + b1X1 + b2X2 + d0D1 + ε

Answer 17

Interaction term: Y_i = b0 + b1X1 + d1(X1 × D) + ε

Answer 18

The change in the log odds that the event happens per unit change in the independent variable, holding all other independent variables constant.

Answer 19

The log odds of the ETF being a winning fund when all independent variables are zero.
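To connect the two interpretations above to probabilities, here is a sketch with hypothetical logistic regression coefficients: the intercept gives the log odds when all independent variables are zero, and each one-unit increase in X1 adds b1 to the log odds, i.e., multiplies the odds by e^b1.

```python
import math

def probability_from_log_odds(log_odds: float) -> float:
    """Logistic transformation: p = 1 / (1 + exp(-log_odds))."""
    return 1.0 / (1.0 + math.exp(-log_odds))

# Hypothetical fitted coefficients (in log odds)
b0 = -1.2   # log odds of the event when all independent variables are zero
b1 = 0.8    # change in log odds per one-unit increase in X1

p_at_zero = probability_from_log_odds(b0)
log_odds = b0 + b1 * 2.0              # X1 = 2, other variables at zero
p_at_two = probability_from_log_odds(log_odds)
odds_multiplier = math.exp(b1)        # odds scale by e^b1 per unit of X1
print(round(p_at_zero, 3), round(p_at_two, 3), round(odds_multiplier, 3))
```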

Answer 20

When the dependent (Y) variable grows at a constant rate.

Answer 21

When the dependent (Y) variable changes by a constant amount with time.

Answer 22

H0: DW = 2. Fail to reject (do not reject) the null hypothesis: no serial correlation.
Ha: DW ≠ 2. Reject the null hypothesis: we have serial correlation.

Answer 23

A time series regressed on its own past values. A statistical model is autoregressive if it predicts future values based on past values. For example, an autoregressive model might seek to predict a stock's future prices based on its past performance.

Answer 24

Mean, variance, and Cov(y_t, y_t-s) must be constant and finite in all periods:
1. The expected value of the time series must be constant and finite in all periods.
2. The variance of the time series must be constant and finite in all periods.
3. The covariance of the time series with itself for a fixed number of periods in the past or future must be constant and finite in all periods.

Answer 25

The value of the time series falls when it is above its mean and rises when it is below its mean. Mean reversion in finance suggests that various relevant phenomena, such as asset prices and the volatility of returns, eventually revert to their long-term average levels. The mean reversion theory has led to many investment strategies, from stock trading techniques to options pricing models. Mean reversion trading tries to capitalize on extreme changes in the price of a particular security, assuming that it will revert to its previous state.

Answer 26

The time series will decrease

Answer 27

The time series will remain the same

Answer 28

The time series will increase
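A sketch of how the decrease / remain the same / increase answers follow from an AR(1) model y_t = b0 + b1·y_t-1 + ε_t: the mean-reverting level is b0 / (1 - b1), and the next forecast moves toward it. The series is simulated and the helper names are hypothetical.

```python
import numpy as np

def fit_ar1(y) -> tuple[float, float]:
    """OLS estimates of b0, b1 in y_t = b0 + b1 * y_(t-1) + e_t."""
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(y) - 1), y[:-1]])
    (b0, b1), *_ = np.linalg.lstsq(A, y[1:], rcond=None)
    return float(b0), float(b1)

def mean_reverting_level(b0: float, b1: float) -> float:
    """b0 / (1 - b1); meaningful for a covariance-stationary AR(1), |b1| < 1."""
    return b0 / (1.0 - b1)

# Simulated AR(1) with b0 = 2, b1 = 0.8, so the true level is 2 / (1 - 0.8) = 10
rng = np.random.default_rng(1)
y = [10.0]
for _ in range(500):
    y.append(2.0 + 0.8 * y[-1] + rng.normal())

b0, b1 = fit_ar1(y)
level = mean_reverting_level(b0, b1)
current = y[-1]
forecast = b0 + b1 * current
if current > level:
    direction = "decrease"
elif current < level:
    direction = "increase"
else:
    direction = "remain the same"
print(round(level, 2), round(current, 2), direction)
```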

Answer 29

Prediction: predicted vs. observed values from the sample used to generate the model (in-sample). Models with a smaller variance of errors are more accurate.

Answer 30

Forecast: forecasts vs. values outside the model's sample (out-of-sample). Use the root mean squared error (RMSE) to compare out-of-sample forecasting performance; the smaller the RMSE, the better.

Answer 31

Finite mean reverting level, and finite variance

Answer 32

Dickey-Fuller test

Answer 33

No unit root - the time series is covariance stationary

Answer 34

Unit root. The time series is a random walk. It is not covariance stationary.

Answer 35

**Evidence of positive serial correlation**: we can reject the null hypothesis of no positive serial correlation.

Answer 36

**NO evidence of positive serial correlation**

Answer 37

|T-Stat| > Critical Value

Answer 38

|T-Stat| < Critical Value

Answer 39

1/√T **where T represents the number of observations used in the regression**
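A sketch of the residual autocorrelation t-test using the 1/√T standard error above: t_k = r_k / (1/√T), compared with the critical t value. The helper name and the simulated residuals are hypothetical.

```python
import numpy as np

def residual_autocorrelation_tests(resid, max_lag: int = 4):
    """Return (lag, r_k, t_k) with t_k = r_k / (1 / sqrt(T))."""
    e = np.asarray(resid, dtype=float)
    e = e - e.mean()
    T = len(e)
    se = 1.0 / np.sqrt(T)              # standard error of each autocorrelation
    results = []
    for lag in range(1, max_lag + 1):
        r = (e[lag:] @ e[:-lag]) / (e @ e)
        results.append((lag, r, r / se))
    return results

rng = np.random.default_rng(7)
white_noise = rng.normal(size=400)
for lag, r, t in residual_autocorrelation_tests(white_noise):
    print(lag, round(r, 3), round(t, 2))  # compare |t| with the critical t value
```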

Answer 40

The DW statistic is designed to detect positive serial correlation of the errors of a regression equation. Under the null hypothesis of no positive serial correlation, the DW statistic is 2.0. Positive serial correlation leads to a DW statistic that is less than 2.0. We do NOT want positive serial correlation!
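A sketch of the Durbin-Watson statistic, DW = Σ(e_t - e_t-1)² / Σe_t², applied to simulated residuals (the data are hypothetical).

```python
import numpy as np

def durbin_watson(resid) -> float:
    """DW = sum((e_t - e_(t-1))^2) / sum(e_t^2)."""
    e = np.asarray(resid, dtype=float)
    return float(np.sum(np.diff(e) ** 2) / np.sum(e ** 2))

rng = np.random.default_rng(3)
uncorrelated = rng.normal(size=1000)
pos_corr = [0.0]
for _ in range(999):
    pos_corr.append(0.7 * pos_corr[-1] + rng.normal())  # positively correlated errors

print(round(durbin_watson(uncorrelated), 2))  # near 2
print(round(durbin_watson(pos_corr), 2))      # well below 2
```

For large samples DW ≈ 2(1 - r), where r is the first-order autocorrelation of the residuals, which is why positive serial correlation pushes DW below 2.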

Answer 41

The steps to calculate RMSE are as follows:
1. Take the difference between the actual and the forecast values. This is the error.
2. Square the error.
3. Sum the squared errors.
4. Divide by the number of forecasts.
5. Take the square root of the average.

Answer 42

Root mean squared error (RMSE) steps:
1. Square the errors (actual - forecast).
2. Sum the squared errors and calculate the mean.
3. Take the square root of the mean.
4. The result is the RMSE; the smaller the RMSE, the better.
A model's accuracy in forecasting out-of-sample values is assessed using the RMSE, which is the square root of the mean squared error. The model with the smallest RMSE is seen as the most accurate, as it is perceived to have better predictive power in the future.
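The five RMSE steps map directly onto a short function; the actual values and the two models' forecasts below are hypothetical.

```python
import math

def rmse(actual, forecast) -> float:
    errors = [a - f for a, f in zip(actual, forecast)]   # 1. actual - forecast
    squared = [err ** 2 for err in errors]                # 2. square each error
    mean_sq = sum(squared) / len(squared)                 # 3-4. sum, divide by # of forecasts
    return math.sqrt(mean_sq)                             # 5. square root

# Hypothetical out-of-sample values and two models' forecasts
actual = [1.0, 2.0, 3.0, 4.0]
model_a = [1.1, 1.9, 3.2, 3.8]
model_b = [1.5, 2.5, 2.5, 4.5]
print(round(rmse(actual, model_a), 4), round(rmse(actual, model_b), 4))  # A has the smaller RMSE
```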

Answer 43

A stochastic trend in a time series, such as a random walk with a drift. If a time series has a unit root, it shows a systematic pattern that is unpredictable.

Answer 44

By using first differencing. Regression 2: y_t = b0 + b1y_t-1 + ε_t, where y_t = x_t - x_t-1.

Answer 45

NO!!! The DW statistic can be used for linear regression (including trend) models, but not for autoregressive models.

Answer 46

0 -------|dl|--------|du|-------2
* Between 0 and dl: positive serial correlation (reject H0).
* Between dl and du: inconclusive (we don't know).
* Between du and 2: okay (no evidence of positive serial correlation).

Answer 47

Adjusted R² = 1 - [(n - 1) / (n - k - 1)] × (1 - R²)
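A sketch showing why adjusted R² can fall when weak variables are added even though R² rises; the R² values, n, and k are hypothetical.

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R^2 = 1 - [(n - 1) / (n - k - 1)] * (1 - R^2)."""
    return 1.0 - ((n - 1) / (n - k - 1)) * (1.0 - r2)

# Hypothetical: adding 3 weak variables lifts R^2 from 0.30 to 0.31 (n = 50)
print(round(adjusted_r2(0.30, 50, 5), 4))  # 5 independent variables
print(round(adjusted_r2(0.31, 50, 8), 4))  # 8 independent variables: lower
```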

Answer 48

The number of observations

Answer 49

Conditional Heteroskedasticity

Answer 50

**Serial correlation** Trend models often have the limitation that their errors are serially correlated. Because predictions in trend models are based solely on the time period, they fail to account for significant movements in the data, such as recessions.

Answer 51

Classifying unlabeled data

Answer 52

Involves training an algorithm to take a set of inputs (X variables) and find a model that best relates them to outputs (Y variables). In short: training algorithm -> set of inputs -> model that relates them to outputs.

Answer 53

Same as supervised learning, but does not make use of labeled training data. We give it data and expect the algorithm to make sense of it.

Answer 54

ML models can produce overly complex models that fit the training data too well and thereby do not generalize well to new data. The prediction model for the training sample (in-sample data) is too complex, so it does not work well with new data.

Answer 55

Penalized regression, support vector machine (SVM), k-nearest neighbor (KNN), classification and regression trees (CART), ensemble learning, random forest

Answer 56

Principal component analysis, k-means clustering, hierarchical clustering

Answer 57

High Bias Error means the model does not fit the training data well.

Answer 58

High variance error means the model does not predict well on the test data

Answer 59

Principal component analysis (unsupervised ML), penalized regression (supervised ML)

Answer 60

* Similar to maximizing adjusted R².
* Dimension reduction.
* Eliminates/minimizes overfitting.
Regression coefficients are chosen to minimize the sum of the squared errors plus a penalty term that increases with the number of included features.
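One common form of penalized regression is LASSO, whose penalty λ·Σ|b_j| grows as more features enter with non-zero coefficients. Below is a minimal coordinate-descent sketch, assuming X and y are already centered (so no intercept) and using simulated data; the function name is hypothetical and this is not a production solver.

```python
import numpy as np

def lasso_coordinate_descent(X, y, lam: float, n_iter: int = 500) -> np.ndarray:
    """Minimize (1 / (2n)) * ||y - Xb||^2 + lam * sum(|b_j|) by coordinate descent."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            partial_resid = y - X @ b + X[:, j] * b[j]  # residual ignoring feature j
            rho = X[:, j] @ partial_resid / n
            # Soft-thresholding: small effects are set exactly to zero
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
    return b

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 3.0 * X[:, 0] + 0.1 * rng.normal(size=200)   # only the first feature matters
print(lasso_coordinate_descent(X, y, lam=0.0).round(3))   # close to OLS
print(lasso_coordinate_descent(X, y, lam=0.5).round(3))   # features 2 and 3 dropped
```

With λ = 0 this reproduces OLS; raising λ shrinks coefficients and drops irrelevant features entirely, which is the dimension-reduction effect described in the card.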

Answer 61

**Support Vector Machine** **Used for classification, regression, and outlier detection.** Classifying data that is not complex or non-linear. It is a linear classifier that determines the hyperplane that optimally separates the observations into two sets of data points. Does not require any hyperparameter. It maximizes the probability of making a correct prediction by determining the boundary that is furthest from all observations. Outliers do not affect either the support vectors or the discriminant boundary.

Answer 62

Classification: classifies a new observation by finding similarities between it and the existing data. Makes no assumption about the distribution of the data; it is non-parametric. KNN results can be sensitive to the inclusion of irrelevant or correlated features, so it may be necessary to select features manually, thereby removing irrelevant information.
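A minimal KNN classifier sketch: find the k closest training observations (Euclidean distance here, an assumption) and take a majority vote. The toy data and function name are hypothetical. Because distances drive the result, features on very different scales would normally be normalized or standardized first.

```python
import numpy as np
from collections import Counter

def knn_classify(X_train: np.ndarray, y_train: np.ndarray, x_new: np.ndarray, k: int = 3):
    """Predict the majority label among the k nearest training observations."""
    distances = np.linalg.norm(X_train - x_new, axis=1)   # Euclidean distance
    nearest = np.argsort(distances)[:k]
    votes = Counter(y_train[nearest].tolist())
    return votes.most_common(1)[0][0]

# Toy data: two groups of observations with labels 0 and 1
X_train = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]], dtype=float)
y_train = np.array([0, 0, 0, 1, 1, 1])
print(knn_classify(X_train, y_train, np.array([0.5, 0.5])))  # 0
print(knn_classify(X_train, y_train, np.array([5.5, 5.2])))  # 1
```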

Answer 63

**Classification and Regression Trees** Part of supervised ML. Typically applied when the target is binary. If the goal is regression, the prediction is the mean of the values in the terminal node. Makes no assumption about the characteristics of the training data, so if left unconstrained, it can potentially learn the training data perfectly. To avoid overfitting, regularization parameters can be added, such as the maximum depth of the tree.

Answer 64

1. Input layer
2. Hidden layer
3. Output layer

Answer 65

Variance error and overfitting

Answer 66

Bias error and underfitting

Answer 67

In clustering, the groups are determined by the data; in classification, they are determined by the analyst/researcher.

Answer 68

K-means partitions observations into a fixed number, k, of non-overlapping clusters. Each cluster is characterized by its centroid, and each observation is assigned by the algorithm to the cluster with the centroid to which that observation is closest.
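A sketch of the k-means loop described above (assign each observation to its nearest centroid, then move each centroid to the mean of its cluster). The farthest-first initialization is an assumption chosen for simplicity; k-means++ or random restarts are common alternatives. The data are simulated.

```python
import numpy as np

def kmeans(X: np.ndarray, k: int, n_iter: int = 100, seed: int = 0):
    """Lloyd's algorithm with farthest-first initialization (an assumption)."""
    rng = np.random.default_rng(seed)
    centroids = [X[rng.integers(len(X))]]
    for _ in range(1, k):
        # next initial centroid: the observation farthest from the chosen ones
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centroids], axis=0)
        centroids.append(X[d.argmax()])
    centroids = np.array(centroids)
    for _ in range(n_iter):
        # assign each observation to the closest centroid
        dist = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        # move each centroid to the mean of its assigned observations
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                        for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    labels = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2).argmin(axis=1)
    return labels, centroids

rng = np.random.default_rng(5)
X = np.vstack([rng.normal(0.0, 0.5, size=(50, 2)),     # cluster around (0, 0)
               rng.normal(10.0, 0.5, size=(50, 2))])   # cluster around (10, 10)
labels, centroids = kmeans(X, k=2)
print(np.bincount(labels), centroids.round(1))
```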

Answer 69

Underfitting. High bias error: the model does not fit the training data well. High variance error: the model does not predict well on test data. The combination of both results in an underfitted model.

Answer 70

Overfitting. Bias error: the model does not fit the training data well. Variance error: the model does not predict well on test data.

Answer 71

Bias Error (underfitting)

Answer 72

Variance Error (overfitting)

Answer 73

It is part of unsupervised ML. Dimension reduction: used to reduce highly correlated features of data into a few main, uncorrelated composite variables.
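A PCA sketch via the eigen-decomposition of the sample covariance matrix, on simulated, highly correlated features (the data and function name are hypothetical). The component scores are the uncorrelated composite variables; keeping only the first few gives the dimension reduction.

```python
import numpy as np

def pca(X: np.ndarray, n_components: int):
    """Return component scores and the share of variance each component explains."""
    Xc = X - X.mean(axis=0)                       # center each feature
    cov = np.cov(Xc, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)        # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]             # largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = Xc @ eigvecs[:, :n_components]       # uncorrelated composite variables
    explained = eigvals[:n_components] / eigvals.sum()
    return scores, explained

# Three highly correlated simulated features driven by one common factor
rng = np.random.default_rng(11)
f = rng.normal(size=300)
X = np.column_stack([f + 0.1 * rng.normal(size=300),
                     f + 0.1 * rng.normal(size=300),
                     -f + 0.1 * rng.normal(size=300)])
scores, explained = pca(X, n_components=3)
print(explained.round(3))  # the first component dominates
```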

Answer 74

**Conceptualize the task** -> **Collect data** -> **Data preparation & processing** -> **Data exploration** -> **Model training**.

Answer 75

**Text problem formulation** -> **Data curation** -> **Text preparation and processing** -> **Text exploration** -> **Classifier output**.

Answer 76

Creating a new variable from an already existing one to ease the analysis. **Example**: Date of birth -> Age

Answer 77

Two or more variables aggregated into one single variable.

Answer 78

Eliminate data rows that are not needed (we filter out the information that is not relevant). Example: keep CFA Level II candidates only.

Answer 79

Columns that can be eliminated

Answer 80

Nominal, ordinal, integer, ratio, categorical.

Answer 81

Missing entries

Answer 82

Outside a meaningful range

Answer 83

Some data conflicts with other data.

Answer 84

Not a true value

Answer 85

Non-identical data formats, e.g., American dates (M/D/Y) vs. European dates (D/M/Y)

Answer 86

Multiple identical observations

Answer 87

**Rescales to the range 0-1.** Sensitive to outliers. (X_i - X_min) / Range = (X_i - X_min) / (X_max - X_min)

Answer 88

Centers and rescales. Requires a normal distribution. (X_i - μ) / Standard deviation
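A sketch of the two scaling methods side by side on a hypothetical feature with one outlier, showing why normalization is described as sensitive to outliers. Using the population standard deviation (ddof=0) in `standardize` is an assumption.

```python
import numpy as np

def normalize(x) -> np.ndarray:
    """Min-max normalization: (X_i - X_min) / (X_max - X_min), range 0 to 1."""
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())

def standardize(x) -> np.ndarray:
    """Standardization: (X_i - mean) / standard deviation."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std()   # population std (ddof=0), an assumption

values = [10.0, 20.0, 30.0, 40.0, 1000.0]   # hypothetical feature with one outlier
print(normalize(values).round(3))   # the outlier squeezes the other values toward 0
print(standardize(values).round(3))
```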

Answer 89

**P** = TP / (TP + FP). **Remember**: the denominator holds the predicted positives. Precision is the ratio of correctly predicted positive classes to all predicted positive classes. It is useful in situations where the cost of FP, or Type I error, is high. For example, when an expensive product fails quality inspection (predicted class 1) and is scrapped, but it is actually perfectly good (actual class 0).

Answer 90

TP / (TP + FN). **Remember**: for recall, the denominator has the opposite (FN instead of FP). Also known as **sensitivity**, recall is the ratio of correctly predicted positive classes to all actual positive classes. It is useful in situations where the cost of FN, or Type II error, is high. For example, when an expensive product passes quality inspection (predicted class 0) and is sent to the valued customer, but it is actually quite defective (actual class 1).

Answer 91

(TP + TN) / (TP + FN + TN + FP). The percentage of correctly predicted classes out of total predictions.

Answer 92

FP / (FP + TN) Statement / (Statement + Opposite)

Answer 93

TP / (TP + FN) Statement / (Statement + Opposite)

Answer 94

RMSE (Root Mean Square Error)

Answer 95

Removing the bottom and top 1% of observations on a feature in a data set.

Answer 96

Replacing the extreme values in a data set with the same maximum or minimum value.
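A sketch contrasting trimming and winsorization at the 1st/99th percentiles on hypothetical data: trimming drops the extreme observations, while winsorizing keeps every observation and caps the extremes at the percentile values. The 1% cutoff and NumPy's default linear percentile interpolation are assumptions.

```python
import numpy as np

def trim(x, pct: float = 1.0) -> np.ndarray:
    """Drop observations below the pct-th or above the (100 - pct)-th percentile."""
    x = np.asarray(x, dtype=float)
    lo, hi = np.percentile(x, [pct, 100.0 - pct])
    return x[(x >= lo) & (x <= hi)]

def winsorize(x, pct: float = 1.0) -> np.ndarray:
    """Replace values beyond those percentiles with the percentile values."""
    x = np.asarray(x, dtype=float)
    lo, hi = np.percentile(x, [pct, 100.0 - pct])
    return np.clip(x, lo, hi)

data = np.arange(1.0, 101.0)
data[-1] = 1000.0            # hypothetical extreme value
print(len(trim(data)), trim(data).max())             # fewer observations
print(len(winsorize(data)), winsorize(data).max())   # same count, outlier capped
```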

Answer 97

(2 x P x R) / (P + R): the harmonic mean of precision and recall. The F1 score is more appropriate than accuracy when the dataset has an unequal class distribution and it is necessary to measure the balance between precision and recall. High scores on both of these metrics suggest good model performance.
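A sketch computing precision, recall, accuracy, and F1 from confusion-matrix counts; the counts describe a hypothetical imbalanced data set and show how accuracy can look good while F1 reveals weak performance on the minority class.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    precision = tp / (tp + fp)                    # TP / all predicted positives
    recall = tp / (tp + fn)                       # TP / all actual positives
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    f1 = 2 * precision * recall / (precision + recall)   # harmonic mean of P and R
    return {"precision": precision, "recall": recall, "accuracy": accuracy, "f1": f1}

# Hypothetical imbalanced data set: 10 actual positives, 90 actual negatives
m = classification_metrics(tp=1, fp=0, fn=9, tn=90)
print({name: round(v, 3) for name, v in m.items()})
# accuracy is 0.91, yet recall is only 0.1 and F1 is about 0.18
```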

Answer 98

TP FP FN TN

Answer 99

**How much information a token contributes to a class** Mutual information (MI) measures how much information is contributed by a token to a class of text. **MI = 0**: the token's distribution in all text classes is the same. **MI = 1**: the token in any one class tends to occur more often in only that particular class of text.

Answer 100

Final stage in data exploration. **Numbers**: differentiate among types of numbers. **N-grams**: multi-word patterns kept intact. **Named entity recognition (NER)**: classes such as money, time, and organization.

Answer 101

The majority class can be under-sampled and the minority class can be over-sampled.

Answer 102

Splitting a given text into separate words or characters. A token is equivalent to a word, and tokenization is the process of splitting text into separate tokens.

Answer 103

Tokens -> N-grams, from which to build a bag-of-words (BOW) -> input to a document term matrix (DTM).
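A sketch of the tokens -> n-grams -> bag-of-words -> document term matrix pipeline using only the standard library. The regex tokenizer (lowercase, split on non-alphanumeric characters) and the sample sentences are simplifying assumptions; real text pipelines also handle stop words, stemming, and punctuation more carefully.

```python
import re
from collections import Counter

def tokenize(text: str) -> list[str]:
    """Lowercase and split on anything that is not a letter or digit (a simplification)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def ngrams(tokens: list[str], n: int = 2) -> list[str]:
    """Join n consecutive tokens so multi-word patterns stay intact."""
    return ["_".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def document_term_matrix(docs: list[str], n: int = 1):
    """Bag of n-grams per document -> rows of counts over a shared vocabulary."""
    bags = []
    for doc in docs:
        tokens = tokenize(doc)
        bags.append(Counter(tokens if n == 1 else ngrams(tokens, n)))
    vocab = sorted(set().union(*bags))
    matrix = [[bag[term] for term in vocab] for bag in bags]
    return vocab, matrix

docs = ["Revenue rose. Revenue beat estimates.", "Costs rose sharply."]
vocab, dtm = document_term_matrix(docs)
print(vocab)
print(dtm)
```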

Answer 104

**Volume**: refers to the quantity of data. **Variety**: pertains to the array of available data sources. **Velocity:** is the speed at which data is created (data in motion is hard to analyze compared to data at rest). **Veracity**: related to the credibility and reliability of different data sources.

Answer 105

**Stage 4 and first stage in Data Exploration** is the preliminary step in data exploration. Exploratory graphs, charts and other visualizations such as heat maps and word clouds are designed to summarize and observe data.

Answer 106

**Stage 4 and Second stage in Data Exploration** is a process whereby only pertinent features from the dataset are selected for ML model training. Feature

Answer 107

**Stage 4 and Third and final stage in Data Exploration** is a process of creating new features by changing or transforming existing features. Feature Engineering techniques systematically alter, decompose or combine existing features to produce more meaningful features.

Answer 108

F = MSR / MSE = [RSS / k] / [SSE / (n - (k + 1))] = [Regression SS / k] / [Residual SS / (n - (k + 1))]

Answer 109

H0: b1 = b2 = ... = bk = 0; Ha: at least one bj ≠ 0. F-statistic > F critical value: reject the null. F-statistic < F critical value: fail to reject the null.

Answer 110

Bias error Variance error Base error

Answer 111

Variance error, or how much the model's results change in response to new data from validation and test samples. Unstable models pick up noise and produce high variance, causing overfitting and ↑ out-of-sample error.

Answer 112

Bias Error or the degree to which a model fits the training data. Algorithms with erroneous assumptions produce high bias with poor approximation, causing underfitting and ↑ in-sample error. (Adding more training samples will not improve the model)

Answer 113

Base Error due to randomness in the data. (Out-of-sample accuracy increases as the training sample size increases)

Answer 114

Occam's razor: the problem-solving principle that the simplest solution tends to be the correct one. In supervised ML, it means preventing the algorithm from getting too complex during selection and training by limiting the number of features and penalizing algorithms that are too complex or too flexible, constraining them to include only parameters that reduce out-of-sample error. K-fold cross-validation: this strategy comes from the principle of avoiding sampling bias. The challenge is having a large enough data set to make both training and testing possible on representative samples.
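A k-fold cross-validation sketch: shuffle the indices, split them into k folds, let each fold serve once as the validation set, and average the validation errors. The simple linear regression, the simulated data, and the helper names are hypothetical choices for illustration.

```python
import numpy as np

def k_fold_indices(n: int, k: int, seed: int = 0):
    """Yield (train_idx, validation_idx) pairs; each observation validates exactly once."""
    idx = np.random.default_rng(seed).permutation(n)   # shuffle to avoid ordering bias
    folds = np.array_split(idx, k)
    for i in range(k):
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, folds[i]

def cv_mse(x: np.ndarray, y: np.ndarray, k: int = 5) -> float:
    """Average validation MSE of a simple linear regression across k folds."""
    errors = []
    for train, val in k_fold_indices(len(y), k):
        A_train = np.column_stack([np.ones(len(train)), x[train]])
        beta, *_ = np.linalg.lstsq(A_train, y[train], rcond=None)
        A_val = np.column_stack([np.ones(len(val)), x[val]])
        errors.append(np.mean((y[val] - A_val @ beta) ** 2))
    return float(np.mean(errors))

rng = np.random.default_rng(2)
x = rng.normal(size=100)
y = 1.0 + 2.0 * x + rng.normal(size=100)
print(round(cv_mse(x, y, k=5), 3))  # an estimate of out-of-sample error
```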


(138 cards)