Quantitative Methods Flashcards

1
Q

When should you use Logistic regression models?

A

If the dependent Y variable is discrete (e.g., a binary outcome)
If our independent X variables are qualitative

2
Q

When should you use Multiple regression models?

A

When the dependent variable is continuous (not discrete) and there is more than one explanatory variable (more than one independent variable).

When multiple independent variables determine the outcome of a single dependent variable.

  • Dependent Y variable is continuous
  • We have more than 1 independent X variable
3
Q

Assumption of Regression models

A

L.I.I.N.H.

Linearity: Relationship between dependent Y variable and Independent X variable is linear.

Independence of Errors: Regression residuals are uncorrelated across observations.

Independence: The independent X variables are not random, and there is no exact linear relationship between 2 or more independent variables.

Normality: Regression residuals are normally distributed.

Homoscedasticity: Constant variance of regression residuals

4
Q

How to determine if a variable is significant?

A

|t-stat| > critical t-value

(At the chosen significance level; roughly |t| > 2 for a 5% two-tailed test with a large sample.)

5
Q

Degrees of freedom for SSR

A

k

(SSR here is the regression, i.e. explained, sum of squares, so its degrees of freedom equal the number of slope coefficients k.)

6
Q

Degrees of freedom for SST

A

N-1

7
Q

Degrees of freedom for SSE

A

N - (K + 1)

8
Q

What will happen to adjusted R-Square if we add insignificant variables?

A

Adjusted R-Square decreases

9
Q

R-Square formula

A

SSR/SST = Explained variation / Total variation

1-(unexplained variation/total variation)
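Both forms of the formula give the same number. A minimal Python sketch with made-up sums of squares:

```python
# Hypothetical sums of squares for illustration.
sst = 100.0          # total variation (SST)
ssr = 80.0           # explained variation (SSR)
sse = sst - ssr      # unexplained variation (SSE)

r2_explained = ssr / sst         # SSR / SST
r2_complement = 1 - (sse / sst)  # 1 - unexplained/total

print(r2_explained, r2_complement)  # 0.8 0.8
```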

10
Q

What kind of test is this?

H0: bi = Bi
Ha: bi ≠ Bi

A

Two tail test

11
Q

What kind of test is this?

H0: bi <= Bi
Ha: bi > Bi

A

Right tail test

H0 contains ≤, so Ha points right: the rejection region is in the right tail.

12
Q

What kind of test is this?

H0: bi ≥ Bi
Ha: bi < Bi

A

Left tail test

H0 contains ≥, so Ha points left: the rejection region is in the left tail.

13
Q

Model Misspecification - Omitted variable

A

If we omit a significant variable from our model, the error term will capture the effect of the missing variable, biasing the estimated coefficients.

14
Q

Model Misspecification - Inappropriate form of variable

A

Failing to account for non-linearity
Causes: Conditional heteroscedasticity

To fix it, we can use a natural log transformation to make the relationship linear.

15
Q

Model Misspecification - Inappropriate Scaling

A

Causes Conditional heteroscedasticity and multicollinearity

16
Q

Model Misspecification - Inappropriate Pooling of Data

A

Causes Conditional heteroscedasticity and Serial correlation

17
Q

What is Unconditional heteroscedasticity

A

Var(error) not correlated with independent variable.
No issue with inference.

18
Q

What is Conditional heteroscedasticity

A

Var(error) is correlated with the independent X variables.

The F-test is unreliable since MSE is a biased estimator of the true population variance.

Variance at one time step has a positive relationship with variance at one or more previous time steps: periods of high variability tend to follow periods of high variability, and periods of low variability tend to follow periods of low variability.

19
Q

What does the Breusch-Pagan (BP) test do?

A

Tests for conditional heteroskedasticity

20
Q

The formula for BP test statistics

A

n * R-Square, where the R-Square comes from the regression of the squared residuals on the independent X variables (chi-square distributed with k degrees of freedom)
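A minimal Python sketch of the decision rule. The sample size and auxiliary R-square are hypothetical; 5.991 is the standard 5% chi-square critical value for 2 degrees of freedom:

```python
# Breusch-Pagan statistic: n * R-square from the auxiliary regression of the
# squared residuals on the independent variables. Inputs are hypothetical.
n = 100          # number of observations
r2_resid = 0.08  # R-square of squared residuals regressed on the X variables

bp_stat = n * r2_resid  # chi-square distributed with k degrees of freedom
chi2_crit = 5.991       # 5% critical value for k = 2 independent variables

print(round(bp_stat, 2), bp_stat > chi2_crit)  # 8.0 True -> reject H0
```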

21
Q

BP test
Test statistics > Critical value

A

Reject the null.
Conditional heteroskedasticity is present (the variance of the residuals is NOT constant).

  • H0: No heteroskedasticity - homoskedasticity is present
  • Ha: Heteroskedasticity
22
Q

BP test
Test statistics < Critical value

A

Fail to reject the null.

No evidence of heteroskedasticity (homoskedasticity, i.e. constant variance).

H0: No heteroskedasticity
Ha: Heteroskedasticity

23
Q

What is serial correlation?

A

Errors correlated across observations

24
Q

Positive Serial Correlation

A

A positive residual is most likely followed by a positive residual.
A negative residual is most likely followed by a negative residual.

25
Q

Negative Serial Correlation

A

A negative residual is most likely followed by a positive residual.
A positive residual is most likely followed by a negative residual.

26
Q

Multicollinearity

A

2 or more independent variables are highly correlated or there is an approximate linear relationship among the IVs.

Coefficients will be consistent but imprecise and unreliable.
Inflated standard errors and insignificant t-statistics, but a possibly significant F-statistic.

27
Q

How to detect multicollinearity?

A

Variance inflation factor

1 / (1 - R-Square), where the R-Square comes from regressing one independent variable on the others.

We want VIF as low as possible.

VIF > 5: concerning
VIF > 10: indicates multicollinearity
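A one-variable Python sketch of the formula and thresholds (the auxiliary R-square is hypothetical):

```python
# VIF for variable j: regress X_j on the other X variables, take that R-square.
# The R-square below is hypothetical.
r2_j = 0.9
vif = 1 / (1 - r2_j)

print(round(vif, 1))       # 10.0
print(vif > 5, vif >= 10)  # True True -> multicollinearity
```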

28
Q

How to fix multicollinearity?

A
  • Increase sample size
  • Excluding one or more of the regression variables.
  • Use a different proxy for one of the variables
29
Q

Formula and purpose of AIC

A

AIC = n * ln(SSE/n) + 2(k + 1)

AIC is better for forecasting purposes

30
Q

Formula and purpose of BIC

A

BIC = n * ln(SSE/n) + ln(n)(k + 1)

Better for evaluating goodness-of-fit
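Both criteria computed side by side in Python (the regression output below is hypothetical); since ln(n) > 2 for any realistic sample, BIC penalizes extra parameters more heavily:

```python
import math

# AIC and BIC from the card formulas, using hypothetical regression output.
n = 120     # observations
k = 3       # independent variables
sse = 50.0  # sum of squared errors

aic = n * math.log(sse / n) + 2 * (k + 1)
bic = n * math.log(sse / n) + math.log(n) * (k + 1)

# BIC applies the heavier penalty per parameter whenever ln(n) > 2.
print(bic > aic)  # True
```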

31
Q

How do we test joint coefficients?

A

F-Stat
[(SSE restricted - SSE unrestricted) / q] / [SSE unrestricted / (n - k - 1)]
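A minimal numeric sketch of this F-statistic (all inputs hypothetical):

```python
# Restricted vs. unrestricted F-statistic for testing q coefficients jointly.
# All inputs are hypothetical.
sse_restricted = 60.0
sse_unrestricted = 50.0
q = 2          # number of restrictions (coefficients tested jointly)
n, k = 100, 5  # observations, independent variables in the unrestricted model

f_stat = ((sse_restricted - sse_unrestricted) / q) / (sse_unrestricted / (n - k - 1))
print(round(f_stat, 1))  # 9.4
```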

32
Q

What is a High leverage point?

A

Extreme value of an independent variable.

An observation that is outside the range of the independent variables (x-axis).

33
Q

What is an outlier?

A

Extreme value in the dependent variable

An observation that is outside the range of the dependent variable (vertical Y range).

34
Q

How do you detect and calculate a High leverage point?

A

Calculate the leverage measure:

Leverage = 1/n + (squared deviation of i / sum of all squared deviations)

A potential high leverage point if leverage > 3 x (k + 1)/n.

35
Q

How do you detect and calculate a outlier?

A

Externally studentized residuals

  • Delete each case i
  • Calculate the new regression
  • Add the deleted observation back in and calculate its residual
  • Calculate the studentized residual

t* = e* / se*

Potentially influential if:
|t*| > critical t (for small samples)
|t*| > 3 (for large samples)

36
Q

How can we determine and find influential outliers

A

By calculating Cook's distance (aka Cook's D)

37
Q

If cooks D is …

Di > 0.5

A

could be influential

38
Q

If cooks D is

Di > 1

A

Likely to be influential

39
Q

If cooks D is

Di > 2 x √(K/n)

A

Influential

40
Q

What does an intercept dummy variable look like?

A

No interaction term

yi = b0 + b1x1 + b2x2 + d0D + epsilon

41
Q

What does a slope dummy variable look like?

A

interaction term

yi = b0 + b1x1 +d1x1D + epsilon

42
Q

How do you interpret an independent variable's slope coefficient in a logistic regression model?

A

The change in the log odds that the event happens per unit change in the independent variable, holding all other independent variables constant.

43
Q

The intercept in these logistic regressions is interpreted as the:

A

log odds of the ETF being a winning fund if all independent variables are zero.

44
Q

When to use a Log-Linear trend model?

A

When the dependent Y variable changes at a constant growth rate

44
Q

When to use a Linear trend model?

A

When the dependent Y variable changes by a constant amount each period.

45
Q

DW test for Serial correlation in linear/log-linear model hypothesis

A

H0: DW = 2 - Fail to reject the null hypothesis - No serial correlation
Ha: DW ≠ 2 - Reject the null - We have serial correlation

45
Q

Autoregressive AR model

A

A time series regressed on its own past values.

A statistical model is autoregressive if it predicts future values based on past values. For example, an autoregressive model might seek to predict a stock’s future prices based on its past performance.

46
Q

What are the 3 properties we must satisfy to have a “Covariance Stationary Series”?

A

Mean, Variance, and Cov(yt, yt-s) must be constant and finite in all periods.

  1. The expected value of the time series must be constant and finite in all periods.
  2. The variance of the time series must be constant and finite in all periods.
  3. The covariance of the time series with itself for a fixed number of periods in the past or future must be constant and finite in all periods.
47
Q

What is “mean Reversion”

A

The value of the time series falls when it’s above its mean, and rises when it’s below its mean.

Mean reversion in finance suggests that various relevant phenomena such as asset prices and volatility of returns eventually revert to their long-term average levels.

The mean reversion theory has led to many investment strategies, from stock trading techniques to options pricing models.

Mean reversion trading tries to capitalize on extreme changes in the price of a particular security, assuming that it will revert to its previous state

48
Q

Define the Mean reverting level …

Xt > b0/(1-b1)

A

The time series will decrease

48
Q

Define the Mean reverting level …

Xt = b0/(1-b1)

A

The time series will remain the same

49
Q

Define the Mean reverting level …

Xt < b0/(1-b1)

A

The time series will increase
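The three cases from the last few cards can be sketched numerically; the AR(1) coefficients below are hypothetical:

```python
# Mean-reverting level of an AR(1): x_t = b0 + b1*x_(t-1) + e_t.
# MRL = b0 / (1 - b1). Coefficients are hypothetical.
b0, b1 = 1.0, 0.5
mrl = b0 / (1 - b1)
print(mrl)  # 2.0

def expected_move(x_t):
    """Direction the series is expected to move from x_t."""
    if x_t > mrl:
        return "decrease"
    if x_t < mrl:
        return "increase"
    return "unchanged"

print(expected_move(3.0), expected_move(1.0), expected_move(2.0))
```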

50
Q

What is an “in-sample forecast”

A

Prediction:
Predicted vs. observed values within the sample period used to estimate the model.

Models with a smaller variance of errors are more accurate.

51
Q

What is an “out-of-sample forecast”

A

Forecast:
Forecasts compared against observed values from outside the sample used to estimate the model.

Use Root Mean Squared Errors (RMSE) - used to compute out-of-sample forecasting performance. The smaller the RMSE, the better.

52
Q

What 2 elements does a Random Walk not have?

A

Finite mean reverting level, and finite variance

53
Q

Which test do we use to test for unit root?

A

Dickey-Fuller test

54
Q

When testing for Unit root
If the coefficient is |b1| < 1

A

No unit root - the time series is covariance stationary

55
Q

When testing for Unit root
If the coefficient is b1 = 1

A

Unit root.
Time series is a random walk.
It is not covariance stationary

56
Q

DW Test for SC

Result from model output (DW statistics) < DW Critical

A

Evidence of positive serial correlation.
We can reject the hypothesis of no positive serial correlation.

57
Q

DW Test for SC
Result from model output (DW statistics) > DW Critical

A

NO evidence of positive serial correlation

58
Q

When are residuals not serially correlated in an AR model (test statistics)?

A

|t-stat| < critical value (no residual autocorrelation is significantly different from zero)

59
Q

When are residuals serially correlated in an AR model (test statistics)?

A

|t-stat| > critical value (at least one residual autocorrelation is significant)

60
Q

The standard error of the autocorrelations is calculated as…

A

1/√T

where T represents the number of observations used in the regression

61
Q

Explain the DW test

A

The DW statistic is designed to detect positive serial correlation of the errors of a regression equation.

Under the null hypothesis of no positive serial correlation, the DW statistic is 2.0.

Positive serial correlation will lead to a DW statistic that is less than 2.0.

We do NOT want positive serial correlation !!!

62
Q

The steps to calculate RMSE …

A

The steps to calculate RMSE are as follows:

  1. Take the difference between the actual and the forecast values. This is the error.
  2. Square the error.
  3. Sum the squared errors.
  4. Divide by the number of forecasts.
  5. Take the square root of the average.
63
Q

Root Mean Squared Error (RMSE)

A

Root Mean Square errors steps
1. Square the errors (Actual - Forecasted)
2. Sum of the differences and calculate the mean
3. Take the square root of the mean
4. Then we will have our RMSE - Smaller RMSE the better

A model’s accuracy in forecasting out-of-sample values is assessed using the root mean squared error (RMSE).

RMSE is the square root of the mean squared error. The model with the smallest RMSE is seen as the most accurate, as it is perceived to have better predictive power in the future.
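The steps can be run directly in Python (the actual and forecast values are hypothetical):

```python
import math

# RMSE following the card's steps; the actual/forecast values are hypothetical.
actual   = [10.0, 12.0, 11.0, 13.0]
forecast = [ 9.0, 12.0, 13.0, 12.0]

squared_errors = [(a - f) ** 2 for a, f in zip(actual, forecast)]  # steps 1-2
mse = sum(squared_errors) / len(squared_errors)                    # steps 3-4
rmse = math.sqrt(mse)                                              # step 5

print(mse, round(rmse, 3))  # 1.5 1.225
```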

64
Q

What is a unit root

A

A stochastic trend in a time series.

Random walk with a drift.

If a time series has a unit root, it shows a systematic pattern that is unpredictable.

65
Q

How do we transform TS into covariance stationary

A

By using first differencing

Regression 2: yt = b0 + b1yt−1 + εt,
where yt = xt − xt−1.
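First differencing on a short hypothetical series:

```python
# First differencing: y_t = x_t - x_(t-1). A random walk in x becomes a
# covariance-stationary series in y. Sample values are hypothetical.
x = [100, 103, 101, 106, 104]
y = [x[t] - x[t - 1] for t in range(1, len(x))]
print(y)  # [3, -2, 5, -2]
```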

66
Q

Can we test for positive SC if we have lag variables using DW test?

A

NO!!!

The DW test can be used for trend models, but not for AR models (regressions that include lagged values of the dependent variable).

67
Q

When testing for serial correlation using DW test.

A

0 ——-|dl|——–|du|——-2
Between 0 and dl: positive serial correlation
Between du and 2: okay (no positive serial correlation)
Between dl and du: inconclusive (we don't know)

68
Q

Adjusted R square formula

A

1 - [(n-1)/(n-k-1)] x (1-R Squared)
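A quick numeric sketch of the formula (inputs hypothetical); note it always comes out below R-Square:

```python
# Adjusted R-square with hypothetical inputs; it is always <= R-square.
n, k, r2 = 50, 4, 0.60
adj_r2 = 1 - ((n - 1) / (n - k - 1)) * (1 - r2)
print(round(adj_r2, 4))  # 0.5644
```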

69
Q

Holding all other variables constant, the adjusted R-Square will decrease when all of the following increase, except…

A

The number of observations

70
Q

What does the BP test for ?

A

Conditional Heteroskedasticity

71
Q

What is the most common problem in trend models?

A

Serial correlation

Trend models often have the limitation that their errors are serially correlated. This is because predictions in trend models are based solely on what time period it is, and thus they fail to account for significant shifts in the data, such as recessions.

72
Q

Hierarchical clustering is most likely used when the problem involves

A

Classifying unlabeled data

73
Q

What is Supervised machine learning

A

Involves training an algorithm to take a set of inputs (x variables) and find a model that best relates them to outputs (Y variables)

Training algorithm - Set of inputs - find models that relates to outputs.

74
Q

What is unsupervised machine learning

A

Same as supervised learning, but does not make use of labeled training data.

We give it data and expect the algorithm to make sense of it.

75
Q

What is Overfitting

A

ML models can produce overly complex models that may fit the training data too well and thereby not generalize new data well.

The prediction model built from the training sample (in-sample data) is too complex.

The model trained on the training data does not work well on new data.

76
Q

Name Supervised ML Algorithms

A

Penalized regression
Support Vector Machine (SVM)
K - Nearest Neighbor
Classification and Regression Trees (CART)
Ensemble learning
Random Forest

77
Q

Name unsupervised ML algorithms

A

Principal component analysis
K-means clustering
Hierarchical clustering

78
Q

High Bias Error in ML

A

High Bias Error means the model does not fit the training data well.

79
Q

High Variance Error in ML

A

High variance error means the model does not predict well on the test data

80
Q

Name Dimension Reduction in ML

A

Principal component analysis (unsupervised ML)

Penalized Regression
(Supervised ML)

81
Q

What does Penalized Regression do?

A
  • Similar to maximizing adjusted R-square
  • Dimension reduction
  • Eliminates/minimizes overfitting

Regression coefficients are chosen to minimize the sum of the squared error, plus a penalty term that increases with the number of included features

82
Q

What is SVM

A

Support Vector Machine

It is used for classification, regression, and outlier detection.
Classifies data that is not complex or non-linear.

A linear classifier that determines the hyperplane that optimally separates the observations into two sets of data points.

Does not require any hyperparameter.

Maximizes the probability of making a correct prediction by determining the boundary that is furthest from all observations.

Outliers do not affect either the support vectors or the discriminant boundary.

83
Q

What is K-Nearest Neighbor

A

Classification

Classifies a new observation by finding similarities in the existing data.

Makes no assumptions about the distribution of the data.
It is non-parametric.

KNN results can be sensitive to the inclusion of irrelevant or correlated features, so it may be necessary to select features manually, thereby removing irrelevant information.

84
Q

What is CART

A

Classification and Regression Trees

Part of supervised ML.

Typically applied when the target is binary.

If the goal is regression, the prediction is the mean of the values in the terminal node.

Makes no assumptions about the characteristics of the training data, so if left unconstrained, it can potentially learn the training data perfectly.

To avoid overfitting, regularization parameters can be added, such as the maximum depth of the tree.

85
Q

What are the 3 types of layer in Neural Network

A
  1. Input layer
  2. Hidden layer
  3. Output layer
86
Q

What are non-linear functions more susceptible to?

A

Variance error and overfitting

87
Q

What are linear functions more susceptible to?

A

Bias error and underfitting

88
Q

The main distinction between clustering and classification algorithms is that

A

The groups in clustering are determined by the data.

In classification, they are determined by the analyst/researcher.

89
Q

What is K-Means clustering in ML?

A

K-means partitions observations into a fixed number, k, of non-overlapping clusters.

Each cluster is characterized by its centroid, and each observation is assigned by the algorithm to the cluster with the centroid to which that observation is closest.

90
Q

High bias error and high variance error are indicative of…

A

Underfitting

High bias error = the model does not fit the training data well.

High variance error = the model does not predict well on test data.

The combination results in an underfitted model.

90
Q

Low bias error but high variance error is indicative of ..

A

Overfitting

Bias error = the model does not fit the training data well.

Variance error = the model does not predict well on test data.

91
Q

What are linear models more susceptible to?

A

Bias Error (underfitting)

92
Q

What are non-linear models more prone to?

A

Variance Error
(overfitting)

93
Q

What is Principal Components Analysis

A

It is part of unsupervised ML.
Dimension reduction.

Used to reduce highly correlated features of the data into a few main uncorrelated composite variables.

94
Q

Steps in Big Data Analysis/Projects: Traditional with structured data.

A

Conceptualize the task -> Collect data -> Data preparation & processing -> Data exploration -> Model training.

95
Q

Steps in Big Data Analysis/Projects: Textual Big Data.

A

Text problem formulation -> Data curation -> Text preparation and processing -> Text exploration -> Classifier output.

96
Q

Preparation in structured data: Extraction

A

Creating a new variable from an existing one to ease the analysis.

Example: Date of birth -> Age

97
Q

Preparation in structured data: Aggregation

A

2 or more variables aggregated into one single variable.

98
Q

Preparation in structured data: Filtration

A

Eliminate data rows which are not needed.

[We filter out the information that is not relevant]

CFA Lv 2 Candidates only

99
Q

Preparation in structured data: Selection

A

Columns that can be eliminated

100
Q

Preparation in structured data: Conversion

A

Nominal, ordinal, integer, ratio, categorical.

101
Q

Cleansing structured data: Incomplete

A

Missing entries

102
Q

Cleansing structured data: Invalid

A

Outside a meaningful range

103
Q

Cleansing structured data: Inconsistent

A

Some data conflicts with other data.

103
Q

Cleansing structured data: Inaccurate

A

Not a true value

104
Q

Cleansing structured data: Non-uniform

A

Non-identical data formats

American date (M/D/Y) vs European (D/M/Y)

105
Q

Cleansing structured data: Duplication

A

Multiple identical observations

106
Q

Adjusting the range of a feature: Normalization

A

Rescales values into the range 0-1.

Sensitive to outliers.

(Xi - Xmin) / Range
(Xi - Xmin) / (Xmax - Xmin)

107
Q

Adjusting the range of a feature: Standardization

A

Centers and rescales.

Requires a normal distribution.

(Xi - μ) / standard deviation
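Both rescalings side by side in Python, on a hypothetical feature:

```python
# Min-max normalization vs. standardization on a hypothetical feature.
data = [2.0, 4.0, 6.0, 8.0]

x_min, x_max = min(data), max(data)
normalized = [(x - x_min) / (x_max - x_min) for x in data]  # rescaled to [0, 1]

mean = sum(data) / len(data)
std = (sum((x - mean) ** 2 for x in data) / len(data)) ** 0.5
standardized = [(x - mean) / std for x in data]  # centered, unit variance

print(normalized[0], normalized[-1])  # 0.0 1.0
```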

108
Q

Performance evaluation graph: Precision formula

A

P= TP / (TP + FP)

Remember: the denominator contains all predicted Positives.
Useful when the cost of a Type I error (FP) is high.

The ratio of correctly predicted positive classes to all predicted positive classes.
Precision is useful in situations where the cost of FP or Type I error is high.

For example, when an expensive product fails quality inspection (predicted class 1) and is scrapped, but it is actually perfectly good (actual class 0).

109
Q

Performance evaluation graph: Recall formula

A

TP / (TP + FN)
Remember: recall has the opposite (FN) in the denominator.

Also called sensitivity; useful when the cost of a Type II error is high.

The ratio of correctly predicted positive classes to all actual positive classes. Recall is useful in situations where the cost of FN or Type II error is high.

For example, when an expensive product passes quality inspection (predicted class 0) and is sent to the valued customer, but it is actually quite defective (actual class 1).

110
Q

Performance evaluation graph: Accuracy formula

A

(TP + TN) / (TP + FN + TN + FP)

Is the percentage of correctly predicted classes out of total predictions.
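The three formulas from the last cards, computed on one hypothetical confusion matrix:

```python
# Precision, recall, and accuracy from hypothetical confusion-matrix counts.
tp, fp, fn, tn = 40, 10, 20, 30

precision = tp / (tp + fp)                  # costly Type I errors -> watch this
recall    = tp / (tp + fn)                  # costly Type II errors -> watch this
accuracy  = (tp + tn) / (tp + fp + fn + tn)

print(precision, round(recall, 3), accuracy)  # 0.8 0.667 0.7
```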

111
Q

Receiver operating characteristics: False Positive Rate formula

A

FP / (FP + TN)

Statement / (Statement + Opposite)

112
Q

Receiver operating characteristics: True Positive Rate formula

A

TP / (TP + FN)

Statement / (Statement + Opposite)

113
Q

In big data projects, which measure is the most appropriate for regression method

A

RMSE

(Root Mean Square Error)

114
Q

What is “trimming” in big data projects?

A

Removing the bottom and top 1% of observations on a feature in a data set.

115
Q

What is “Winsorization” in big data projects?

A

Replacing the extreme values in a data set with the same maximum or minimum value.

116
Q

Confusion Matrix: F1 Score Formula

A

(2 x P x R) / (P + R)

is the harmonic mean of precision and recall.

F1 score is more appropriate than accuracy when there is an unequal class distribution in the dataset and it is necessary to measure the equilibrium of precision and recall.

High scores on both of these metrics suggest good model performance.
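A one-line numeric sketch (the precision and recall values are hypothetical):

```python
# F1 as the harmonic mean of precision (p) and recall (r); values hypothetical.
p, r = 0.8, 0.6
f1 = (2 * p * r) / (p + r)
print(round(f1, 4))  # 0.6857
```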

117
Q

Confusion Matrix Display

A

TP FP
FN TN

118
Q

What is Mutual Information in big data projects?

A

How much info a token contributes to a class

Mutual Information (MI) Measures how much information is contributed by a token to a class of text.

MI = 0 The token’s distribution in all text classes is the same.

MI = 1 The token in any one class tends to occur more often in only that particular class of text.

119
Q

Feature Engineering

A

Final stage in Data Exploration

Numbers: differentiate among types of numbers

N-grams: multi-word patterns kept intact

Named entity recognition (NER): classes such as Money, Time, Organization

120
Q

How to deal with Class Imbalance?

A

The majority class can be under-sampled and the minority class can be over-sampled.

121
Q

Tokenization is the process of

A

Splitting a given text into separate words or characters.

A token is equivalent to a word, and tokenization is the process of splitting text into separate tokens.

122
Q

the sequence of steps for text preprocessing is to produce

A

Tokens -> N-grams used to build a bag-of-words -> Input to a document term matrix.

123
Q

Big Data differs from traditional data sources based on the presence of a set of characteristics commonly referred to as the 4 V's. What are the 4 V's?

A

Volume: refers to the quantity of data.

Variety: pertains to the array of available data sources.

Velocity: is the speed at which data is created (data in motion is hard to analyze compared to data at rest).

Veracity: related to the credibility and reliability of different data sources.

124
Q

What is Exploratory Data Analysis (EDA), and in which stage is it in?

A

Stage 4; the first step in Data Exploration.

EDA is the preliminary step in data exploration. Exploratory graphs, charts, and other visualizations, such as heat maps and word clouds, are designed to summarize and observe data.

125
Q

What is Feature Selection, and in which stage is it in?

A

Stage 4; the second step in Data Exploration.

A process whereby only pertinent features from the dataset are selected for ML model training.

126
Q

What is Feature Engineering, and in which stage is it in?

A

Stage 4; the third and final step in Data Exploration.

A process of creating new features by changing or transforming existing features. Feature engineering techniques systematically alter, decompose, or combine existing features to produce more meaningful features.

127
Q

Formula For F-Test

A

MSR / MSE

[ RSS / k ] / [ SSE / (n - (k+1)) ]

[ Regression SS / k ] / [ Residual SS / (n - (k+1)) ]

128
Q

Hypothesis test for F test

A

F-stat > F critical value: Reject the null H0: b1 = b2 = … = bn = 0; at least one coefficient is non-zero.

F-stat < F critical value: Fail to reject the null.

129
Q

What are the 3 types of error in ML?

A

Bias error
Variance error
Base error

130
Q

What is variance error in ML?

A

Variance error: how much the model's results change in response to new data from validation and test samples.

Unstable models pick up noise and produce high variance, causing overfitting and increased out-of-sample error.

131
Q

What is Bias error in ML?

A

Bias error: the degree to which a model fits the training data.
Algorithms with erroneous assumptions produce high bias with poor approximation, causing underfitting and increased in-sample error.

(Adding more training samples will not improve the model)

132
Q

What is Base error in ML?

A

Base Error due to randomness in the data.

(Out-of-sample accuracy increases as the training sample size increases)

133
Q

Name 2 ways to Preventing Overfitting in Supervised Machine Learning

A

Occam's Razor: the problem-solving principle that the simplest solution tends to be the correct one.

In supervised ML, it means preventing the algorithm from getting too complex during selection and training by limiting the number of features, and penalizing algorithms that are too complex or too flexible by constraining them to include only parameters that reduce out-of-sample error.

K-Fold Cross Validation: This strategy comes from the principle of avoiding sampling bias.
The challenge is having a large enough data set to make both training and testing possible on representative samples.