Quantitative Flashcards

Question 1

Q

The task of reducing the number of independent variables in a data set typically involves the use of what?

Answer

A

Unsupervised learning with untagged data

Question 2

Q

In a support vector machine, outliers will affect…

Answer

A

Outliers do not affect either the support vectors or the discriminant boundary

Question 3

Q

Regarding neural networks, true or false?

Each hidden node has both a summation operator and an activation function.

Question 4

Q

True or false?

Penalized regression attempts to minimize the sum of the squared residuals less a penalty term that increases with the number of features used

Answer

A

False

Penalized regression attempts to minimize the sum of the squared residuals plus a penalty term that increases with the number of features used
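
A minimal sketch of the "plus a penalty" objective, assuming a LASSO-style penalty (lambda times the sum of the absolute coefficients). The function name `penalized_sse` and the sample numbers are illustrative assumptions, not part of the card:

```python
def penalized_sse(y, X, coefs, lam):
    """Sum of squared residuals PLUS lam * sum(|coef|) (LASSO-style penalty)."""
    sse = 0.0
    for row, target in zip(X, y):
        # Linear prediction without an intercept, to keep the sketch short.
        pred = sum(b * x for b, x in zip(coefs, row))
        sse += (target - pred) ** 2
    penalty = lam * sum(abs(b) for b in coefs)
    return sse + penalty


# Hypothetical data: two features, three observations.
X = [[1.0, 2.0], [2.0, 0.0], [3.0, 1.0]]
y = [3.0, 2.0, 4.0]

both_features = penalized_sse(y, X, [1.0, 1.0], lam=0.5)  # perfect fit, larger penalty
one_feature = penalized_sse(y, X, [1.0, 0.0], lam=0.5)    # worse fit, smaller penalty
print(both_features, one_feature)  # 1.0 5.5
```

Dropping a feature shrinks the penalty but can raise the squared-residual term, which is the trade-off the objective balances.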

Question 5

Q

In supervised learning techniques, does the test dataset contain feature inputs, target outcomes, or both?

And does the training dataset contain feature inputs, target outcomes, or both?

Answer

A

Test: feature inputs only

Training: both

Question 6

Q

Which supervised machine learning technique requires the modeler to specify the most hyperparameters: KNN, LASSO, or random forest?

Answer

A

Random forest

Both LASSO and KNN require one hyperparameter each.

A random forest requires four: the number of features in each random subset (m), the number of trees to use, the minimum size (population) of each node (or leaf), and the maximum depth of each tree. All of these are hyperparameters that can be tuned to improve overall model prediction accuracy.

Question 7

Q

An ML algorithm used to minimize the chances of overfitting a model to a dataset, when there are many variables that could be used to explain or model Y, is known as …

Answer

A

Penalized regression

Question 8

Q

When a target variable is continuous and the relationship between the target and the features is non-linear, the modeler should use:

a) penalized regression
b) classification and regression trees
c) LASSO

Answer

A

B) Classification and regression trees

A) and C) are linear regression techniques

Question 9

Q

Model generalization is maximized when prediction error on test data is minimized

Answer

A

A model that generalizes well retains its explanatory power when predicting out of sample. The evaluation of any ML algorithm thus focuses on its prediction error on new data rather than on its goodness of fit on the data to which the algorithm was fitted.

Question 10

Q

CART

Answer

A

A classification and regression tree (CART) is a predictive model that explains how an outcome variable’s values can be predicted based on other values. A CART output is a decision tree in which each fork is a split in a predictor variable and each end node contains a prediction for the outcome variable.

For a continuous target, the prediction is the mean of the values at a terminal node.

It is used when the target variable of the prediction is either categorical or continuous.
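
To illustrate the terminal-node mean, here is a minimal sketch of a one-split regression tree (a "stump"). The split threshold is fixed by hand, which is a simplifying assumption: a real CART algorithm searches for the split that minimizes prediction error. The function names and data are hypothetical:

```python
from statistics import mean


def fit_stump(xs, ys, threshold):
    """One fork on a single predictor; each end node stores the mean target."""
    left = [y for x, y in zip(xs, ys) if x <= threshold]
    right = [y for x, y in zip(xs, ys) if x > threshold]
    return {"threshold": threshold, "left": mean(left), "right": mean(right)}


def predict(stump, x):
    """Route x down the fork and return the terminal-node mean."""
    return stump["left"] if x <= stump["threshold"] else stump["right"]


# Hypothetical training data with a clear break between x = 3 and x = 10.
xs = [1, 2, 3, 10, 11, 12]
ys = [5.0, 6.0, 7.0, 20.0, 22.0, 24.0]

stump = fit_stump(xs, ys, threshold=5)
print(predict(stump, 2))   # 6.0
print(predict(stump, 11))  # 22.0
```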

Question 11

Q

Which supervised learning technique does not require the modeler to specify a hyperparameter?

a) KNN
b) SVM
c) LASSO

Answer

A

b) SVM

LASSO - the hyperparameter in LASSO is lambda, the weight multiplied by the sum of the absolute values of the coefficients

KNN - k is the hyperparameter
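
A minimal sketch of why k counts as a hyperparameter: the modeler chooses it before fitting, and the prediction can change with it. The function `knn_predict`, the squared Euclidean distance, and the toy points are illustrative assumptions:

```python
from collections import Counter


def knn_predict(train, query, k):
    """train: list of ((x1, x2), label). Majority label among the k nearest points."""
    by_distance = sorted(
        train,
        key=lambda item: (item[0][0] - query[0]) ** 2 + (item[0][1] - query[1]) ** 2,
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Hypothetical labeled observations.
train = [((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"), ((5, 5), "B"), ((3, 3), "B")]

print(knn_predict(train, (3, 2), k=1))  # B (the single nearest point is labeled B)
print(knn_predict(train, (3, 2), k=3))  # A (two of the three nearest are labeled A)
```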

Question 12

Q

Combining the predictions from a collection of learning algorithms is called?

a) deep learning
b) neural networks
c) ensemble learning

Question 13

Q

unsupervised learning is typically not well suited for use in tasks involving:

a) clustering
b) classification
c) dimension reduction

Answer

A

b) classification

Two important types of problems that are well suited to unsupervised machine learning are reducing the dimension of data and sorting data into clusters, known as dimension reduction and clustering, respectively

Question 14

Q

Feature selection reduces data set dimensionality by

Answer

A

including and excluding features in the data without altering them

Question 15

Q

Data Trimming

Answer

A

when extreme values and outliers are simply removed from the database
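
A minimal sketch of trimming, assuming outliers are defined by fixed lower and upper cutoffs (the cutoffs and data are hypothetical; in practice they might come from percentiles):

```python
def trim(values, lower, upper):
    """Drop every observation that falls outside [lower, upper]."""
    return [v for v in values if lower <= v <= upper]


data = [-50, 2, 3, 4, 5, 6, 120]
trimmed = trim(data, lower=0, upper=10)
print(trimmed)  # [2, 3, 4, 5, 6]
```

Note that the trimmed data set has fewer observations than the original.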

Question 16

Q

Data Winsorization

Answer

A

when extreme values and outliers are replaced with the max and min values of data points that are not outliers
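
A minimal sketch of winsorization, assuming outliers are defined by fixed lower and upper cutoffs (hypothetical values). Each outlier is replaced by the nearest extreme among the non-outlier points, so no observation is dropped:

```python
def winsorize(values, lower, upper):
    """Replace outliers with the min/max of the points inside [lower, upper]."""
    inside = [v for v in values if lower <= v <= upper]
    lo, hi = min(inside), max(inside)
    # Clamp every value into the range spanned by the non-outlier points.
    return [min(max(v, lo), hi) for v in values]


data = [-50, 2, 3, 4, 5, 6, 120]
winsorized = winsorize(data, lower=0, upper=10)
print(winsorized)  # [2, 2, 3, 4, 5, 6, 6]
```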

Question 17

Q

Data Filtration

Answer

A

it is a preprocessing activity that removes unneeded observations

Question 18

Q

Tokenization

Answer

A

it is splitting a given text into separate words or characters
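
A minimal sketch of both variants, word-level and character-level. The regular expression is an assumption about what counts as a word (letters, digits, apostrophes); real tokenizers use more elaborate rules:

```python
import re


def tokenize_words(text):
    """Split text into word tokens, discarding punctuation and whitespace."""
    return re.findall(r"[A-Za-z0-9']+", text)


def tokenize_chars(text):
    """Split text into single-character tokens, skipping whitespace."""
    return [c for c in text if not c.isspace()]


sentence = "Rates rose; bond prices fell."
print(tokenize_words(sentence))  # ['Rates', 'rose', 'bond', 'prices', 'fell']
print(tokenize_chars("bond"))    # ['b', 'o', 'n', 'd']
```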

Question 19

Q

A large data set with a large number of features will most likely contribute to: good model fit, model overfitting, or model underfitting?

Answer

A

model overfitting

Question 20

Q

advantage of simulation modelling

Answer

A

It yields a distribution for expected value rather than a point estimate.
It also estimates a standard deviation and a breakdown of the values by percentile.
The expected values from simulations should be fairly close to the expected value that we would obtain using the conventional risk-adjusted model.
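
A minimal Monte Carlo sketch of these outputs. The valuation formula, the normal distributions, and their parameters are all hypothetical assumptions chosen only to show the mechanics:

```python
import random
import statistics


def simulate_values(n_trials, seed=0):
    """Draw uncertain inputs repeatedly and record the resulting value each time."""
    rng = random.Random(seed)
    values = []
    for _ in range(n_trials):
        growth = rng.gauss(0.05, 0.02)      # hypothetical growth-rate assumption
        cash_flow = rng.gauss(100.0, 10.0)  # hypothetical cash-flow assumption
        values.append(cash_flow * (1 + growth))
    return values


values = simulate_values(10_000)
mean_value = statistics.mean(values)              # expected value
std_value = statistics.stdev(values)              # dispersion around it
percentiles = statistics.quantiles(values, n=20)  # cut points at 5%, 10%, ..., 95%

print(round(mean_value, 1), round(std_value, 1))
print(round(percentiles[0], 1), round(percentiles[-1], 1))  # 5th and 95th percentiles
```

The simulated mean should land near the point estimate from the base-case inputs (100 × 1.05 = 105 under these assumptions), while the standard deviation and percentiles describe the spread that a single point estimate hides.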

Question 21

Q

Stemming and lemmatization address the problem of - what?

Answer

A

Data sparseness or low frequency tokens.

Data sparseness refers to words that appear very infrequently, resulting in data consisting of many unique, low-frequency tokens.

Stemming and lemmatization can decrease data sparseness by aggregating many sparsely occurring words into relatively less sparse stems or lemmas, thereby aiding in training less complex ML models.
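
A minimal sketch of the aggregation effect, using a toy suffix-stripping rule. `toy_stem` is a hypothetical stand-in and much cruder than a real stemmer or lemmatizer:

```python
from collections import Counter


def toy_stem(word):
    """Strip a few common suffixes; NOT a real stemming algorithm."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


tokens = ["invest", "invests", "invested", "investing", "bond", "bonds"]

raw_counts = Counter(tokens)                        # six unique tokens, each seen once
stem_counts = Counter(toy_stem(t) for t in tokens)  # two stems with higher counts

print(len(raw_counts), len(stem_counts))  # 6 2
print(stem_counts["invest"], stem_counts["bond"])  # 4 2
```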

Question 22

Q

Data Exploration stages are

Answer

A

exploratory data analysis
feature selection (use chi-square tests and mutual information scores)
feature engineering

(22 cards)