Quantitative Flashcards
The task of reducing the number of independent variables in a data set typically involves the use of what?
Unsupervised learning with untagged data
In a support vector machine, outliers will affect…
Nothing: outliers don't affect either the support vectors or the discriminant boundary
Regarding neural networks, true or false?
Each hidden node has both a summation operator and an activation function.
True
True or false?
Penalized regression attempts to minimize the sum of the squared residuals less a penalty term that increases with the number of features used
False
Penalized regression attempts to minimize the sum of the squared residuals plus a penalty term that increases with the number of features used
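A minimal sketch of that objective, assuming the penalty takes the LASSO form (lambda times the sum of the absolute coefficients). The function name and the toy numbers are illustrative only:

```python
def penalized_sse(y, y_hat, coefs, lam):
    """Sum of squared residuals PLUS an L1 (LASSO-style) penalty."""
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    penalty = lam * sum(abs(b) for b in coefs)
    return sse + penalty


# Toy numbers, made up for illustration.
y = [3.0, 5.0, 7.0]
y_hat = [2.0, 5.0, 8.0]    # residuals 1, 0, -1, so SSE = 2
coefs = [2.0, 0.0, -1.0]   # sum of absolute values = 3
total = penalized_sse(y, y_hat, coefs, lam=0.5)  # 2 + 0.5 * 3
```

A feature whose coefficient is zero adds nothing to the penalty, so each extra feature the model keeps has to reduce the squared residuals by more than it adds in penalty.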
In supervised learning techniques, the test dataset contains: feature inputs, target outcomes, or both?
And the training dataset contains: feature inputs, target outcomes, or both?
Test: feature inputs only
Training: both
The supervised machine learning technique that requires the modeler to specify the largest number of hyperparameters is: KNN, LASSO, or random forest?
Random forest
Both LASSO and KNN require 1 hyperparameter each
A random forest requires 4: the number of features in each random subset (m), the number of trees to use, the minimum size (population) of each node or leaf, and the maximum depth of each tree. All are hyperparameters that can be tuned to improve overall model prediction accuracy
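As a rough sketch, the four hyperparameters can be written down as a settings dictionary. The keys below follow scikit-learn's `RandomForestClassifier` parameter names; the values are arbitrary placeholders, not recommendations:

```python
# The four random forest hyperparameters from the card, keyed by the
# corresponding scikit-learn parameter names. Values are illustrative only.
rf_hyperparameters = {
    "max_features": "sqrt",   # number of features in each random subset (m)
    "n_estimators": 200,      # number of trees to use
    "min_samples_leaf": 5,    # minimum size (population) of each leaf
    "max_depth": 8,           # maximum depth of each tree
}

# With scikit-learn installed, these could be passed on as:
#   from sklearn.ensemble import RandomForestClassifier
#   model = RandomForestClassifier(**rf_hyperparameters)
```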
An ML algorithm used to minimize the chances of overfitting a model to a dataset, when there are many variables that could be used to explain or model Y, is known as …
Penalized regression
When a target variable is continuous and the relationship between the target and the features is non-linear, the modeler should use:
a) penalized regression
b) classification and regression trees
c) LASSO
B) Classification and regression trees
A) and C) are linear regression techniques
Model generalization is maximized when prediction error on test data is minimized
A model that generalizes well is a model that retains its explanatory power when predicting out of sample. The evaluation of any ML algorithm thus focuses on its prediction error on new data rather than on its goodness of fit on the data on which the algorithm was fitted
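A small sketch of the in-sample versus out-of-sample comparison, assuming a one-variable least-squares line as the model and mean squared error as the error measure. The data are made up for illustration:

```python
def mse(y_true, y_pred):
    """Mean squared prediction error."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


def fit_line(x, y):
    """Ordinary least squares for one feature; returns (slope, intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, my - slope * mx


# Toy data: the model is fitted on the training set only.
train_x, train_y = [1, 2, 3, 4], [2.1, 3.9, 6.2, 7.8]
test_x, test_y = [5, 6], [10.1, 11.9]

slope, intercept = fit_line(train_x, train_y)
train_mse = mse(train_y, [slope * xi + intercept for xi in train_x])
test_mse = mse(test_y, [slope * xi + intercept for xi in test_x])
```

`train_mse` measures goodness of fit on the fitting data; `test_mse` is the figure that speaks to generalization.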
CART
A Classification and Regression Tree (CART) is a predictive model that explains how an outcome variable's values can be predicted based on other values. A CART output is a decision tree where each fork is a split in a predictor variable and each end node contains a prediction for the outcome variable
For a continuous target, the prediction is the mean of the values at a terminal node
Used when the target variable of prediction is either categorical or continuous
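A minimal sketch of a single fork for a continuous target: one split on one predictor, with each end node predicting the mean of its values. This is a one-level "stump", not a full CART implementation; the function names and data are hypothetical:

```python
def best_split(x, y):
    """Return (threshold, left_mean, right_mean) with the lowest squared error.

    Assumes x contains at least two distinct values.
    """
    def sse(vals):
        m = sum(vals) / len(vals)
        return sum((v - m) ** 2 for v in vals)

    best = None
    xs = sorted(set(x))
    # Candidate thresholds are midpoints between neighbouring x values.
    for lo, hi in zip(xs, xs[1:]):
        t = (lo + hi) / 2
        left = [yi for xi, yi in zip(x, y) if xi <= t]
        right = [yi for xi, yi in zip(x, y) if xi > t]
        err = sse(left) + sse(right)
        if best is None or err < best[0]:
            best = (err, t, sum(left) / len(left), sum(right) / len(right))
    return best[1:]


def predict(stump, xi):
    """Follow the fork and return the mean stored at the end node."""
    t, left_mean, right_mean = stump
    return left_mean if xi <= t else right_mean


x = [1, 2, 3, 10, 11, 12]
y = [5.0, 6.0, 7.0, 20.0, 21.0, 22.0]
stump = best_split(x, y)
```

A real tree repeats this split recursively inside each node until a stopping rule (such as maximum depth or minimum node size) is reached.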
Which supervised learning technique does not require the modeler to specify a hyperparameter?
a) KNN
b) SVM
c) LASSO
b) SVM
LASSO - the hyperparameter in LASSO is lambda, the weight multiplied by the sum of the absolute values of the coefficients
KNN - k is the hyperparameter
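To see why k has to be chosen by the modeler, here is a bare-bones KNN classifier, assuming squared Euclidean distance and a simple majority vote. The data points are made up:

```python
from collections import Counter


def knn_predict(train, query, k):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of (features, label) pairs.
    """
    by_distance = sorted(
        train,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], query)),
    )
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


train = [
    ((1.0, 1.0), "A"),
    ((1.2, 0.8), "A"),
    ((0.9, 1.1), "A"),
    ((3.0, 3.0), "B"),
    ((3.2, 2.9), "B"),
]
query = (2.9, 3.0)
label_k1 = knn_predict(train, query, k=1)  # only the single closest point votes
label_k5 = knn_predict(train, query, k=5)  # every training point votes
```

The same query gets a different label depending on k, which is why k is a hyperparameter rather than something learned from the data.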
Combining the predictions from a collection of learning algorithms is called?
a) deep learning
b) neural networks
c) ensemble learning
c) ensemble learning
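A simple sketch of the idea, assuming the predictions are combined by majority vote (one common way to combine them). The three rule-based "models" are hypothetical stand-ins for trained learners:

```python
from collections import Counter


def majority_vote(models, x):
    """Return the label predicted by the largest number of models."""
    predictions = [model(x) for model in models]
    return Counter(predictions).most_common(1)[0][0]


# Three hypothetical weak classifiers, each looking at the input differently.
models = [
    lambda x: "up" if x[0] > 0 else "down",
    lambda x: "up" if x[1] > 0 else "down",
    lambda x: "up" if x[0] + x[1] > 0 else "down",
]

result = majority_vote(models, (0.5, -0.2))  # individual votes: up, down, up
```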
unsupervised learning is typically not well suited for use in tasks involving:
a) clustering
b) classification
c) dimension reduction
b) classification
Two important types of problems that are well suited to unsupervised machine learning are reducing the dimension of data and sorting data into clusters, known as dimension reduction and clustering, respectively
Feature selection reduces data set dimensionality by
including and excluding features in the data without altering them
Data Trimming
When extreme values and outliers are simply removed from the dataset
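A minimal sketch of trimming, assuming "extreme" means the lowest and highest fixed fraction of observations. The cut-off rule, function name, and data are assumptions for illustration:

```python
def trim(values, fraction):
    """Remove the lowest and highest `fraction` of observations.

    Returns the surviving observations in ascending order.
    """
    k = int(len(values) * fraction)  # observations to drop from each tail
    ordered = sorted(values)
    return ordered[k:len(ordered) - k]


data = [-50, 1, 2, 3, 4, 5, 6, 7, 8, 120]
trimmed = trim(data, 0.1)  # drops one value from each end
```

The removed observations are discarded outright, not replaced with other values.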