Machine Learning Basics Flashcards

Taken from various sources inc: https://d2wvfoqc9gyqzf.cloudfront.net/content/uploads/2018/09/Ng-MLY01-13.pdf

1
Q

What is Machine Learning?

A

The art and science of giving computers the ability to learn to make decisions from data (improve at a task based on experience) without being explicitly programmed.

2
Q

Name the three main categories of Machine Learning

A

Supervised Learning
Unsupervised Learning
Reinforcement Learning

3
Q

What is Unsupervised Learning?

A

Uncovering hidden patterns from unlabelled data, e.g. grouping customers into distinct categories (clusters) that were unknown beforehand and are hopefully meaningful

4
Q

What is Reinforcement Learning?

A

Software agents interact with an environment and try to find the most efficient pathway to a goal, learning how to optimise their behaviour.

They are given a system of rewards and punishments: successful routes are rewarded, and failures are restarted.

5
Q

What is Supervised Learning?

A

The Machine Learning model trains on labelled training data (sets of variables/features plus a known target variable), then predicts the target variable for subsequent, unseen test datasets, often through multiple iterations.
Also called Predictive Data Analytics.

6
Q

What is Exploratory Data Analysis (EDA)?

A

An initial exploration of data, using mostly graphical techniques, to gain insight into its nature and structure: which variables are important, and where the outliers are.

7
Q

Who codified EDA practice?

A

John Tukey in the 1970s

8
Q

Complete the following …“Science does not begin with a tidy question …” (EDA)

A

“… nor does it end with a tidy answer”

9
Q

What did Tukey refer to EDA as?

A

Detective work

10
Q

Name some EDA techniques

A

Plot the shape of distributions and measures of central tendency: mode, median, mean

Measure the range and spread of distributions: standard deviation, percentiles, quartiles

Examine relationships between variables/features in datasets/observations

Investigate trends for variables over time.
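The central-tendency and spread measures above can be sketched with Python's standard statistics module (a minimal illustration; the helper name summarise is just for this example):

```python
import statistics

def summarise(values):
    """Quick numeric EDA summary: central tendency and spread."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
    }
```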

11
Q

Describe Data Wrangling

A

A process that occurs during the Data Preparation stage.

Take messy, incomplete, or overly complex data and simplify and/or clean it so that it's usable for analysis:

Remove or impute missing values
Convert categorical to numeric
Standardise/Normalise data
Clean data
Join data together
Generate new fields

Overlaps with Feature Engineering.

12
Q

What is Feature Engineering?

A

Taking whatever information you have about your problem and turning it into a usable numeric format that you can use to build your feature matrix.

13
Q

How does Machine Learning work?

A

Use data to form a hypothesis; new data exposes errors in the hypothesis, so the error gap is measured and the hypothesis adjusted to fit. The aim is to make the error gap as small as possible.

14
Q

Name some types of Feature Engineering.

A

Converting categorical features to numeric, e.g. using one-hot encoding
Encoding images as pixel representations
Imputing missing data, e.g. filling NaN with the mean of the column
Building a feature pipeline to chain the above tasks together
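Two of these tasks can be sketched in pure Python (illustrative helper names, not any particular library's API):

```python
def one_hot(values):
    """One-hot encode a categorical column: one binary indicator per category."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]

def impute_mean(column):
    """Fill missing entries (None) with the mean of the observed values."""
    observed = [x for x in column if x is not None]
    mean = sum(observed) / len(observed)
    return [mean if x is None else x for x in column]
```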

15
Q

What do Machine Learning Algorithms do?

A

Algorithms learn a pattern inherent in existing data. These patterns can be used to make predictions about data that has not yet been analysed. This pattern, or model, is much smaller than the training data.

16
Q

Describe the Machine Learning lifecycle.

A

Derive pattern/data model using training data and algorithm
Check model using test data
Use formal process to check accuracy of model
Apply model to new data

17
Q

What is Dimensional Reduction in terms of Feature Engineering?

A

If the number of dimensions is too high, it takes too long to process the data and produce a model, and some dimensions may not be useful.

You can either discard dimensions using intuition, or employ dimension-reduction techniques such as Decision Trees or Principal Component Analysis (PCA).

18
Q

What is Principal Component Analysis (PCA)?

A

PCA is a feature extraction technique for reducing the dimension of a feature space (countering the curse of dimensionality), so that there are fewer relationships between features to consider and the model is less likely to overfit.

19
Q

What is Clustering in terms of Unsupervised learning?

A

Finding islands of similarity in complex data sets.
Uniting singular points into distinct groups or clusters.
Examining data and assembling data points into clusters based on a measure of distance.

20
Q

Describe the K-Means Clustering algorithm

A

Unsupervised method utilising clustering.

  1. Choose the number of clusters (K) to be used by the algorithm (e.g. via a scree plot)
  2. Randomly plot K cluster centre points as starting positions
  3. Assign each data point to its nearest centroid
  4. Update the position of each centroid to the new centre/average location of its data points
  5. Repeat 3 and 4 until no new assignments occur.
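The steps above can be sketched in pure Python for 2-D points (a toy illustration; real use would rely on a library implementation):

```python
import random

def k_means(points, k, iterations=100, seed=0):
    """Toy K-Means on 2-D points: assign to nearest centroid, then re-average."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)               # step 2: random start positions
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:                            # step 3: nearest centroid
            dists = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centroids]
            clusters[dists.index(min(dists))].append(p)
        new = [
            (sum(p[0] for p in cl) / len(cl), sum(p[1] for p in cl) / len(cl))
            if cl else centroids[i]                 # keep centroid if cluster empty
            for i, cl in enumerate(clusters)
        ]                                           # step 4: move centroids
        if new == centroids:                        # step 5: stop when stable
            break
        centroids = new
    return centroids
```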
21
Q

Describe Association Rules algorithm?

A

Unsupervised method which uncovers how items are associated with each other, e.g. shopping patterns: a parent buys healthy ingredients for the family; single males buy beer and crisps. Understanding these patterns helps to increase sales.
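A rule's strength is commonly measured by its confidence; a minimal sketch (an illustrative helper, not a full mining algorithm such as Apriori):

```python
def confidence(transactions, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent: of the transactions
    containing the antecedent, what fraction also contain the consequent?"""
    with_antecedent = [t for t in transactions if antecedent <= t]
    with_both = [t for t in with_antecedent if consequent <= t]
    return len(with_both) / len(with_antecedent)
```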

22
Q

What are the two main types of Supervised Learning?

A

Regression - target (predicted) variable is continuous

Classification - target (predicted) variable consists of categories

23
Q

Describe K-Nearest Neighbours (K-NN) algorithm

A

Supervised Classification method that classifies a data point based on the classifications of its neighbours. K indicates the number of nearest-neighbour data points to include in the majority-voting process.
Choosing K is parameter tuning.
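The voting process can be sketched in pure Python for 2-D points (illustrative, not a library API):

```python
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.
    `train` is a list of ((x, y), label) pairs."""
    nearest = sorted(
        train,
        key=lambda item: (item[0][0] - query[0]) ** 2 + (item[0][1] - query[1]) ** 2,
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```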

24
Q

Describe Linear Regression algorithm

A

Supervised Regression method based on Linear Algebra: the line of best fit.

Used to predict continuous values.
Regression implies that one of the variables is dependent on the other, which makes it different from correlation.
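The line of best fit can be derived with ordinary least squares; a minimal sketch for a single predictor (simple linear regression):

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    return slope, mean_y - slope * mean_x
```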

25
Q

Describe Support Vector Machine (SVM) algorithm

A

Supervised Classification method to derive optimal boundaries separating groups.
Identifies peripheral data points (support vectors) located closest to points from the other group, then draws the optimal boundary down the middle between them.
A fast method: less computation time is required because only the support vectors are used to derive the boundary.
However, it is sensitive to the position of the support vectors.

26
Q

What is Parameter Tuning in context of Machine Learning?

A

Parameters are options used to tweak an algorithm's settings. A model's accuracy suffers when it is not sufficiently tuned.
Overfitting: mistaking random variations for a persistent pattern.
Underfitting: overlooking underlying patterns.

27
Q

Describe Decision Tree algorithm

A

Supervised method that can be used for both Classification and Regression, so sometimes referred to as CART (Classification and Regression Tree).

A decision tree could predict survival chance through a series of binary (Yes/No) questions: start at the top question (the root node) and move through the tree's branches as guided by each response until you reach a leaf node that indicates the survival chance.
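As an illustration, here is a hand-built tree in that style; the questions and thresholds are hypothetical, chosen only to show the structure:

```python
def predict_survival(is_male, age, num_siblings):
    """Walk a toy decision tree: each if/else is a binary question (a node);
    each return is a leaf giving the predicted outcome.
    The features and thresholds here are illustrative, not learned."""
    if is_male:                                  # root node question
        if age > 9.5:                            # branch question
            return "died"
        return "survived" if num_siblings <= 2 else "died"
    return "survived"
```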

28
Q

Why is Evaluation required in Machine Learning?

A

To ensure that any model generated can do the job it has been built for.

  1. Determine that a model built for a particular task is the one most suited to that task
  2. Estimate how the model will perform when deployed
  3. Convince the business for whom model is being developed that it will meet their needs
29
Q

Is there more to evaluation than just measuring model performance?

A

Yes. For a model to be successfully deployed, you must consider issues such as:

how quickly model makes predictions
how easy it is for human analysts to understand predictions
how easy it is to retrain a model should it go stale

30
Q

How should we define ‘best’ performance when evaluating a model?

A

It depends on the context.
No model will ever be 100% perfect, but there is a range of ways in which models can be incorrect. For medical diagnosis we require very accurate results and must not predict a sick patient as healthy.
A model predicting which customers would respond to an online ad may not be held to the same strict criteria.

31
Q

When evaluating models what is the most important rule?

A

Don’t use the same data sample to both train a predictive model and then to evaluate/test it

32
Q

When designing evaluation experiments what is hold-out sampling?

A

A hold-out set is used to evaluate the performance of a model, ensuring it was not used in training. Data is randomly allocated to a train and a test set, and the performance measured on the test set should be a good indicator of how the model will perform on future unseen data. Generally a 70/30 split.
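A minimal pure-Python sketch of a 70/30 hold-out split (a real workflow would typically use a library helper):

```python
import random

def holdout_split(rows, test_fraction=0.3, seed=42):
    """Shuffle the data, then hold out test_fraction of it for evaluation."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]   # (train, test)
```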

33
Q

When designing evaluation experiments what is K-Fold Cross Validation?

A

K-Fold cross-validation is a re-sampling procedure used for limited data samples. The available data is split into K equal-sized folds, and K separate evaluation experiments are performed: the first time, the data in the 1st fold is the test set and the remaining folds are used for training; the second time, the 2nd fold is the test set, and so on.
The performance metrics from each run are aggregated to give a final score.
K=10 has been found to be a good value.
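The fold bookkeeping can be sketched in pure Python (here folds interleave indices round-robin; the important property is that every row appears in exactly one test set):

```python
def k_fold_indices(n, k):
    """Yield (train, test) index lists: each of the k folds is the test set once."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test
```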

34
Q

What is a Confusion Matrix?

A

A table used to understand performance of a classification model (classifier) on a set of test data for which true values are known. Used as the basis for calculating performance measures.

It records the frequency of each possible model prediction outcome on the test set.
For a prediction problem with a binary target feature there are four possible outcomes: TP, TN, FP, FN.
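Tallying the four outcomes from paired actual/predicted labels is a one-pass count; a minimal sketch:

```python
def confusion_counts(actual, predicted, positive=1):
    """Count the four binary prediction outcomes: TP, TN, FP, FN."""
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if a == positive and p == positive:
            tp += 1
        elif a != positive and p != positive:
            tn += 1
        elif p == positive:
            fp += 1          # predicted positive, actually negative
        else:
            fn += 1          # predicted negative, actually positive
    return tp, tn, fp, fn
```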

35
Q

What is a True Positive?

A

Instance in test set that had a positive target feature value, and predicted to have a positive target feature value.

36
Q

What is a True Negative?

A

Instance in test set that had a negative target feature value, and predicted to have a negative target feature value.

37
Q

What is a False Positive?

A

Instance in test set that had a negative target feature value, and predicted to have a positive target feature value.

38
Q

What is a False Negative?

A

Instance in test set that had a positive target feature value, and predicted to have a negative target feature value.

39
Q

What is the order of a Confusion Matrix?

A

                Predicted
                pos    neg
Actual   pos    TP     FN
         neg    FP     TN

FP = Type I error (predicted positive, actually negative)
FN = Type II error (predicted negative, actually positive)
40
Q

In Evaluation what are accuracy and misclassification rates (Error Rate)?

A

Accuracy: overall, how often is the classifier correct?
Misclassification: overall, how often is it wrong? This is also called the Error Rate.

Accuracy = (TP + TN) / Total
Misclassification = 1 - Accuracy
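In code form, working directly from the four counts (a minimal sketch):

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that were correct: (TP + TN) / total."""
    return (tp + tn) / (tp + tn + fp + fn)

def error_rate(tp, tn, fp, fn):
    """Misclassification rate: 1 - accuracy."""
    return 1 - accuracy(tp, tn, fp, fn)
```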

41
Q

In Evaluation what are the problems with relying on the accuracy rate?

A

The accuracy paradox: sometimes accuracy is not a good measure for classifiers in predictive analytics. If there are far more instances of one category than another, say 99%, then predicting that every element is in that category will have an accuracy of 99%.

This is known as class imbalance. If used for fraud prediction where only 1% of transactions are fraudulent, a model that is 99% accurate simply by predicting non-fraud every time is useless.

42
Q

In Evaluation what is the Precision performance measure?

A

A percentage calculated from the confusion matrix that tells us how confident we can be that an instance predicted to have a positive target level actually has one: when the model predicts positive, how often is it correct?

A higher percentage indicates better performance.

Precision = TP / (TP + FP)

43
Q

In Evaluation what is the Recall performance measure?

A

A percentage calculated from the confusion matrix that tells us how confident we can be that all instances with a positive target level have been found by the model: when an instance is actually positive, how often does the model predict positive?

Also called the True Positive Rate (TPR) or sensitivity.

A higher percentage indicates better performance.

Recall = TP / (TP + FN)

44
Q

In Evaluation what is the F1 performance measure?

A

A useful alternative to the simple accuracy rate: Precision and Recall collapsed into a single performance measure, the harmonic mean of the two.
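The three measures from the confusion-matrix counts, as a minimal sketch:

```python
def precision(tp, fp):
    """When the model predicts positive, how often is it correct?"""
    return tp / (tp + fp)

def recall(tp, fn):
    """When an instance is actually positive, how often is it predicted positive?"""
    return tp / (tp + fn)

def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall."""
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r)
```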

45
Q

What is a classifier?

A

Another name for a classification model used in Supervised Learning

46
Q

Should you use the terms True Positive/True Negative etc. when there are more than two classes in a classifier?

A

Not recommended, as it can be confusing.

Multi-class (> 2) problems can still have confusion matrices and performance metrics; it is just better not to use the TP/TN labels.

47
Q

In Evaluation what is the True Negative Rate (TNR)?

A

Tells us how confident we can be that all instances with a negative target level have been found by the model: when an instance is actually negative, how often does the model predict negative?

TNR = TN / (TN + FP)

48
Q

What is a ROC curve?

A

Receiver Operating Characteristic Curve - ROC Curve.
Commonly used way to visualise performance of a binary classifier.

The curve is plotted using the prediction measurements of TPR (y axis) and FPR (x axis), with performance plotted for each value of the threshold used to separate the classes.
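Each candidate threshold yields one (FPR, TPR) point; a minimal sketch of generating the points to plot:

```python
def roc_points(actual, scores):
    """One (FPR, TPR) point per candidate threshold, for plotting a ROC curve."""
    pos = sum(1 for a in actual if a == 1)
    neg = len(actual) - pos
    points = []
    for t in sorted(set(scores), reverse=True):   # try each score as a threshold
        predicted = [1 if s >= t else 0 for s in scores]
        tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
        fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
        points.append((fp / neg, tp / pos))
    return points
```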

49
Q

What constitutes a good ROC curve?

A

A curve that hugs the upper left hand corner.

Imagine a diagonal line bisecting the plot. A curve close to this line is poor and no better than random guessing.

50
Q

What is AUC?

A

Area Under the Curve (AUC) is a single-value metric that measures the area under the ROC curve and can be used to compare models. A higher value is better.

51
Q

What is the threshold used in classifier predictions?

A

Different classification models produce a prediction score for each target prediction, and a threshold is used to convert that score into the predicted classification target value. This threshold can be altered to change the results of the prediction.
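A minimal sketch of applying a decision threshold to prediction scores:

```python
def classify(scores, threshold=0.5):
    """Convert raw prediction scores into binary class labels."""
    return [1 if s >= threshold else 0 for s in scores]
```

Raising the threshold predicts fewer positives, which typically trades recall for precision; lowering it does the reverse.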