Week 1 - Foundations of Machine Learning Flashcards
Supervised learning
The dataset is already labeled with the correct answers (the y-values or categories), allowing the learning algorithm to train on labeled data and make predictions on new/unseen data
Classification
A type of supervised learning where the goal is to assign input vectors to one of a predefined set of discrete classes
Unsupervised learning (clustering)
We are given data points without labels or categories. Using this data, the algorithm aims to identify groups by clustering similar data points together.
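A minimal clustering sketch, assuming scikit-learn and NumPy (the cards name no library) and using k-means purely as one illustrative clustering algorithm:

```python
import numpy as np
from sklearn.cluster import KMeans

# Unlabeled data: 100 observations, 2 variables, no y-values given
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])

# k-means groups similar points into k clusters without seeing any labels
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(labels[:10])  # cluster assignment (0 or 1) for the first 10 points
```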
Regression
Another type of supervised learning, but the response is continuous rather than discrete (classification)
Example: predicting an amount of money, such as a price or salary
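A minimal regression sketch under the same scikit-learn assumption, fitting a line to a continuous response:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Labeled data: y is a continuous response (e.g. a price), not a class
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, (100, 1))           # one predictor
y = 3.0 * X[:, 0] + rng.normal(0, 1, 100)  # continuous target with noise

model = LinearRegression().fit(X, y)  # train on labeled data
print(model.predict([[4.0]]))         # predict y for a new observation
```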
Predictors
Predictors are our x values
- n is the number of observations/samples
- p is the number of variables
Observations
The i^th observation denotes the i^th row of the data matrix
Variables
The j^th variable denotes the j^th column of the data matrix
Response/Target
The response or target is ‘y’
Each row of the y column vector corresponds to one observation in the data
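A small NumPy sketch (NumPy is an assumption) tying the n, p, i, j, and y notation together; note Python indexing is 0-based while the cards' i^th/j^th convention is 1-based:

```python
import numpy as np

# X is the n x p predictor matrix; y is the length-n response vector
X = np.arange(12).reshape(4, 3)  # n = 4 observations, p = 3 variables
y = np.array([1.0, 0.0, 1.0, 1.0])

i, j = 1, 2
print(X[i, :])  # the i-th observation (a row of X)
print(X[:, j])  # the j-th variable (a column of X)
print(y[i])     # response for the i-th observation
```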
Projection
A projection maps our data from 'p' variables down to 'd' variables (a lower dimensional space)
For example, data in a 3 dimensional space can be projected into a 2 dimensional space to make it simpler to work with
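A minimal projection sketch; the cards name no method, so PCA from scikit-learn is assumed here as one common choice:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))  # p = 3 variables

# Project from p = 3 variables down to d = 2
X_2d = PCA(n_components=2).fit_transform(X)
print(X.shape, "->", X_2d.shape)  # (100, 3) -> (100, 2)
```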
Predictive accuracy
The goal is to make accurate predictions of ‘y’ on new data
Interpretability
Aim to understand the relationship between ‘x’ and ‘y’
Simpler models are preferred if they achieve similar accuracy
Training and test splits
- Training set: roughly 80% of the data, used to fit the model
- Test set: the remaining 20%, used to assess the final model's performance on future data
- Testing on the training set gives a biased (over-optimistic) estimate, since the model was fit to exactly those observations (see the split sketch below)
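A minimal 80/20 split sketch, assuming scikit-learn's train_test_split for illustration:

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = rng.integers(0, 2, size=100)

# 80% of rows for fitting, 20% held out for the final assessment
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
print(X_train.shape, X_test.shape)  # (80, 4) (20, 4)
```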
Measuring accuracy for categorical response
For classification problems, accuracy is measured using the ERROR RATE, which is the fraction of misclassifications
The Training Error Rate is calculated on the training data, but the Test Error Rate, computed on held-out test data, provides a better estimate of future accuracy
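A small sketch of the error rate as the fraction of misclassifications, using NumPy for illustration:

```python
import numpy as np

y_true = np.array([0, 1, 1, 0, 1, 0])
y_pred = np.array([0, 1, 0, 0, 1, 1])

# Error rate = fraction of observations where prediction != truth
error_rate = np.mean(y_pred != y_true)
print(error_rate)  # 2 misclassifications out of 6 -> 0.333...
```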
Predicting Probabilities
Instead of a single class prediction, we get probabilities of the different y values for each new observation
- Probabilities contain much more information than a class label alone
Decision rule
The decision rule's threshold 'a' is problem specific and depends on the consequences of misclassification (see the sketch below)
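A minimal sketch of probabilities plus a decision rule, assuming scikit-learn's logistic regression; the threshold value 0.7 is a made-up example of a problem-specific 'a':

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

clf = LogisticRegression().fit(X, y)
proba = clf.predict_proba(X)[:, 1]  # P(y = 1) for each observation

# Decision rule: predict class 1 when P(y = 1) > a; 'a' is problem specific,
# e.g. raise 'a' when false positives are costly
a = 0.7
y_pred = (proba > a).astype(int)
print(y_pred[:10])
```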
Receiver Operating Characteristic (ROC) Curves
The ROC curve helps evaluate model performance across different threshold values of the decision rule 'a'
A good classifier increases True Positive Rate (TPR) faster than False Positive Rate (FPR) as ‘a’ changes
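A brief sketch of the quantities behind an ROC curve, assuming scikit-learn's roc_curve, which sweeps the threshold 'a' over the predicted probabilities:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] + rng.normal(0, 0.5, 200) > 0).astype(int)

proba = LogisticRegression().fit(X, y).predict_proba(X)[:, 1]

# FPR and TPR at each threshold 'a'; a good classifier's TPR rises faster
fpr, tpr, thresholds = roc_curve(y, proba)
print(np.round(fpr[:5], 2), np.round(tpr[:5], 2))
```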
Parametric Method
A machine learning approach that assumes a fixed functional form for the model and estimates a set number of parameters
(e.g., linear regression, logistic regression)
Non-Parametric Method
A machine learning approach that makes no strict assumptions about the data’s structure, allowing the model to be more flexible and data-driven
(e.g., k-Nearest Neighbors, decision trees)
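A minimal non-parametric sketch, assuming scikit-learn's k-Nearest Neighbors; no fixed functional form is estimated, and the parameter k also controls flexibility (see the flexibility card below):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = (X[:, 0] > 0).astype(int)

# k controls flexibility: small k -> very flexible, large k -> smoother
knn = KNeighborsClassifier(n_neighbors=5).fit(X, y)
print(knn.predict([[0.3, -0.8]]))  # label decided by the 5 nearest points
```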
Reducible Error
Error that comes from an incorrect model form or poor parameter estimation
It can be minimized by improving the model
Irreducible Error
Error that remains even with the best possible model, caused by inherent randomness or noise in the data
Flexibility with Parametric and Non-Parametric models
Parametric models are generally less flexible, while the flexibility of non-parametric models can be tuned via their parameters (e.g., k in k-Nearest Neighbors)
Bias
Error from modelling a complicated problem with an overly simple model
High bias - model too simple, leading to underfitting
A more flexible method will have less bias
Variance
Error from a model being too sensitive to the training data, leading to poor generalisation
High variance - model too complex, leading to overfitting
A more flexible method will have more variance
Bias-Variance trade-off
The balance between:
- Bias (underfitting): Model is too simple and misses patterns.
- Variance (overfitting): Model is too complex and learns noise
Aim for a middle ground that minimises both errors for better predictions
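As a supporting note (standard theory, not stated on the cards): the expected squared test error decomposes into bias, variance, and irreducible error, which is why minimising one alone is not enough:

```latex
% Expected test MSE at a point x_0, for an estimate \hat{f} of the true f
% (y = f(x) + \epsilon, with noise variance \sigma^2):
E\big[(y_0 - \hat{f}(x_0))^2\big]
  = \underbrace{\big(E[\hat{f}(x_0)] - f(x_0)\big)^2}_{\text{Bias}^2}
  + \underbrace{\operatorname{Var}\big(\hat{f}(x_0)\big)}_{\text{Variance}}
  + \underbrace{\sigma^2}_{\text{Irreducible error}}
```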