Week 1 - Foundations of Machine Learning Flashcards
Supervised learning
The dataset is already labeled with the correct answers (the y-values or categories), allowing the learning algorithm to train on labeled data and make predictions on new/unseen data
Classification
A type of supervised learning where the goal is to assign input vectors to one of a predefined set of discrete classes
Unsupervised learning (clustering)
We are given data points without labels or categories. Using this data, the algorithm aims to identify groups by clustering similar data points together.
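A minimal clustering sketch, assuming scikit-learn and NumPy (the cards name no library) and using k-means purely as one illustrative clustering algorithm:

```python
import numpy as np
from sklearn.cluster import KMeans

# Unlabeled data: 100 observations, 2 variables, no y-values given
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])

# k-means groups similar points into k clusters without seeing any labels
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(labels[:10])  # cluster assignment (0 or 1) for the first 10 points
```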
Regression
Another type of supervised learning, but the response is continuous rather than discrete (classification)
Example: predicting an amount of money, such as a price or salary
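A minimal regression sketch under the same scikit-learn assumption, fitting a line to a continuous response:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Labeled data: y is a continuous response (e.g. a price), not a class
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, (100, 1))           # one predictor
y = 3.0 * X[:, 0] + rng.normal(0, 1, 100)  # continuous target with noise

model = LinearRegression().fit(X, y)  # train on labeled data
print(model.predict([[4.0]]))         # predict y for a new observation
```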
Predictors
Predictors are our x values
- n is the number of observations/samples
- p is the number of variables
Observations
The i^th observation denotes the i^th row of the data matrix
Variables
The j^th variable denotes the j^th column of the data matrix
Response/Target
The response or target is ‘y’
Each row of the y column vector corresponds to one observation in the data
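A small NumPy sketch (NumPy is an assumption) tying the n, p, i, j, and y notation together; note Python indexing is 0-based while the cards' i^th/j^th convention is 1-based:

```python
import numpy as np

# X is the n x p predictor matrix; y is the length-n response vector
X = np.arange(12).reshape(4, 3)  # n = 4 observations, p = 3 variables
y = np.array([1.0, 0.0, 1.0, 1.0])

i, j = 1, 2
print(X[i, :])  # the i-th observation (a row of X)
print(X[:, j])  # the j-th variable (a column of X)
print(y[i])     # response for the i-th observation
```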
Projection
A projection maps our data from 'p' variables down to 'd' variables (a lower dimensional space)
For example, data in a 3 dimensional space can be projected into a 2 dimensional space to make it simpler to work with
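A minimal projection sketch; the cards name no method, so PCA from scikit-learn is assumed here as one common choice:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))  # p = 3 variables

# Project from p = 3 variables down to d = 2
X_2d = PCA(n_components=2).fit_transform(X)
print(X.shape, "->", X_2d.shape)  # (100, 3) -> (100, 2)
```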
Predictive accuracy
The goal is to make accurate predictions of ‘y’ on new data
Interpretability
Aim to understand the relationship between ‘x’ and ‘y’
Simpler models are preferred if they achieve similar accuracy
Training and test splits
- Training set: roughly 80% of the data, used to fit the model
- Test set: the remaining 20%, used to assess the final model's performance on future data
- Testing on the training set gives a biased (over-optimistic) estimate, since the model was fit to exactly those observations (see the split sketch below)
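A minimal 80/20 split sketch, assuming scikit-learn's train_test_split for illustration:

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = rng.integers(0, 2, size=100)

# 80% of rows for fitting, 20% held out for the final assessment
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
print(X_train.shape, X_test.shape)  # (80, 4) (20, 4)
```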
Measuring accuracy for categorical response
For classification problems, accuracy is measured using the ERROR RATE, which is the fraction of misclassifications
The Training Error Rate is calculated on the training data, but the Test Error Rate, computed on held-out test data, provides a better estimate of future accuracy
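A small sketch of the error rate as the fraction of misclassifications, using NumPy for illustration:

```python
import numpy as np

y_true = np.array([0, 1, 1, 0, 1, 0])
y_pred = np.array([0, 1, 0, 0, 1, 1])

# Error rate = fraction of observations where prediction != truth
error_rate = np.mean(y_pred != y_true)
print(error_rate)  # 2 misclassifications out of 6 -> 0.333...
```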
Predicting Probabilities
Instead of a single class prediction, we get probabilities of the different y values for each new observation
- Probabilities contain much more information than a class label alone
Decision rule
The decision rule's threshold 'a' is problem specific and depends on the consequences of misclassification (see the sketch below)
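A minimal sketch of probabilities plus a decision rule, assuming scikit-learn's logistic regression; the threshold value 0.7 is a made-up example of a problem-specific 'a':

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

clf = LogisticRegression().fit(X, y)
proba = clf.predict_proba(X)[:, 1]  # P(y = 1) for each observation

# Decision rule: predict class 1 when P(y = 1) > a; 'a' is problem specific,
# e.g. raise 'a' when false positives are costly
a = 0.7
y_pred = (proba > a).astype(int)
print(y_pred[:10])
```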
Receiver Operating Characteristic (ROC) Curves
The ROC curve helps evaluate model performance across different threshold values of the decision rule 'a'
A good classifier increases True Positive Rate (TPR) faster than False Positive Rate (FPR) as ‘a’ changes
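A brief sketch of the quantities behind an ROC curve, assuming scikit-learn's roc_curve, which sweeps the threshold 'a' over the predicted probabilities:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] + rng.normal(0, 0.5, 200) > 0).astype(int)

proba = LogisticRegression().fit(X, y).predict_proba(X)[:, 1]

# FPR and TPR at each threshold 'a'; a good classifier's TPR rises faster
fpr, tpr, thresholds = roc_curve(y, proba)
print(np.round(fpr[:5], 2), np.round(tpr[:5], 2))
```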
Parametric Method
A machine learning approach that assumes a fixed functional form for the model and estimates a set number of parameters
(e.g., linear regression, logistic regression)
Non-Parametric Method
A machine learning approach that makes no strict assumptions about the data’s structure, allowing the model to be more flexible and data-driven
(e.g., k-Nearest Neighbors, decision trees)
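A minimal non-parametric sketch, assuming scikit-learn's k-Nearest Neighbors; no fixed functional form is estimated, and the parameter k also controls flexibility (see the flexibility card below):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = (X[:, 0] > 0).astype(int)

# k controls flexibility: small k -> very flexible, large k -> smoother
knn = KNeighborsClassifier(n_neighbors=5).fit(X, y)
print(knn.predict([[0.3, -0.8]]))  # label decided by the 5 nearest points
```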
Reducible Error
Error that comes from an incorrect model form or poor parameter estimation
It can be minimized by improving the model
Irreducible Error
Error that remains even with the best possible model, caused by inherent randomness or noise in the data
Flexibility with Parametric and Non-Parametric models
Parametric models are generally less flexible, while the flexibility of non-parametric models can be tuned via their parameters (e.g., k in k-Nearest Neighbors)
Bias
Error from modelling a complicated problem with an overly simple model
High bias - model too simple, leading to underfitting
A more flexible method will have less bias
Variance
Error from a model being too sensitive to the training data, leading to poor generalisation
High variance - model too complex, leading to overfitting
A more flexible method will have more variance
Bias-Variance trade-off
The balance between:
- Bias (underfitting): Model is too simple and misses patterns.
- Variance (overfitting): Model is too complex and learns noise
Aim for a middle ground that minimises both errors for better predictions
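As a supporting note (standard theory, not stated on the cards): the expected squared test error decomposes into bias, variance, and irreducible error, which is why minimising one alone is not enough:

```latex
% Expected test MSE at a point x_0, for an estimate \hat{f} of the true f
% (y = f(x) + \epsilon, with noise variance \sigma^2):
E\big[(y_0 - \hat{f}(x_0))^2\big]
  = \underbrace{\big(E[\hat{f}(x_0)] - f(x_0)\big)^2}_{\text{Bias}^2}
  + \underbrace{\operatorname{Var}\big(\hat{f}(x_0)\big)}_{\text{Variance}}
  + \underbrace{\sigma^2}_{\text{Irreducible error}}
```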