Week 1 - Foundations of Machine Learning Flashcards

1
Q

Supervised learning

A

The dataset is already labeled with the correct answers (the y-values or categories), so the learning algorithm can train on labeled data and make predictions on new/unseen data

2
Q

Classification

A

A type of supervised learning where the goal is to assign input vectors to one of a predefined set of discrete classes

3
Q

Unsupervised learning (clustering)

A

We are given data points without labels or categories. Using this data, the algorithm aims to identify groups by clustering similar data points together.
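
As a sketch of the idea, here is a toy 1-D k-means in plain Python (the data points and starting centers are made up for illustration, not a library implementation):

```python
# Toy 1-D k-means sketch: group unlabeled points around moving centers.
def kmeans_1d(points, centers, iters=10):
    for _ in range(iters):
        # Assignment step: each point joins the cluster of its nearest center
        clusters = [[] for _ in centers]
        for x in points:
            j = min(range(len(centers)), key=lambda k: abs(x - centers[k]))
            clusters[j].append(x)
        # Update step: each center moves to the mean of its cluster
        centers = [sum(c) / len(c) if c else centers[j]
                   for j, c in enumerate(clusters)]
    return centers, clusters

points = [1.0, 1.2, 0.8, 9.0, 9.5, 8.5]   # two obvious groups, no labels given
centers, clusters = kmeans_1d(points, [0.0, 5.0])
print(centers)   # centers converge near 1.0 and 9.0
```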

4
Q

Regression

A

Another supervised learning task, but the response is continuous rather than discrete (as in classification).

Example: Money
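
A minimal sketch of a continuous prediction: fitting a line by least squares to toy data (values are made up for illustration):

```python
# Simple linear regression sketch: fit y = b0 + b1*x by least squares.
def fit_line(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b1 = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
          / sum((x - mx) ** 2 for x in xs))
    b0 = my - b1 * mx
    return b0, b1

# Continuous response, e.g. a dollar amount rather than a class label
xs = [1, 2, 3, 4]
ys = [2.0, 4.0, 6.0, 8.0]
b0, b1 = fit_line(xs, ys)
print(b0, b1)   # intercept 0, slope 2 for this exact line
```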

5
Q

Predictors

A

Predictors are our x-values
- n is the number of observations/samples
- p is the number of variables
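
This notation can be illustrated with a small nested-list data matrix (the values are made up):

```python
# n observations (rows) by p predictors (columns), as plain nested lists.
X = [
    [5.1, 3.5],   # observation 1
    [4.9, 3.0],   # observation 2
    [6.2, 2.9],   # observation 3
]
n = len(X)        # number of observations/samples
p = len(X[0])     # number of variables/predictors

x_21 = X[1][0]    # i = 2nd observation (row), j = 1st variable (column)
print(n, p, x_21)
```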

6
Q

Observations

A

We use the i^th observation to denote the i^th row of the data matrix

7
Q

Variables

A

We use the j^th variable to denote the j^th column of the data matrix

8
Q

Response/Target

A

The response or target is ‘y’

Each row of the y column vector corresponds to an observation in the data

9
Q

Projection

A

A projection transforms our data from ‘p’ variables to ‘d’ variables (a lower-dimensional space)

For example, data in a 3-dimensional space can be projected into a 2-dimensional space to make it simpler to work with
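
A small numpy sketch of this (here the projection matrix simply keeps the first two coordinates; any matrix with orthonormal columns would do):

```python
import numpy as np

# Project 3-D points onto the first two coordinate axes.
# The columns of P are orthonormal, so P.T @ P is the 2x2 identity.
P = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])        # 3x2: from p = 3 variables to d = 2

X = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])   # n = 2 observations in 3-D

Z = X @ P                          # n x d: data in the lower-dimensional space
print(Z)                           # the third coordinate is dropped
```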

10
Q

Predictive accuracy

A

The goal is to make accurate predictions of ‘y’ on new data

11
Q

Interpretability

A

Aim to understand the relationship between ‘x’ and ‘y’

Simpler models are preferred if they achieve similar accuracy

12
Q

Training and test splits

A
  • Training set (~80%) is used to fit the model
  • Test set (~20%) is used to assess the final model's performance on future data
  • If you evaluate on the training set, the error estimate is biased, since the model was fit to predict that data well
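
A minimal sketch of an 80/20 split (indices stand in for observations; the seed is arbitrary, chosen only so the split is reproducible):

```python
import random

# 80/20 train/test split sketch: fit on train, assess only on test.
data = list(range(100))            # 100 observations (indices stand in for rows)
rng = random.Random(0)             # fixed seed for a reproducible split
rng.shuffle(data)

cut = int(0.8 * len(data))
train, test = data[:cut], data[cut:]
print(len(train), len(test))       # 80 20
```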
13
Q

Measuring accuracy for categorical response

A

For classification problems, accuracy is measured using the ERROR RATE, which is the fraction of misclassifications

The Training Error Rate is calculated using training data, but the Test Error Rate provides a better estimate of future accuracy by evaluating the model on test data
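
The error rate can be computed directly (toy labels for illustration):

```python
# Error rate = fraction of misclassifications on a held-out test set.
y_true = ["cat", "dog", "dog", "cat", "dog"]
y_pred = ["cat", "dog", "cat", "cat", "dog"]

errors = sum(t != p for t, p in zip(y_true, y_pred))
error_rate = errors / len(y_true)
print(error_rate)   # 1 of 5 misclassified -> 0.2
```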

14
Q

Predicting Probabilities

A

Instead of a single predicted class, we get probabilities of the different y values for each new observation
- Probabilities contain much more information than a class label alone

15
Q

Decision rule

A

A decision rule converts predicted probabilities into class labels

The decision rule ‘a’ (a probability threshold) is problem specific and depends on the consequences of misclassification
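
A sketch of a threshold decision rule (the threshold values and probabilities here are illustrative):

```python
# Turn a predicted probability into a class label using threshold a.
# A lower a flags more positives: useful when missing a positive is costly.
def classify(prob, a=0.5):
    return 1 if prob >= a else 0

probs = [0.10, 0.45, 0.60, 0.95]
print([classify(p, a=0.5) for p in probs])   # [0, 0, 1, 1]
print([classify(p, a=0.3) for p in probs])   # [0, 1, 1, 1]
```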

16
Q

Receiver Operating Characteristic (ROC) curves

A

ROC curve helps evaluate model performance across different threshold values of the decision rule ‘a’

A good classifier increases True Positive Rate (TPR) faster than False Positive Rate (FPR) as ‘a’ changes
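
A sketch of the points on a ROC curve, sweeping the threshold ‘a’ over toy predictions (labels and probabilities are made up):

```python
# Sweep the threshold a and record (FPR, TPR) pairs - the points of a ROC curve.
y_true = [0, 0, 1, 1]
probs  = [0.1, 0.4, 0.35, 0.8]   # predicted P(y = 1) for each observation

def rates(a):
    tp = sum(1 for y, p in zip(y_true, probs) if y == 1 and p >= a)
    fp = sum(1 for y, p in zip(y_true, probs) if y == 0 and p >= a)
    pos = sum(y_true)
    neg = len(y_true) - pos
    return fp / neg, tp / pos     # (FPR, TPR)

for a in [0.9, 0.5, 0.2, 0.0]:
    print(a, rates(a))            # TPR rises before FPR for a good classifier
```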

17
Q

Parametric Method

A

A machine learning approach that assumes a fixed functional form for the model and estimates a set number of parameters

(e.g., linear regression, logistic regression)

18
Q

Non-Parametric Method

A

A machine learning approach that makes no strict assumptions about the data’s structure, allowing the model to be more flexible and data-driven

(e.g., k-Nearest Neighbors, decision trees)

19
Q

Reducible Error

A

Error that comes from an incorrect model form or poor parameter estimation

It can be minimized by improving the model

20
Q

Irreducible Error

A

Error that remains even with the best possible model, caused by inherent randomness or noise in the data

21
Q

Flexibility with Parametric and Non-Parametric models

A

Parametric models are generally less flexible, while non-parametric models can be adjusted for flexibility based on their parameters

22
Q

Bias

A

Error from modelling a complicated problem with a model that is too simple

High bias: model too simple, leading to underfitting

A more flexible method will have less bias

23
Q

Variance

A

Error from a model being too sensitive to the training data: the model is too complex and fails to generalise (overfitting)

A more flexible method will have more variance

23
Q

Bias-Variance trade-off

A

The balance between:
- Bias (underfitting): the model is too simple and misses patterns
- Variance (overfitting): the model is too complex and learns noise

We seek a middle ground, minimising both errors for better predictions

24
Q

Diagnosing Model Fit

A

1. Evaluate performance metrics: check accuracy, sensitivity, specificity, and error rates.
2. Analyse errors: examine misclassifications and important variables.
3. Visualise residuals: identify patterns or issues in model predictions.
4. Check data consistency: ensure the test set is similar to the training set.
5. Holistic review: consider both the data and model together for better insights.

25
Q

Transformations

A

Transformations can be applied to the data to get a better fit (reduce bias)

26
Q

The Big Picture: Steps in Model Building

A

1. Understand your data: identify response type, predictor types, independence, missing values, and anomalies.
2. Visualise the data: check distributions, gaps, and patterns.
3. Fit models: test different models, compute fit statistics, and analyse parameters.
4. Evaluate and improve: compare models, simplify if possible, and check for systematic errors, bias, and variance.

27
Q

Orthonormal

A

For a projection, the columns of the projection matrix need to be orthonormal: the transpose of the matrix multiplied by the matrix should be the d x d identity (P^T P = I_d)

A tutorial question asked us to find the projection matrix:
- The original data was a 4x5 matrix, so p is 5
- Projecting from a 5-dimensional space to a 2-dimensional space gives a projection matrix that is 5x2
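
The tutorial dimensions can be checked in numpy (this particular 5x2 matrix is made up; only its orthonormal columns matter):

```python
import numpy as np

# 5x2 projection matrix (p = 5 variables down to d = 2), orthonormal columns.
P = np.zeros((5, 2))
P[0, 0] = 1.0          # first column: unit vector along variable 1
P[1, 1] = 1.0          # second column: unit vector along variable 2

# Orthonormality check: P^T P should be the d x d identity.
print(np.allclose(P.T @ P, np.eye(2)))   # True
```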
28
Q

Stratify splitting

A

Always stratify the split by sub-groups

This ensures the training set resembles the test set, so we don't miss observations (if one group has few observations, it may not appear in the training set at all)
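
A sketch of a stratified split in pure Python (the group labels and the 80/20 fraction are illustrative):

```python
import random

# Stratified 80/20 split sketch: split each sub-group separately so the
# training set keeps every group, even rare ones.
def stratified_split(rows, labels, frac=0.8, seed=0):
    rng = random.Random(seed)
    train, test = [], []
    for g in set(labels):
        group = [r for r, y in zip(rows, labels) if y == g]
        rng.shuffle(group)
        cut = int(frac * len(group))
        train += group[:cut]
        test += group[cut:]
    return train, test

rows = list(range(10))
labels = ["A"] * 8 + ["B"] * 2          # group B is rare
train, test = stratified_split(rows, labels)
print(len(train), len(test))            # both groups appear in the training set
```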