Fundamentals of ML Flashcards
What is ML?
- teach a computer model to make predictions and draw conclusions from data
- building computer systems that learn from data
- ML algorithms are trained to find relationships and patterns in data
- intersection of Data Science and Software Engineering
- Data Scientist: explore and prepare data, train ML model
- Software Engineer: integrate models in applications
ML as a function
- An ML model is a software application that encapsulates a function to calculate an output value based on one or more input values
- Training = defining the function by fitting it to data
- Inferencing = using the function to predict new values
Steps of Training and Inference
- Data = past observations
- x = observed attributes / features
- y = known value of prediction / label
- x can be a vector of multiple features
- Algorithm is applied to determine relationship between x and y
- Result of algorithm is a model that encapsulates a calculation on x to calculate y
- calculation is a function y = f(x)
- Trained model can be used for inference
- predictions are ŷ (y-hat)
- trained models are used to draw conclusions from new data
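The steps above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the `train` helper, the choice of a straight line fitted by least squares, and the numbers are all assumptions made for the example.

```python
def train(xs, ys):
    """Training: fit y = w*x + b by least squares and return the function f."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    w = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
         / sum((x - x_mean) ** 2 for x in xs))
    b = y_mean - w * x_mean

    def f(x):
        # The trained model encapsulates the calculation on x
        return w * x + b

    return f

# Past observations (made-up numbers): x = feature, y = known label
x_train = [10, 15, 20, 25, 30]
y_train = [5, 10, 15, 20, 25]

f = train(x_train, y_train)  # training defines the function
y_hat = f(22)                # inference: prediction (y-hat) for a new x
print(y_hat)
```

Training produces `f`; inference is nothing more than calling `f` on new feature values.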
Types of ML
- Supervised ML
a) Regression
b) Classification
ba) binary classification
bb) multiclass classification
- Unsupervised ML
a) Clustering
Supervised ML
Training data with features and known label values (= labeled dataset)
- most common type
- label can be anything from a category label to a real-valued number
- model learns a mapping between the input (features) and the output (label) during the training process
- once trained, model can predict the output for new, unseen data
Common Examples for supervised ML
- linear regression for regression problems
- logistic regression for binary classification
- decision trees
- support vector machines for classification problems
Unsupervised ML
Only features, no known labels (= unlabeled dataset)
- Model finds patterns and relationships between features
Common Examples of unsupervised ML
- Clustering (grouping similar data points together)
- Dimensionality reduction (reducing the number of random variables under consideration by obtaining a set of principal variables)
- k-means for clustering problems
- Principal Component Analysis (PCA) for dimensionality reduction problems
Regression
Models are trained to predict numeric label values based on training data that includes both features and known labels
e.g. predicting ice-cream sales (y) based on temperature (x)
Regression elements of training process
- Split training data randomly (train and validate subset)
- Use algorithm to fit data to a model
- Validating by predicting values
- compare actual labels to predictions
- aggregate the differences to calculate an evaluation metric
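A rough sketch of this process in Python, under stated assumptions: the data is invented, the 70/30 split ratio and the `fit_line` helper (a least-squares straight line) are choices made only for the example.

```python
import random

def fit_line(xs, ys):
    """Least-squares fit of y = w*x + b; returns (w, b)."""
    n = len(xs)
    x_mean, y_mean = sum(xs) / n, sum(ys) / n
    w = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
         / sum((x - x_mean) ** 2 for x in xs))
    return w, y_mean - w * x_mean

# Made-up observations: roughly y = 2x + 1 with a small alternating wobble
data = [(x, 2 * x + 1 + (0.5 if x % 2 else -0.5)) for x in range(20)]

# 1. Split the data randomly into a training and a validation subset
rng = random.Random(0)  # fixed seed so the split is repeatable
shuffled = data[:]
rng.shuffle(shuffled)
cut = int(0.7 * len(shuffled))
train_set, val_set = shuffled[:cut], shuffled[cut:]

# 2. Use an algorithm to fit the training data to a model
w, b = fit_line([x for x, _ in train_set], [y for _, y in train_set])

# 3. Validate by predicting values for the held-back features
y_true = [y for _, y in val_set]
y_pred = [w * x + b for x, _ in val_set]

# 4. Compare actual labels to predictions and aggregate the differences
mean_abs_diff = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)
print(len(train_set), len(val_set), mean_abs_diff)
```

The validation subset is never shown to `fit_line`, so the aggregated difference reflects how the model behaves on data it has not seen.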
Regression Evaluation Metrics
- MAE (Mean Absolute Error)
- MSE (Mean Squared Error)
- RMSE (Root Mean Squared Error)
- R2 (Coefficient of Determination)
MAE
Mean absolute error
- average of the absolute differences between prediction and actual label (by how much each prediction was wrong)
- doesn’t matter if + or -
MSE
Mean squared error
- amplifies larger errors
- result is in squared units, so it no longer represents the quantity measured by the label
- reflects that a model that is consistently slightly wrong is preferable to one that makes fewer but larger errors
RMSE
Root mean squared error
- square root of MSE, so the error is expressed in the same unit as the label again
R2
Coefficient of determination
- proportion of variance in the validation results that can be explained by the model
- as opposed to variance caused by some anomalous aspect of the validation data
R² = 1 − ∑(y − ŷ)² ÷ ∑(y − ȳ)²
ȳ = mean of actual value labels
- result usually between 0 and 1 (can be negative if the model fits worse than always predicting ȳ)
- the closer to 1 the better the model is fitting the validation data
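The four regression metrics can be computed by hand as follows. The label and prediction values are made up for illustration.

```python
import math

# Made-up validation data: actual labels y and predictions y-hat
y = [10, 20, 30, 40]
y_hat = [12, 18, 33, 39]
n = len(y)

# MAE: mean of the absolute errors (sign does not matter)
mae = sum(abs(a - p) for a, p in zip(y, y_hat)) / n

# MSE: mean of the squared errors (larger errors are amplified)
mse = sum((a - p) ** 2 for a, p in zip(y, y_hat)) / n

# RMSE: square root of MSE, back in the unit of the label
rmse = math.sqrt(mse)

# R2: 1 - (sum of squared errors) / (sum of squared deviations from the mean)
y_mean = sum(y) / n
r2 = 1 - (sum((a - p) ** 2 for a, p in zip(y, y_hat))
          / sum((a - y_mean) ** 2 for a in y))

print(mae, mse, rmse, r2)
```

For these values MAE is 2.0 and MSE is 4.5: the single error of 3 contributes 9 of the 18 squared-error units, which shows how squaring weights larger errors more heavily.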
Binary Classification
Calculates probability values for class assignments
observed item is or isn’t an instance of a specific class
e.g. predicting Diabetes yes or no
Steps of Binary Classification
- Algorithm calculates probability values for class assignments
- Evaluation metrics compare predicted to actual classes
- Probability is measured as a value between 0.0 and 1.0
- Function describes probability of y being true (= 1) for a given value x: f(x) = P(y=1|x)
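As a sketch of such a function, the example below assumes a logistic (sigmoid) curve, which is one common choice and not stated in these notes. The weight `w`, the bias `b`, and the 0.5 threshold are hypothetical values picked for illustration, not trained parameters.

```python
import math

def predict_proba(x, w=0.1, b=-10.0):
    """f(x) = P(y=1 | x), modeled with a sigmoid; w and b are hypothetical."""
    return 1.0 / (1.0 + math.exp(-(w * x + b)))

def predict_class(x, threshold=0.5):
    """Turn the probability into a class: 1 if it reaches the threshold, else 0."""
    return 1 if predict_proba(x) >= threshold else 0

for x in (70, 100, 130):
    print(x, round(predict_proba(x), 3), predict_class(x))
```

The probability always lies between 0.0 and 1.0, and the threshold is the decision point at which a probability becomes a yes (1) or a no (0).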
Classification evaluation metrics
- Confusion matrix
- Accuracy
- Recall (TPR true positive rate)
- Precision
- F1-Score
- AUC (area under the curve)
Confusion Matrix
Matrix of number of correct and incorrect predictions for each possible class label.
columns = ŷ ( 0 and 1)
rows = y (0 and 1)
TN (true negative): ŷ=0 and y=0
FP (false positive): ŷ=1 and y=0
FN (false negative): ŷ=0 and y=1
TP (true positive): ŷ=1 and y=1
The arrangement of the confusion matrix is such that correct (true) predictions are shown in a diagonal line from top-left to bottom-right
Accuracy
Proportion of right predictions
(TN+TP) ÷ (TN+FN+FP+TP)
Recall
TPR true positive rate
Measures proportion of positive cases identified correctly
TP ÷ (TP+FN)
e.g. compared to patients with Diabetes, how many were identified correctly
Precision
Proportion of predicted positive cases where label is actually positive
TP ÷ (TP+FP)
e.g. what proportion of predicted positive cases actually have diabetes
F1-Score
Combines recall and precision
(2 x Precision x Recall) ÷ (Precision + Recall)
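The confusion matrix counts and the metrics derived from them can be sketched like this. The labels and predictions are made up for illustration.

```python
# Made-up actual labels y and predicted labels y-hat
y     = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_hat = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

pairs = list(zip(y, y_hat))
tn = sum(1 for a, p in pairs if a == 0 and p == 0)  # true negatives
fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # false positives
fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # false negatives
tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # true positives

# Rows = actual class, columns = predicted class
confusion_matrix = [[tn, fp],
                    [fn, tp]]

accuracy = (tn + tp) / (tn + fn + fp + tp)
recall = tp / (tp + fn)
precision = tp / (tp + fp)
f1 = (2 * precision * recall) / (precision + recall)

print(confusion_matrix, accuracy, recall, precision, f1)
```

Here the counts are TN = 5, FP = 1, FN = 1, TP = 3, so accuracy is 0.8 while recall and precision are both 0.75.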
AUC (Area under the curve)
Based on plotting an ROC (receiver operating characteristic) curve
Curve shows TPR against FPR (true/false positive rate) for all thresholds (decision point on yes or no)
AUC = area under this curve; a straight diagonal line (AUC 0.5) represents random guessing
e.g. if AUC is 0.875 -> model works better than random guessing (over 0.5)
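A minimal sketch of how the curve and its area can be computed: sweep the threshold over every predicted score, record FPR and TPR at each step, and add up the area with the trapezoidal rule. The helper names and the scores are assumptions for the example; the scores were chosen so that the area comes out at 0.875.

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) pairs, one per threshold, from strictest to loosest."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points

def area_under_curve(points):
    """Area under the curve using the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))

# Made-up labels and predicted probabilities
y_true = [0, 0, 0, 0, 1, 1]
scores = [0.1, 0.2, 0.3, 0.6, 0.5, 0.9]

points = roc_points(y_true, scores)
auc = area_under_curve(points)
print(points, auc)
```

Only one negative case (score 0.6) is ranked above a positive case (score 0.5), which is why the area falls short of 1.0.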
Multiclass Classification
Label represents one of multiple possible classes
Mostly mutually exclusive (e.g. penguin species) but multilabel classifications are possible (e.g. movie genres)
Training-algorithms for a multiclass classification model
- OVR (One vs Rest)
- Multinomial algorithms
OVR
One vs Rest
Trains binary classification for each class
Calculating the probability that the observation is an example of the target class
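A sketch of the idea, assuming three classes and one numeric feature. The three binary models below are hand-written stand-ins for trained classifiers; their formulas and the class names A, B, C are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# One binary model per class, each returning P(observation is this class | x).
# Hypothetical stand-ins for trained binary classifiers.
binary_models = {
    "A": lambda x: sigmoid(3 - x),           # likely for small x
    "B": lambda x: sigmoid(1 - abs(x - 5)),  # likely for x near 5
    "C": lambda x: sigmoid(x - 7),           # likely for large x
}

def predict_ovr(x):
    """Ask every binary model, then pick the class with the highest probability."""
    probs = {label: model(x) for label, model in binary_models.items()}
    return max(probs, key=probs.get), probs

for x in (1, 5, 9):
    label, probs = predict_ovr(x)
    print(x, label, {k: round(v, 3) for k, v in probs.items()})
```

Because each model is trained and evaluated independently, the per-class probabilities do not have to add up to 1.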
Multinomial algorithms
Single function that returns a multi-valued output
Output = Vector with probability distribution for all classes (all add up to 1)
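One common way to obtain such a probability vector is the softmax function, which is an assumption here since the notes do not name a specific function. The raw class scores in the example are made up.

```python
import math

def softmax(scores):
    """Turn raw class scores into probabilities that add up to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]  # shift for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

# Made-up raw scores for three classes
probs = softmax([2.0, 1.0, 0.1])
predicted_class = probs.index(max(probs))
print(probs, sum(probs), predicted_class)
```

The predicted class is the index of the largest probability in the vector.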
Clustering
Identify similarities between observations
No known cluster labels
Can be used to create classes for multiclass Classification
Training Clustering Model
K-Means (most common)
1. Feature values vectorized to define n-dimensional coordinates
2. decide on k (= how many clusters), plot k at random coordinates (=centroids)
3. each data point is assigned to its nearest centroid
4. each centroid is moved to the center (mean position) of the data points assigned to it
5. reassign data points to closest centroid
6. repeat 4. and 5. until stable or predetermined maximum iterations
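The steps above can be sketched in plain Python. One simplification is assumed: the starting centroids are k randomly chosen data points instead of arbitrary random coordinates. The function name, parameters, and sample points are made up for the example.

```python
import random

def kmeans(points, k, max_iter=100, seed=0):
    rng = random.Random(seed)
    # Step 2: start with k centroids (here: k randomly chosen data points)
    centroids = rng.sample(points, k)
    assignments = None
    for _ in range(max_iter):
        # Steps 3 and 5: assign each point to its nearest centroid
        new_assignments = [
            min(range(k),
                key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centroids[c])))
            for point in points
        ]
        # Step 6: stop once the assignments are stable
        if new_assignments == assignments:
            break
        assignments = new_assignments
        # Step 4: move each centroid to the mean of its assigned points
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # leave an empty cluster's centroid where it is
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, assignments

# Step 1: feature values as 2-dimensional coordinates (made-up points)
points = [(1, 1), (1, 2), (2, 1), (2, 2), (8, 8), (8, 9), (9, 8), (9, 9)]
centroids, assignments = kmeans(points, k=2)
print(centroids, assignments)
```

With two well-separated groups like these, the loop settles after a few iterations with one centroid in the middle of each group.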
Evaluating a Clustering Model
How well are clusters separated?
- average distance to centroid
- average distance to other centroids
- maximum distance to centroid
- silhouette (ratio of distance between points in same cluster and points in different clusters, value between -1 and 1. The closer to 1 the better the cluster separation)
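The silhouette value can be sketched as follows, using the usual per-point formula (b − a) ÷ max(a, b), where a is the mean distance to the other points in the same cluster and b is the mean distance to the points of the nearest other cluster. The sample points and the score of 0 for single-point clusters are assumptions of this example.

```python
import math

def silhouette_score(points, labels):
    """Mean silhouette over all points; value between -1 and 1."""
    scores = []
    for i, (p, label) in enumerate(zip(points, labels)):
        same = [math.dist(p, q)
                for j, (q, lab) in enumerate(zip(points, labels))
                if lab == label and j != i]
        if not same:
            scores.append(0.0)  # assumed convention for single-point clusters
            continue
        a = sum(same) / len(same)  # mean distance within own cluster
        b = min(                   # mean distance to nearest other cluster
            sum(math.dist(p, q) for q, lab in zip(points, labels) if lab == other)
            / labels.count(other)
            for other in set(labels) if other != label
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Made-up points forming two well-separated groups
points = [(1, 1), (1, 2), (2, 1), (2, 2), (8, 8), (8, 9), (9, 8), (9, 9)]
good = silhouette_score(points, [0, 0, 0, 0, 1, 1, 1, 1])
bad = silhouette_score(points, [0, 1, 0, 1, 0, 1, 0, 1])
print(good, bad)
```

The labeling that follows the two groups scores close to 1, while the labeling that mixes them scores far lower.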
Iterative Training
Vary:
- feature selection and preparation
- algorithm selection
- algorithm parameters
Deep Learning
- tries to emulate how the brain learns
- creation of artificial neural network that simulates electrochemical activity in biological neurons by using mathematical functions
- each neuron = function, operates with input x and weight w
- multiple layers of neurons (DNN deep neural network, deeply nested functions)
- change w in iterations to reduce loss (ŷ compared to y)
- Data is batched into Matrices and processed using linear algebraic calculations -> good GPUs (Graphics processing unit) needed
How does a neural network learn?
- Training features are fed into the input layer (vector x)
- Weights w are initially assigned randomly; each neuron's function combines x and w
- Activation function determines whether the result is passed on to the next layer (e.g. once a threshold is reached)
- each neuron is connected with each neuron in next layer (fully connected network)
- Output layer produces a vector containing the calculated values for ŷ
- Loss function to compare the predicted ŷ values to the known y values and aggregate the difference
- Optimization function: use differential calculus to evaluate the influence of each weight on the loss, then gradient descent to improve (each weight is increased or decreased)
- New weights are backpropagated to the layers in the network, replacing the previously used values.
- Multiple iterations (known as epochs) until the loss is minimized and the model predicts acceptably accurately
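The loop of predicting, measuring loss, and adjusting weights can be sketched with a single neuron. This is a strong simplification of a deep network: there is only one layer, so no backpropagation across layers is shown. The sigmoid activation, cross-entropy loss, learning rate, epoch count, and the toy data (logical AND) are all assumptions of the example.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Toy training data: features x and known labels y (logical AND)
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
Y = [0, 0, 0, 1]

rng = random.Random(0)
w = [rng.uniform(-1, 1), rng.uniform(-1, 1)]  # randomly assigned weights
b = 0.0
learning_rate = 0.5

losses = []
for epoch in range(5000):
    grad_w, grad_b, total_loss = [0.0, 0.0], 0.0, 0.0
    for x, y in zip(X, Y):
        # Forward pass: combine x and w, then apply the activation function
        y_hat = sigmoid(w[0] * x[0] + w[1] * x[1] + b)
        # Loss function: compare y-hat to y (cross-entropy)
        total_loss += -(y * math.log(y_hat) + (1 - y) * math.log(1 - y_hat))
        # Derivative of the loss with respect to the weighted sum
        error = y_hat - y
        grad_w[0] += error * x[0]
        grad_w[1] += error * x[1]
        grad_b += error
    n = len(X)
    # Gradient descent: nudge each weight against its gradient
    w = [wi - learning_rate * g / n for wi, g in zip(w, grad_w)]
    b -= learning_rate * grad_b / n
    losses.append(total_loss / n)

predictions = [1 if sigmoid(w[0] * x[0] + w[1] * x[1] + b) >= 0.5 else 0 for x in X]
print(losses[0], losses[-1], predictions)
```

Each pass over the data is one epoch; the recorded loss shrinks as the weights are adjusted, and the final predictions reproduce the labels.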
Azure Machine Learning
Cloud service for training, deploying, and managing machine learning models
- Exploring data and preparing it for modeling.
- Training and evaluating machine learning models.
- Registering and managing trained models.
- Deploying trained models for use by applications and services.
- Reviewing and applying responsible AI principles and practices.