Fundamentals of ML Flashcards
What is ML?
- teach a computer model to make predictions and draw conclusions from data
- building computer systems that learn from data
- ML algorithms are trained to find relationships and patterns in data
- intersection of Data Science and Software Engineering
- Data Scientist: explore and prepare data, train ML model
- Software Engineer: integrate models in applications
ML as a function
- An ML model is a software application that encapsulates a function to calculate an output value based on one or more input values
- Training = defining the function from data
- Inferencing = using the function to predict new values
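As a minimal sketch of this idea in Python (the parameter values w and b are made up, as if they had been learned from data):

```python
# A trained model encapsulates a function y = f(x).
# w and b are hypothetical learned parameters (e.g. from
# ice-cream sales observed at different temperatures).
def f(x, w=2.5, b=10.0):
    return w * x + b

y_hat = f(28.0)  # inference: calculate an output for a new input
```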
Steps of Training and Inference
- Data = past observations
- x = observed attributes / features
- y = known value of prediction / label
- x can be a vector of multiple features
- An algorithm is applied to determine the relationship between x and y
- Result of the algorithm is a model that encapsulates a calculation on x to calculate y
- the calculation is a function y = f(x)
- The trained model can be used for inference
- predictions are ŷ (y-hat)
- Trained models are used to draw conclusions from new data
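A minimal sketch of both steps, with numpy's polyfit standing in for the training algorithm (all data values here are invented):

```python
import numpy as np

# Training: past observations with features x and known labels y
x = np.array([20.0, 24.0, 28.0, 32.0])  # observed attribute (e.g. temperature)
y = np.array([60.0, 72.0, 85.0, 98.0])  # known label (e.g. ice-cream sales)

# The algorithm determines the relationship between x and y;
# here: a straight-line fit y = w*x + b
w, b = np.polyfit(x, y, deg=1)

# Inference: use the trained model to predict y-hat for new data
x_new = 30.0
y_hat = w * x_new + b
print(f"prediction for x={x_new}: {y_hat:.1f}")
```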
Types of ML
- Supervised ML
a) Regression
b) Classification
ba) binary classification
bb) multiclass classification
- Unsupervised ML
a) Clustering
Supervised ML
Training data with known features and known label values (= labeled dataset)
- most common type
- label can be anything from a category label to a real-valued number
- model learns a mapping between the input (features) and the output (label) during the training process
- once trained, model can predict the output for new, unseen data
Common Examples for supervised ML
- linear regression for regression problems
- logistic regression for binary classification
- decision trees
- support vector machines for classification problems
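A minimal sketch of one of these, logistic regression for binary classification, using scikit-learn as an example library (the toy data is invented):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Labeled dataset: features X and known labels y
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]])
y = np.array([0, 0, 0, 1, 1, 1])

# The model learns the mapping between input (features) and output (label)
model = LogisticRegression().fit(X, y)

# Once trained, the model can predict the output for new, unseen data
print(model.predict([[2.5]]))
```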
Unsupervised ML
Only features, no known labels (= unlabeled dataset)
- Model finds patterns and relationships between features
Common Examples of unsupervised ML
- Clustering (grouping similar data points together)
- Dimensionality reduction (reducing the number of random variables under consideration by obtaining a set of principal variables)
- k-means for clustering problems
- Principal Component Analysis (PCA) for dimensionality reduction problems
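A minimal sketch of clustering and dimensionality reduction, again using scikit-learn as an example library (toy data invented for illustration):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

# Unlabeled dataset: features only, no labels
X = np.array([[1.0, 2.0], [1.2, 1.8], [8.0, 9.0], [8.2, 8.8]])

# k-means: group similar data points together
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# PCA: reduce two features to one principal component
X_reduced = PCA(n_components=1).fit_transform(X)

print(clusters, X_reduced.ravel())
```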
Regression
Models are trained to predict numeric label values based on training data that includes both features and known labels
e.g. predicting ice-cream sales (y) based on temperature (x)
Regression: elements of the training process
- Split the training data randomly (train and validate subsets)
- Use an algorithm to fit the training data to a model
- Validate by predicting values for the validation subset
- compare actual labels to predictions
- aggregate the differences to calculate a metric of accuracy
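A minimal sketch of these steps with scikit-learn (invented ice-cream data; train_test_split performs the random split):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error

# Toy data: temperature (x) and ice-cream sales (y)
X = np.array([[20], [22], [24], [26], [28], [30], [32], [34]], dtype=float)
y = np.array([60, 65, 72, 78, 85, 90, 98, 104], dtype=float)

# 1. Split the training data randomly into train and validate subsets
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.25, random_state=0)

# 2. Use an algorithm to fit the training data to a model
model = LinearRegression().fit(X_train, y_train)

# 3. Validate by predicting values for the validation subset
y_pred = model.predict(X_val)

# 4. Compare actual labels to predictions, aggregate into a metric
print("MAE:", mean_absolute_error(y_val, y_pred))
```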
Regression Evaluation Metrics
- MAE (Mean Absolute Error)
- MSE (Mean Squared Error)
- RMSE (Root Mean Squared Error)
- R2 (Coefficient of Determination)
MAE
Mean absolute error
- average error magnitude (by how much was each prediction wrong)
- doesn't matter if the error is + or -
- MAE = ∑|y-ŷ| ÷ n
MSE
Mean squared error
- amplifies larger errors by squaring them
- no longer represents the error as a quantity in the label's units
- useful when a model that is consistently slightly wrong is preferable to one with fewer but larger errors
- MSE = ∑(y-ŷ)² ÷ n
RMSE
Root mean squared error
- square root of MSE, so the error is expressed as a quantity in the label's units again
- RMSE = √MSE
R2
Coefficient of determination
- proportion of variance in the validation results that can be explained by the model
- separates natural random variance from anomalous aspects of the validation data
R² = 1 - ∑(y-ŷ)² ÷ ∑(y-ȳ)²
ȳ = mean of actual value labels
- result between 0 and 1
- the closer to 1 the better the model is fitting the validation data
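A minimal sketch computing all four metrics with scikit-learn (the label values are invented):

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_true = np.array([60.0, 72.0, 85.0, 98.0])  # actual labels y
y_pred = np.array([62.0, 70.0, 88.0, 95.0])  # predictions ŷ

mae = mean_absolute_error(y_true, y_pred)  # ∑|y-ŷ| ÷ n
mse = mean_squared_error(y_true, y_pred)   # ∑(y-ŷ)² ÷ n
rmse = np.sqrt(mse)                        # back in the label's units
r2 = r2_score(y_true, y_pred)              # 1 - ∑(y-ŷ)² ÷ ∑(y-ȳ)²

print(f"MAE={mae:.2f} MSE={mse:.2f} RMSE={rmse:.2f} R2={r2:.3f}")
```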
Binary Classification
Calculates probability values for class assignments
whether the observed item is or isn't an instance of a specific class
e.g. predicting Diabetes yes or no
Steps of Binary Classification
- Algorithm calculates probability values for class assignments
- Evaluation metrics compare predicted to actual classes
- Probability is measured as a value between 0.0 and 1.0
- Function describes the probability of y being true (=1) for a given value of x: f(x) = P(y=1|x)
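A minimal sketch of these steps with scikit-learn's LogisticRegression (toy data invented):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy labeled data: one feature, binary label (e.g. diabetes yes/no)
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]])
y = np.array([0, 0, 0, 1, 1, 1])

model = LogisticRegression().fit(X, y)

# f(x) = P(y=1|x): a probability between 0.0 and 1.0
p = model.predict_proba([[3.5]])[0, 1]

# A threshold (here 0.5) turns the probability into a class
y_hat = int(p >= 0.5)
print(f"P(y=1|x)={p:.2f} -> predicted class {y_hat}")
```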
Classification evaluation metrics
- Confusion matrix
- Accuracy
- Recall (TPR true positive rate)
- Precision
- F1-Score
- AUC (area under the curve)
Confusion Matrix
Matrix of number of correct and incorrect predictions for each possible class label.
columns = ŷ (0 and 1)
rows = y (0 and 1)
TN (true negative): ŷ=0 and y=0
FP (false positive): ŷ=1 and y=0
FN (false negative): ŷ=0 and y=1
TP (true positive): ŷ=1 and y=1
The arrangement of the confusion matrix is such that correct (true) predictions are shown in a diagonal line from top-left to bottom-right
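A minimal sketch with scikit-learn's confusion_matrix, which uses the same layout, rows = actual y and columns = predicted ŷ (toy labels invented):

```python
from sklearn.metrics import confusion_matrix

y_true = [0, 0, 0, 0, 1, 1, 1, 0, 1, 1]  # actual labels
y_pred = [0, 0, 1, 0, 1, 0, 1, 0, 1, 1]  # predicted labels

# Output layout:
# [[TN, FP],
#  [FN, TP]]
print(confusion_matrix(y_true, y_pred))
```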
Accuracy
Proportion of predictions that were correct
(TN+TP) ÷ (TN+FN+FP+TP)
Recall
TPR true positive rate
Measures the proportion of actual positive cases that were identified correctly
TP ÷ (TP+FN)
e.g. of all patients who actually have diabetes, how many were identified correctly
Precision
Proportion of predicted positive cases where label is actually positive
TP ÷ (TP+FP)
e.g. what proportion of predicted positive cases actually have diabetes
F1-Score
Combines recall and precision
(2 x Precision x Recall) ÷ (Precision + Recall)
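A minimal sketch computing accuracy, recall, precision, and F1 with scikit-learn (same toy labels as in the confusion matrix sketch above):

```python
from sklearn.metrics import accuracy_score, recall_score, precision_score, f1_score

y_true = [0, 0, 0, 0, 1, 1, 1, 0, 1, 1]
y_pred = [0, 0, 1, 0, 1, 0, 1, 0, 1, 1]

print("Accuracy :", accuracy_score(y_true, y_pred))   # (TN+TP) ÷ total
print("Recall   :", recall_score(y_true, y_pred))     # TP ÷ (TP+FN)
print("Precision:", precision_score(y_true, y_pred))  # TP ÷ (TP+FP)
print("F1       :", f1_score(y_true, y_pred))         # 2PR ÷ (P+R)
```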
AUC (Area under the curve)
Plotting a ROC curve (receiver operating characteristic)
Shows the TPR and FPR (true/false positive rate) for every possible threshold (the decision point for yes or no)
AUC = area under the ROC curve; the diagonal straight line represents random guessing (AUC = 0.5)
e.g. an AUC of 0.875 means the model works better than random guessing (over 0.5)
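A minimal sketch computing the ROC curve and AUC with scikit-learn (the probability scores are invented):

```python
from sklearn.metrics import roc_curve, roc_auc_score

y_true = [0, 0, 0, 0, 1, 1, 1, 0, 1, 1]                       # actual classes
scores = [0.1, 0.3, 0.6, 0.2, 0.8, 0.4, 0.9, 0.3, 0.7, 0.95]  # P(y=1|x)

# ROC curve: TPR vs. FPR at every possible threshold
fpr, tpr, thresholds = roc_curve(y_true, scores)

# AUC: area under that curve; 0.5 = random guessing, 1.0 = perfect
print("AUC:", roc_auc_score(y_true, scores))
```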