Fundamentals of ML Flashcards

1
Q

What is ML?

A
  • teach a computer model to make predictions and draw conclusions from data
  • building computer systems that learn from data
  • ML algorithms are trained to find relationships and patterns in data
  • intersection of Data Science and Software Engineering
  • Data Scientist: explore and prepare data, train ML model
  • Software Engineer: integrate models in applications
2
Q

ML as a function

A
  • An ML model is a software application that encapsulates a function to calculate an output value based on one or more input values
  • Training = defining the function by fitting it to data
  • Inferencing = using the trained function to predict new values
3
Q

Steps of Training and Inference

A
  1. Data = past observations
    - x = observed attributes / features
    - y = known value of prediction / label
    - x can be a vector of multiple features
  2. Algorithm is applied to determine relationship between x and y
  3. Result of algorithm is a model that encapsulates a calculation on x to calculate y
    - calculation is a function y = f(x)
  4. Trained model can be used for inference
    - predictions are ŷ (y-hat)
    - trained models are used to draw conclusions from new data
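
A minimal sketch of the steps above, assuming ordinary least squares as the algorithm and one feature. The data values and the names train, x_train and y_train are hypothetical and only serve as illustration.

```python
# Step 1: hypothetical past observations, e.g. temperature (x) and ice creams sold (y).
x_train = [50.0, 60.0, 70.0, 80.0]
y_train = [10.0, 20.0, 30.0, 40.0]


def train(xs, ys):
    """Steps 2 and 3: fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    # The returned function is the trained model f(x).
    return lambda x: slope * x + intercept


f = train(x_train, y_train)  # training
y_hat = f(75.0)              # step 4: inference on a new observation
```

With these made-up values the fitted function is y = x - 40, so the prediction ŷ for x = 75 is 35.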
4
Q

Types of ML

A
  1. Supervised ML
    a) Regression
    b) Classification
      ba) binary classification
      bb) multiclass classification
  2. Unsupervised ML
    a) Clustering
5
Q

Supervised ML

A

Training data with known features and values (= labeled dataset)
- most common type
- label can be anything from a category label to a real-valued number
- model learns a mapping between the input (features) and the output (label) during the training process
- once trained, model can predict the output for new, unseen data

6
Q

Common Examples for supervised ML

A
  • linear regression for regression problems
  • logistic regression for binary classification
  • decision trees
  • support vector machines for classification problems
7
Q

Unsupervised ML

A

Only features, no known labels (= unlabeled dataset)
- Model finds patterns and relationships between features

8
Q

Common Examples of unsupervised ML

A
  • Clustering (grouping similar data points together)
  • Dimensionality reduction (reducing the number of random variables under consideration by obtaining a set of principal variables)
  • k-means for clustering problems
  • Principal Component Analysis (PCA) for dimensionality reduction problems
9
Q

Regression

A

Models are trained to predict numeric label values based on training data that includes both features and known labels
e.g. predicting ice-cream sales (y) based on temperature (x)

10
Q

Regression elements of training process

A
  1. Split the data randomly into a training subset and a validation subset
  2. Use an algorithm to fit the training data to a model
  3. Validate the model by predicting labels for the validation features
  4. Compare the actual labels to the predictions
    - aggregate the differences into an evaluation metric
11
Q

Regression Evaluation Metrics

A
  • MAE (Mean Absolute Error)
  • MSE (Mean Squared Error)
  • RMSE (Root Mean Squared Error)
  • R2 (Coefficient of Determination)
12
Q

MAE

A

Mean absolute error
- average absolute difference between prediction and actual label (by how much was each prediction wrong, on average)
- doesn’t matter if + or -

13
Q

MSE

A

Mean squared error
- amplifies larger errors
- result is in squared units, so it no longer represents the quantity measured by the label
- reflects the view that it is better to have a model that’s consistently slightly wrong than one with fewer but larger errors

14
Q

RMSE

A

Root mean squared error
- square root of MSE, so the error is expressed in the same units as the label again

15
Q

R2

A

Coefficient of determination
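- proportion of variance in the validation results that the model explains
- as opposed to variance caused by some anomalous aspect of the validation data
R2 = 1 − ∑(y − ŷ)² ÷ ∑(y − ȳ)²
ȳ = mean of actual label values
- result is usually between 0 and 1 (it can be negative if the model fits worse than always predicting ȳ)
- the closer to 1, the better the model fits the validation data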
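
The four regression metrics (MAE, MSE, RMSE, R2) can be computed directly from their definitions. A minimal sketch in plain Python; the function name regression_metrics and the sample values are hypothetical.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE and R2 for actual labels y and predictions y-hat."""
    n = len(y_true)
    errors = [y - p for y, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n   # sign of the error does not matter
    mse = sum(e * e for e in errors) / n    # squaring amplifies larger errors
    rmse = math.sqrt(mse)                   # back in the units of the label
    mean_y = sum(y_true) / n
    ss_res = sum(e * e for e in errors)              # sum of (y - y_hat)^2
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)  # sum of (y - y_bar)^2
    r2 = 1 - ss_res / ss_tot
    return {"mae": mae, "mse": mse, "rmse": rmse, "r2": r2}


# Hypothetical validation labels and predictions.
metrics = regression_metrics([2, 4, 6, 8], [3, 4, 5, 10])
```

For these made-up values MAE is 1.0, MSE is 1.5, RMSE is the square root of 1.5 and R2 is 0.7.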

16
Q

Binary Classification

A

Calculates probability values for class assignments
observed item is or isn’t an instance of a specific class
e.g. predicting Diabetes yes or no

17
Q

Steps of Binary Classification

A
  1. Algorithm calculates probability values for class assignments
  2. Evaluation metrics compare predicted to actual classes
  3. Probability is measured as a value between 0.0 and 1.0
  4. Function describes the probability of y being true (= 1) for a given value x: f(x) = P(y=1 | x)
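
A minimal sketch of f(x) = P(y=1 | x), assuming a logistic (sigmoid) function as used by logistic regression. The weight, bias and threshold values are hypothetical and not taken from a trained model.

```python
import math


def predict_proba(x, weight=1.0, bias=0.0):
    """Probability between 0.0 and 1.0 that y is 1 for the given x (assumed sigmoid)."""
    return 1.0 / (1.0 + math.exp(-(weight * x + bias)))


def predict_class(x, threshold=0.5, **params):
    """Turn the probability into a class label: 1 if it reaches the threshold, else 0."""
    return 1 if predict_proba(x, **params) >= threshold else 0
```

With the default parameters, x = 0 gives a probability of exactly 0.5, so it is classified as 1 at the default threshold of 0.5.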
18
Q

Classification evaluation metrics

A
  • Confusion matrix
  • Accuracy
  • Recall (TPR true positive rate)
  • Precision
  • F1-Score
  • AUC (area under the curve)
19
Q

Confusion Matrix

A

Matrix of number of correct and incorrect predictions for each possible class label.
columns = ŷ (0 and 1)
rows = y (0 and 1)
TN (true negative): ŷ=0 and y=0
FP (false positive): ŷ=1 and y=0
FN (false negative): ŷ=0 and y=1
TP (true positive): ŷ=1 and y=1
The arrangement of the confusion matrix is such that correct (true) predictions are shown in a diagonal line from top-left to bottom-right

20
Q

Accuracy

A

Proportion of right predictions
(TN+TP) ÷ (TN+FN+FP+TP)

21
Q

Recall

A

TPR true positive rate
Measures proportion of positive cases identified correctly
TP ÷ (TP+FN)
e.g. of all patients who actually have diabetes, what proportion did the model identify correctly

22
Q

Precision

A

Proportion of predicted positive cases where label is actually positive
TP ÷ (TP+FP)
e.g. what proportion of predicted positive cases actually have diabetes

23
Q

F1-Score

A

Combines recall and precision into one score (their harmonic mean)
(2 × Precision × Recall) ÷ (Precision + Recall)
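
The confusion-matrix counts and the metrics derived from them (accuracy, recall, precision, F1) can be sketched as follows. The function name classification_metrics and the sample labels are hypothetical; the sketch assumes binary labels 0 and 1 and does not guard against division by zero.

```python
def classification_metrics(y_true, y_pred):
    """Count TN, FP, FN, TP and derive accuracy, recall, precision and F1."""
    tn = fp = fn = tp = 0
    for y, p in zip(y_true, y_pred):
        if y == 0 and p == 0:
            tn += 1
        elif y == 0 and p == 1:
            fp += 1
        elif y == 1 and p == 0:
            fn += 1
        else:
            tp += 1
    accuracy = (tn + tp) / (tn + fn + fp + tp)
    recall = tp / (tp + fn)        # assumes at least one actual positive
    precision = tp / (tp + fp)     # assumes at least one predicted positive
    f1 = (2 * precision * recall) / (precision + recall)
    return {
        "matrix": [[tn, fp], [fn, tp]],  # rows = y, columns = y-hat
        "accuracy": accuracy,
        "recall": recall,
        "precision": precision,
        "f1": f1,
    }


# Hypothetical actual labels and predictions.
result = classification_metrics([0, 0, 1, 1, 1, 0], [0, 1, 1, 1, 0, 0])
```

For these made-up labels the matrix is [[2, 1], [1, 2]], and accuracy, recall, precision and F1 all come out at 2/3.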

24
Q

AUC (Area under the curve)

A

Based on plotting an ROC curve (receiver operating characteristic curve)
The curve shows the TPR and FPR (true/false positive rate) for every threshold (decision point between yes and no)
AUC = area under the ROC curve; the diagonal straight line represents random guessing (AUC = 0.5)
e.g. if AUC is 0.875 -> the model works better than random guessing (over 0.5)
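
A minimal sketch of how the ROC points and the AUC can be computed: sweep the threshold over the predicted probabilities, record (FPR, TPR) at each step, and sum the area with the trapezoid rule. The function name roc_auc and the sample scores are hypothetical; the sketch assumes at least one positive and one negative label.

```python
def roc_auc(y_true, scores):
    """Return the ROC points (FPR, TPR) and the area under the curve."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    # Trapezoid rule over consecutive ROC points.
    auc = sum(
        (x2 - x1) * (y1 + y2) / 2
        for (x1, y1), (x2, y2) in zip(points, points[1:])
    )
    return points, auc


# Hypothetical actual labels and predicted probabilities.
points, auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

For these made-up scores the AUC is 0.75, which is above the 0.5 expected from random guessing.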

25
Q

Multiclass Classification

A

Label represents one of multiple possible classes
Mostly mutually exclusive (e.g. penguin species) but multilabel classifications are possible (e.g. movie genres)

26
Q

Training-algorithms for a multiclass classification model

A

- OVR (One vs Rest)
- Multinomial algorithms

27
Q

OVR

A

One vs Rest
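Trains a binary classification function for each class
Each function calculates the probability that the observation is an example of its target class (versus any other class)
The class whose function yields the highest probability is predicted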

28
Q

Multinomial algorithms

A

Single function that returns a multi valued output
Output = Vector with probability distribution for all classes (all add up to 1)
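
One common way to produce such a probability vector is the softmax function. The card does not name a specific function, so softmax is an assumption here, and the raw class scores are hypothetical.

```python
import math


def softmax(scores):
    """Convert raw class scores into probabilities that add up to 1."""
    highest = max(scores)  # subtracting the maximum keeps exp() numerically stable
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw scores for three classes.
probabilities = softmax([2.0, 1.0, 0.1])
predicted_class = probabilities.index(max(probabilities))
```

The predicted class is the index with the highest probability, here class 0.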

29
Q

Clustering

A

Identify similarities between observations
No known cluster labels
Can be used to create classes for multiclass Classification

30
Q

Training Clustering Model

A

K-Means (most common)
1. Feature values vectorized to define n-dimensional coordinates
2. decide on k (= number of clusters) and plot k points at random coordinates (= centroids)
3. each data point is assigned to its nearest centroid
4. each centroid is moved to the center (mean) of the data points assigned to it
5. reassign data points to the closest centroid
6. repeat 4. and 5. until the clusters are stable or a predetermined maximum number of iterations is reached
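
The steps above can be sketched in plain Python. One deviation, stated as an assumption: the initial centroids are sampled from the data points instead of being placed at arbitrary random coordinates. The function name k_means, the seed parameter and the sample points are hypothetical.

```python
import random


def k_means(points, k, max_iterations=100, seed=0):
    """Cluster points (tuples of coordinates) into k groups."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # step 2 (assumption: start at k data points)
    assignments = None
    for _ in range(max_iterations):
        # Steps 3 and 5: assign every point to its nearest centroid
        # (squared Euclidean distance).
        new_assignments = [
            min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
            for p in points
        ]
        if new_assignments == assignments:
            break  # step 6: assignments are stable
        assignments = new_assignments
        # Step 4: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, assignments


# Hypothetical 2-dimensional feature vectors forming two obvious groups.
data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, assignments = k_means(data, k=2)
```

On this toy data the first three points end up in one cluster and the last three in the other; which cluster gets index 0 depends on the random start.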

31
Q

Evaluating a Clustering Model

A

How well are clusters separated?
- average distance to centroid
- average distance to other centroids
- maximum distance to centroid
- silhouette (ratio of distance between points in same cluster and points in different clusters, value between -1 and 1. The closer to 1 the better the cluster separation)

32
Q

Iterative Training

A

Vary:
- feature selection and preparation
- algorithm selection
- algorithm parameters

33
Q

Deep Learning

A
  • tries to emulate how the brain learns
  • creation of artificial neural network that simulates electrochemical activity in biological neurons by using mathematical functions
  • each neuron = function, operates with input x and weight w
  • multiple layers of neurons (DNN deep neural network, deeply nested functions)
  • change w in iterations to reduce loss (ŷ compared to y)
  • Data is batched into matrices and processed using linear algebraic calculations, so GPUs (graphics processing units) are well suited for training
34
Q

How does a neural network learn?

A
  1. Training features are fed into the input layer (vector x)
  2. Weights w are initially assigned at random; each neuron's function combines x and w
  3. If the result reaches a threshold, the activation function passes it on to the next layer
  4. each neuron is connected with each neuron in next layer (fully connected network)
  5. Output layer produces a vector containing the calculated values for ŷ
  6. Loss function to compare the predicted ŷ values to the known y values and aggregate the difference
  7. Optimization function: use differential calculus to evaluate the influence of each weight on the loss, then gradient descent to improve (each weight is increased or decreased)
  8. New weights are backpropagated to the layers in the network, replacing the previously used values.
  9. Multiple iterations (known as epochs) until the loss is minimized and the model predicts acceptably accurately
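
A heavily simplified sketch of the loss, optimization and epoch steps, assuming a single neuron with one weight w and one bias b, no activation function and no hidden layers. It is not a deep network. The names train_neuron and learning_rate, the zero starting values (instead of random ones, for reproducibility) and the data are hypothetical.

```python
def train_neuron(xs, ys, learning_rate=0.1, epochs=1000):
    """Fit y-hat = w * x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0  # assumption: fixed start instead of random weights
    n = len(xs)
    for _ in range(epochs):
        y_hat = [w * x + b for x in xs]
        # Loss is the mean of (y_hat - y)^2; its partial derivatives show
        # how much w and b each influence the loss.
        grad_w = sum(2 * (p - y) * x for p, y, x in zip(y_hat, ys, xs)) / n
        grad_b = sum(2 * (p - y) for p, y in zip(y_hat, ys)) / n
        # Gradient descent: move each parameter against its gradient.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Hypothetical training data generated from y = 2x + 1.
w, b = train_neuron([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
```

After the epochs, w and b end up close to 2 and 1. A real deep neural network applies the same idea to many weights across several layers.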
35
Q

Azure Machine Learning

A

Cloud service for training, deploying, and managing machine learning models
- Exploring data and preparing it for modeling.
- Training and evaluating machine learning models.
- Registering and managing trained models.
- Deploying trained models for use by applications and services.
- Reviewing and applying responsible AI principles and practices.
