[1] Machine Learning Fundamentals Flashcards by Marrick Lip

What are the stages of the machine learning lifecycle?

(1) Process data
(2) Split the data
(3) Training
(4) Test

How does the ‘process data’ stage of the ML lifecycle work?

Data is put into a machine-readable format and undergoes feature engineering and/or dimensionality reduction

How does the ‘split data’ stage of the ML lifecycle work?

Data is separated into the training data to train the weights, validation data to guide the training process, and testing data to evaluate the model
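
A minimal sketch of the split, assuming a 70/15/15 ratio and a fixed seed (both are illustrative choices, not from the card). `split_data` is a hypothetical helper; it shuffles a copy of the rows first so that ordered data does not clump into one subset:

```python
import random

def split_data(rows, train_frac=0.7, val_frac=0.15, seed=0):
    """Shuffle rows, then cut them into train / validation / test lists."""
    shuffled = list(rows)                    # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)    # shuffle before splitting
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]               # used to fit the weights
    val = shuffled[n_train:n_train + n_val]  # used to guide training
    test = shuffled[n_train + n_val:]        # held back for final evaluation
    return train, val, test

train, val, test = split_data(range(100))
print(len(train), len(val), len(test))
```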

How does the ‘training’ stage of the ML lifecycle work?

The training data is used directly to train the model parameters, guided by the validation data

How does the ‘test’ stage of the ML lifecycle work?

The test data is used to evaluate how well the model is likely to perform in the real world

What kinds of summary statistics are considered during EDA?

Overall statistics - these describe the overall dataset e.g. how many instances and features

Attribute statistics - describe individual features e.g. their average

Multivariate statistics - describe relationships between features

What is the difference between semantic segmentation and instance segmentation?

Semantic segmentation classifies each pixel, while instance segmentation also finds the distinct objects of each class as separate pixel groups

Why is unsupervised learning useful for finding relationships within the data?

It doesn’t require knowing the classes in the dataset up-front

What is the purpose of regularisation?

It penalises model complexity (e.g. large weights), which de-sensitises the model to the training data, allowing it to avoid overfitting and better handle outliers

What are the key kinds of regularisation?

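L1 regularisation (used in Lasso regression) penalises the sum of the absolute values of the weights

L2 regularisation (used in Ridge regression) penalises the sum of the squared weights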
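
As a rough illustration, both kinds of regularisation can be written as a penalty term that is added to the training loss. The function names and the `strength` parameter are assumptions made for this sketch:

```python
def l1_penalty(weights, strength):
    """L1 (Lasso-style) penalty: strength * sum of absolute weights."""
    return strength * sum(abs(w) for w in weights)

def l2_penalty(weights, strength):
    """L2 (Ridge-style) penalty: strength * sum of squared weights."""
    return strength * sum(w * w for w in weights)

weights = [1.0, -2.0, 0.5]
print(l1_penalty(weights, 0.1), l2_penalty(weights, 0.1))
```

The absolute-value penalty tends to push some weights to exactly zero, while the squared penalty shrinks all weights smoothly.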

What does ‘stochastic batch learning’ refer to?

Using only 1 sample in each batch

What is cross-validation?

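The data is divided into folds, and the fold used for validation is rotated so that every sample is used for both training and validation. This prevents data from being lost to a fixed validation set in the training phase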
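
The rotation can be sketched as k-fold cross-validation. `k_fold_indices` is a hypothetical helper written for this example; it assumes the data has already been shuffled and only produces index lists:

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) pairs; each fold is the validation set exactly once."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, val_idx
        start += size

for train_idx, val_idx in k_fold_indices(10, 3):
    print("validate on", val_idx, "train on", train_idx)
```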

How should features be selected?

Use domain knowledge to drop irrelevant information
Drop features with low correlation to the response (but be careful of correlations)
Drop features with very low or very high variance
Drop features with lots of missing values or errors, unless this is relevant

What steps are there to feature engineering?

Simplify features e.g. give BMI instead of height and weight
Standardise the scale of the data e.g. to [0, 1]
Transform the features to suit the problem e.g. converting timestamps to time of day
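
The three steps above can be sketched as follows. The helper names are hypothetical, the units (kilograms and metres) are an assumption, and timestamps are assumed to be Unix seconds interpreted in UTC:

```python
from datetime import datetime, timezone

def bmi(weight_kg, height_m):
    """Simplify: combine two features (weight, height) into one."""
    return weight_kg / height_m ** 2

def min_max_scale(values):
    """Rescale: map values linearly onto the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:                      # constant feature: avoid dividing by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def hour_of_day(unix_seconds):
    """Transform: reduce a timestamp to the hour of the day (UTC)."""
    return datetime.fromtimestamp(unix_seconds, tz=timezone.utc).hour

print(bmi(70, 1.75), min_max_scale([10, 20, 30]), hour_of_day(13 * 3600))
```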

How can unbalanced data be addressed?

Source more data
Oversample minority data or weight it more strongly
Synthesise new data - consider what can be varied without changing the label
Try different algorithms - some are less susceptible to unbalanced data than others
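
A minimal sketch of the oversampling option, assuming simple random duplication of minority rows until every class matches the largest one. `oversample` is a hypothetical helper; in practice this would normally be applied to the training split only:

```python
import random
from collections import Counter

def oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority rows until all classes are equally common."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())          # size of the largest class
    out_rows, out_labels = list(rows), list(labels)
    for label, count in counts.items():
        pool = [r for r, l in zip(rows, labels) if l == label]
        for _ in range(target - count):    # no-op for the majority class
            out_rows.append(rng.choice(pool))
            out_labels.append(label)
    return out_rows, out_labels

rows = list(range(10))
labels = ["a"] * 8 + ["b"] * 2
new_rows, new_labels = oversample(rows, labels)
print(Counter(new_labels))
```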

What is important to always do before splitting?

Shuffle the data to prevent data clumping etc.

How should categorical features be encoded?

Label encoding with a look-up table if they are ordinal, otherwise one-hot encoding
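
Both encodings can be sketched in a few lines. The helper names are hypothetical, and the category order passed to `label_encode` is something the caller is assumed to know from the domain:

```python
def label_encode(values, order):
    """Ordinal features: map each category to its rank via a look-up table."""
    lookup = {category: rank for rank, category in enumerate(order)}
    return [lookup[v] for v in values]

def one_hot_encode(values):
    """Nominal features: one 0/1 column per category, no implied order."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return rows, categories

print(label_encode(["low", "high", "medium"], order=["low", "medium", "high"]))
print(one_hot_encode(["red", "blue", "red"]))
```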

How is dimensionality reduction performed?

PCA or t-distributed stochastic neighbour embedding (t-SNE)

Note: clustering does NOT help - it groups instances rather than reducing the number of features

What are the steps for performing PCA?

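(1) Find the centroid and centre the data on it
(2) Find the direction through the centroid along which the data has the biggest variance - this is PC1 and is generally not parallel to the original axes
(3) Take the direction of next-biggest variance that is perpendicular to the previous components as PC2 and so on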
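
As a numerical counterpart to the steps above, the sketch below handles the 2-D case only. It centres the points on their centroid, builds the 2x2 covariance matrix and uses the closed-form angle of its major axis to get PC1. `first_principal_component` is a hypothetical helper; for more dimensions a general eigendecomposition would be needed instead:

```python
import math

def first_principal_component(points):
    """Return the centroid and a unit vector along the direction of greatest variance (2-D)."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    centred = [(x - cx, y - cy) for x, y in points]
    var_x = sum(x * x for x, _ in centred) / (n - 1)
    var_y = sum(y * y for _, y in centred) / (n - 1)
    cov_xy = sum(x * y for x, y in centred) / (n - 1)
    # Angle of the major axis of the covariance matrix [[var_x, cov_xy], [cov_xy, var_y]].
    theta = 0.5 * math.atan2(2 * cov_xy, var_x - var_y)
    return (cx, cy), (math.cos(theta), math.sin(theta))

centroid, pc1 = first_principal_component([(0, 0), (1, 1), (2, 2), (3, 3)])
print(centroid, pc1)
```

In two dimensions PC2 is simply the perpendicular direction, `(-pc1[1], pc1[0])`.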

How many components are used after performing PCA?

Generally either a fixed number of components is used, or as many as are needed so that x% of the variance of the dataset is explained
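
The second option can be sketched as a cumulative sum. The sketch assumes the per-component variances (the eigenvalues of the covariance matrix) are already available; `components_for_variance` and the example values are hypothetical:

```python
def components_for_variance(eigenvalues, target=0.95):
    """Smallest number of components whose share of total variance reaches the target."""
    ordered = sorted(eigenvalues, reverse=True)   # biggest variance first
    total = sum(ordered)
    running = 0.0
    for n, value in enumerate(ordered, start=1):
        running += value
        if running / total >= target:
            return n
    return len(ordered)

# Illustrative variances only: cumulative shares are 50%, 80%, 95%, 100%.
print(components_for_variance([5.0, 3.0, 1.5, 0.5], target=0.9))
```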

What is logistic regression?

A supervised binary classifier which fits a sigmoid to the data, with horizontal asymptotes at 0 and 1 (making it less susceptible to outliers as it mostly focuses on the cut-off point)

The cut-off value can be adjusted from the default of 0.5 to balance sensitivity and specificity
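
A minimal sketch of the prediction step for a single feature, assuming the weight and bias have already been fitted. The values passed in below are made up for illustration:

```python
import math

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict(x, weight, bias, cutoff=0.5):
    """Return the predicted probability and the class label at the given cut-off."""
    probability = sigmoid(weight * x + bias)
    return probability, int(probability >= cutoff)

print(predict(1.0, weight=2.0, bias=-1.0))              # default cut-off
print(predict(1.0, weight=2.0, bias=-1.0, cutoff=0.8))  # stricter cut-off
```

Raising the cut-off makes positive predictions rarer, which trades sensitivity for specificity.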

What is linear regression?

A supervised numeric regressor which fits a straight line to the data, generally by minimising the sum of squared errors (least squares)

It cannot represent interactions unless additional terms are added
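
For a single feature, the least-squares line has a closed form. `fit_line` is a hypothetical helper and the data points are made up for illustration:

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(slope, intercept)
```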

What are support vector machines?

A supervised classification algorithm which finds the hyperplane that divides the classes with the greatest margin. The support vectors are the data points closest to that hyperplane, which define the margin

What are decision trees?

Supervised algorithms that can be used for binary classification, multi-class classification or numeric (regression) problems

They start at a root node and find splits to build internal nodes and eventually leaf nodes

What is notable about the splits in decision trees?

In the common CART formulation they are always binary i.e. have two options

What are random forests?

An ensemble of decision trees with voting used to decide the output. Each split is based on a random subset of features to ensure diversity

What is KNN?

A SUPERVISED algorithm where a point is classified based on its nearest neighbours

This is a lazy algorithm - there is no training time, as the work is done at prediction time
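
A minimal sketch of KNN classification, assuming Euclidean distance and a simple majority vote. `knn_predict` and the example points are hypothetical:

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
labels = ["a", "a", "a", "b", "b", "b"]
print(knn_predict(points, labels, (0.5, 0.5)))
```

Note that nothing is fitted in advance: the labelled points are simply stored, and all distance computations happen when a prediction is requested.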

What is k means?

An unsupervised clustering algorithm where k clusters are automatically found

What are the steps for k means?

(1) Randomly place k centroids and define the clusters by assigning each point to its nearest centroid
(2) Iteratively improve the results by moving the centroids towards the centre of the points in their clusters and reassigning the points
(3) Try multiple random starting points and select the result with the least variation
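
The steps above can be sketched as follows. This assumes a fixed number of iterations rather than a convergence check, Euclidean distance, and variation measured as the sum of squared distances to the nearest centroid. `k_means` and `best_of` are hypothetical helpers:

```python
import math
import random

def k_means(points, k, iterations=20, seed=0):
    """Run k-means once from one random start; return (centroids, variation)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)            # (1) random starting centroids
    for _ in range(iterations):                  # (2) iteratively improve
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):
            if members:                          # leave an empty cluster's centroid in place
                centroids[i] = tuple(sum(c) / len(members) for c in zip(*members))
    variation = sum(min(math.dist(p, c) for c in centroids) ** 2 for p in points)
    return centroids, variation

def best_of(points, k, restarts=5):
    """(3) Try several random starts and keep the run with the least variation."""
    return min((k_means(points, k, seed=s) for s in range(restarts)), key=lambda r: r[1])

blobs = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
centroids, variation = best_of(blobs, k=2)
print(sorted(centroids), variation)
```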

How is the quality of clusters in k means assessed?

The lowest total variation within the clusters is best

How should the number of clusters for k means be determined?

With an elbow plot - plot the number of clusters on the x axis and the reduction in variation on the y axis

Look for the elbow point where the graph goes from exponential to linear i.e. there are diminishing returns

What is LDA?

Latent Dirichlet Allocation - an unsupervised algorithm used for the topic modelling etc. of text

What assumptions does LDA make?

Documents are probability distributions over latent topics

Topics are probability distributions over words

What is forward propagation?

The weights (including biases) are used to produce an output from the inputs i.e. perform inference

What is back propagation?

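The weights are trained based on a loss function of how well the output of forward propagation matched a desired output. The gradient of the loss is propagated backwards through the network to work out how each weight should change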

What is an epoch in the case of neural networks?

A full cycle of forward propagation, computing the loss function, back propagation and updating the weights over ALL of the data
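
The cycle can be sketched on the smallest possible network: a single linear neuron with one weight and one bias, where the backward pass is just one application of the chain rule. The squared-error loss, the learning rate and the choice to update after every sample are assumptions made for this sketch:

```python
def train_neuron(xs, ys, epochs=500, lr=0.05):
    """Train y = w * x + b by gradient descent, one sample at a time."""
    w, b = 0.0, 0.0
    for _ in range(epochs):            # one epoch = one pass over ALL of the data
        for x, y in zip(xs, ys):
            pred = w * x + b           # forward propagation
            error = pred - y           # d(loss)/d(pred) for loss = 0.5 * (pred - y) ** 2
            grad_w = error * x         # back propagation: chain rule through pred
            grad_b = error
            w -= lr * grad_w           # update the weights
            b -= lr * grad_b
    return w, b

w, b = train_neuron([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
print(w, b)
```

In a multi-layer network the same chain rule is applied layer by layer, from the output back towards the inputs.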

What are the main kinds of neural network architectures?

- Dense neural networks
- Convolutional neural networks
- Recurrent neural networks have a form of memory, allowing them to be used on sequences

Note: plain RNNs only retain short-term memory, while LSTMs (a kind of RNN) can retain information for longer

How are confusion matrices constructed?

The correct class is on the x axis, and the prediction is on the y axis (conventions vary between tools, so always check the axis labels)

What is sensitivity also known as?

Recall

What is recall also known as?

Sensitivity

What is sensitivity/recall?

The True Positive Rate (TPR) - the portion of actual positives that are correctly classified

TPR = TP / (TP + FN)

What is specificity?

The True Negative Rate (TNR) - the portion of actual negatives that are correctly classified

TNR = TN / (TN + FP)

What is TPR also known as?

Sensitivity or Recall

What is TNR also known as?

Specificity

What is accuracy?

The portion of all predictions that are correct

Acc = (TP + TN) / TOTAL

What is precision?

The portion of positive predictions that are actually positive

Pre. = TP / (TP + FP)

Note: this requires that the problem is framed with a clear positive case

What is the F1 score?

A balanced measure of a BINARY classifier's performance based on the precision and recall

F1 = 2 * Recall * Precision / (Recall + Precision)
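
The metric formulas from these cards can be collected into one sketch that works from the four confusion-matrix counts. `classification_metrics` is a hypothetical helper, the counts below are made up for illustration, and no zero-division handling is included:

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute the standard binary-classification metrics from raw counts."""
    recall = tp / (tp + fn)                        # sensitivity / TPR
    specificity = tn / (tn + fp)                   # TNR
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    f1 = 2 * recall * precision / (recall + precision)
    return {
        "recall": recall,
        "specificity": specificity,
        "accuracy": accuracy,
        "precision": precision,
        "f1": f1,
    }

print(classification_metrics(tp=40, fp=10, tn=45, fn=5))
```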

What are the axes for a ROC curve?

FPR on the x axis, TPR on the y axis

What is Gini impurity?

A measure of how often a randomly selected element would be labelled incorrectly if it was labelled randomly according to the distribution of labels in the set

It is used to evaluate splits in decision trees - the split with the lowest average Gini impurity (weighted by instance count) across the resulting child nodes is selected
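
A minimal sketch of the calculation, assuming a binary split into a left and a right group of labels. The function names are hypothetical:

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def split_impurity(left, right):
    """Average Gini impurity of a split, weighted by instance count."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

print(split_impurity(["a", "a"], ["b", "b"]))   # perfectly separating split
print(split_impurity(["a", "b"], ["a", "b"]))   # uninformative split
```

A tree builder would evaluate `split_impurity` for each candidate split and keep the one with the lowest value.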

(49 cards)