[1] Machine Learning Fundamentals Flashcards

1
Q

What are the stages of the machine learning lifecycle?

A

(1) Process data
(2) Split the data
(3) Training
(4) Test

2
Q

How does the ‘process data’ stage of the ML lifecycle work?

A

Data is put into a machine-readable format and undergoes feature engineering and/or dimensionality reduction

3
Q

How does the ‘split data’ stage of the ML lifecycle work?

A

Data is separated into the training data to train the weights, validation data to guide the training process, and testing data to evaluate the model
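
As a rough illustration of the split, the sketch below shuffles the rows and then cuts them into three parts. The function name `split_data` and the 70/15/15 proportions are assumptions chosen for the example, not values given on this card.

```python
import random

def split_data(rows, train_frac=0.7, val_frac=0.15, seed=0):
    """Shuffle rows, then cut them into train / validation / test lists."""
    shuffled = list(rows)                    # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)    # shuffle first to avoid ordered clumps
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]               # used to fit the weights
    val = shuffled[n_train:n_train + n_val]  # used to guide training
    test = shuffled[n_train + n_val:]        # whatever remains is held out for evaluation
    return train, val, test

train, val, test = split_data(range(100))
```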

4
Q

How does the ‘training’ stage of the ML lifecycle work?

A

The training data is used directly to train the model parameters, guided by the validation data

5
Q

How does the ‘test’ stage of the ML lifecycle work?

A

The test data is used to evaluate how well the model is likely to perform in the real world

6
Q

What kinds of summary statistics are considered during EDA?

A

Overall statistics - these describe the overall dataset e.g. how many instances and features

Attribute statistics - describe individual features e.g. their average

Multivariate statistics - describe relationships between features
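
The three kinds of statistics can be sketched on a toy dataset. The feature names and values below are made up for illustration, and Pearson correlation is used as one possible multivariate statistic.

```python
from statistics import mean, stdev

# Hypothetical toy dataset: one list of values per feature.
data = {
    "height_cm": [150.0, 160.0, 170.0, 180.0],
    "weight_kg": [50.0, 60.0, 70.0, 80.0],
}

# Overall statistics: the size and shape of the dataset.
n_features = len(data)
n_instances = len(data["height_cm"])

# Attribute statistics: one summary per individual feature.
attribute_stats = {
    name: {"mean": mean(col), "stdev": stdev(col), "min": min(col), "max": max(col)}
    for name, col in data.items()
}

def pearson(xs, ys):
    """Multivariate statistic: linear correlation between two features."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

correlation = pearson(data["height_cm"], data["weight_kg"])
```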

7
Q

What is the difference between semantic segmentation and instance segmentation?

A

Semantic segmentation classifies each pixel, while instance segmentation also finds the distinct objects of each class as separate pixel groups

8
Q

Why is unsupervised learning useful for finding relationships within the data?

A

It doesn’t require knowing the classes in the dataset up-front

9
Q

What is the purpose of regularisation?

A

It de-sensitises the model to the training data by penalising complexity (e.g. large weights), allowing it to avoid overfitting and better handle outliers

10
Q

What are the key kinds of regularisation?

A

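L1 regularisation (used in Lasso regression) penalises the sum of the absolute values of the weights

L2 regularisation (used in Ridge regression) penalises the sum of the squared weights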
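
A minimal sketch of how the two penalties differ, assuming the penalty is simply added to the data loss. The strength parameter `lam` and the function names are illustrative choices.

```python
def l1_penalty(weights, lam):
    """Lasso-style penalty: lam times the sum of absolute weights."""
    return lam * sum(abs(w) for w in weights)

def l2_penalty(weights, lam):
    """Ridge-style penalty: lam times the sum of squared weights."""
    return lam * sum(w * w for w in weights)

def regularised_loss(data_loss, weights, lam, kind="l2"):
    """Add the chosen penalty to the loss measured on the data."""
    penalty = l1_penalty(weights, lam) if kind == "l1" else l2_penalty(weights, lam)
    return data_loss + penalty

weights = [0.5, -2.0, 0.0]
```

Note that the L2 penalty punishes the large weight (-2.0) much more heavily than the small one, while the L1 penalty grows in proportion to each weight's size.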

11
Q

What does ‘stochastic batch learning’ refer to?

A

Using only 1 sample in each batch

12
Q

What is cross-validation?

A

The data is divided into folds, and which fold is used for validation (with the rest used for training) is rotated, so that no data is permanently held back from training
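
The rotation can be sketched as a generator of index lists, one pair per fold. This assumes plain k-fold cross-validation on already-shuffled data; `k_fold_indices` is a hypothetical helper name.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, val_indices) pairs, one per fold."""
    indices = list(range(n_samples))
    # Spread any remainder over the first few folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = indices[start:start + size]                 # this fold validates
        train = indices[:start] + indices[start + size:]  # every other fold trains
        yield train, val
        start += size

folds = list(k_fold_indices(10, 5))
```

Every sample appears in exactly one validation fold and in the training set of all the other folds.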

13
Q

How should features be selected?

A
  • Use domain knowledge to drop irrelevant information
  • Drop features with low correlation to the response (but be careful of correlations)
  • Drop features with very low or very high variance
  • Drop features with lots of missing values or errors, unless this is relevant
14
Q

What steps are there to feature engineering?

A
  • Simplify features e.g. give BMI instead of height and weight
  • Normalise the scale of the data e.g. to [0, 1]
  • Transform the features to suit the problem e.g. convert timestamps to time of day
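
The three steps above can be sketched with small helpers. The function names are hypothetical, min-max scaling is assumed as the way of reaching [0, 1], and timestamps are assumed to be Unix seconds interpreted in UTC.

```python
from datetime import datetime, timezone

def bmi(weight_kg, height_m):
    """Simplify: combine two features (weight, height) into one."""
    return weight_kg / height_m ** 2

def min_max_scale(values):
    """Rescale a feature so that its values lie in [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:                      # constant feature: avoid dividing by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def hour_of_day(unix_seconds):
    """Transform: keep only the hour (0-23) of a Unix timestamp, in UTC."""
    return datetime.fromtimestamp(unix_seconds, tz=timezone.utc).hour
```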
15
Q

How can unbalanced data be addressed?

A
  • Source more data
  • Oversample minority data or weight it more strongly
  • Synthesise new data - consider what can be varied without changing the label
  • Try different algorithms - some are less susceptible to unbalanced data than others
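
As one possible illustration of the oversampling option, the sketch below duplicates randomly chosen minority rows until every class is as large as the biggest one. The helper name `oversample` and the toy data are assumptions made for the example.

```python
import random
from collections import Counter

def oversample(rows, labels, seed=0):
    """Randomly duplicate minority-class rows until all classes are the same size."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())          # size of the largest class
    out_rows, out_labels = list(rows), list(labels)
    for label, count in counts.items():
        pool = [r for r, l in zip(rows, labels) if l == label]
        for _ in range(target - count):    # zero iterations for the majority class
            out_rows.append(rng.choice(pool))
            out_labels.append(label)
    return out_rows, out_labels

rows = [[1.0], [1.1], [0.9], [1.2], [5.0]]
labels = ["neg", "neg", "neg", "neg", "pos"]
balanced_rows, balanced_labels = oversample(rows, labels)
```

Oversampling should only be applied to the training split, otherwise duplicated rows can leak into validation or test data.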
16
Q

What is it important to always do before splitting?

A

Shuffle the data to prevent data clumping etc.

17
Q

How should categorical features be encoded?

A

Label encoding with a look-up table if they are ordinal, otherwise one-hot encoding
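
A small sketch of both encodings. The helper names and the example categories are illustrative, and the ordinal order is assumed to be supplied by the caller.

```python
def label_encode(values, order):
    """Ordinal features: map each category to its rank via a look-up table."""
    lookup = {category: rank for rank, category in enumerate(order)}
    return [lookup[v] for v in values]

def one_hot_encode(values):
    """Nominal features: one 0/1 column per category, with no implied order."""
    categories = sorted(set(values))
    encoded = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, encoded

sizes = label_encode(["small", "large", "medium"], order=["small", "medium", "large"])
categories, colours = one_hot_encode(["red", "green", "red"])
```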

18
Q

How is dimensionality reduction performed?

A

PCA or t-distributed stochastic neighbour embedding (t-SNE)

Note: clustering does NOT help - it groups instances rather than reducing the number of features

19
Q

What are the steps for performing PCA?

A

(1) Find the centroid of the data
(2) Draw a minimum bounding box around the data, rotated so that its sides follow the spread of the data rather than the original axes
(3) Take the direction of the longest side as the biggest variance (PC1), the second longest as PC2, and so on
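
The geometric picture can be made concrete in two dimensions. The sketch below does not draw a box; it swaps in the standard covariance formulation, using the closed-form principal-axis angle of a 2x2 covariance matrix. The function name `pca_2d` and the sample points are assumptions for the example.

```python
import math

def pca_2d(points):
    """Return the centroid and the two principal directions with their variances."""
    n = len(points)
    cx = sum(x for x, _ in points) / n            # (1) centroid
    cy = sum(y for _, y in points) / n
    centred = [(x - cx, y - cy) for x, y in points]
    sxx = sum(x * x for x, _ in centred) / (n - 1)
    syy = sum(y * y for _, y in centred) / (n - 1)
    sxy = sum(x * y for x, y in centred) / (n - 1)
    # (2) angle of the direction of greatest variance
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    c, s = math.cos(theta), math.sin(theta)
    pc1, pc2 = (c, s), (-s, c)                    # (3) PC2 is perpendicular to PC1
    var1 = sxx * c * c + 2 * sxy * c * s + syy * s * s
    var2 = sxx * s * s - 2 * sxy * c * s + syy * c * c
    return (cx, cy), (pc1, var1), (pc2, var2)

centroid, (pc1, var1), (pc2, var2) = pca_2d([(0, 0), (1, 1), (2, 2), (3, 3)])
```

For these points, which all lie on the line y = x, PC1 points along that line and PC2 carries essentially no variance.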

20
Q

How many components are used after performing PCA?

A

Generally either a fixed number of components is used, or as many as are needed so that x% of the variance of the dataset is explained
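
The percentage-based rule can be sketched as follows, assuming the variance captured by each component is already known. The helper name `components_needed` and the example variances are made up.

```python
def components_needed(variances, target=0.95):
    """Smallest number of leading components explaining at least `target` of the variance."""
    ordered = sorted(variances, reverse=True)   # largest component first
    total = sum(ordered)
    cumulative = 0.0
    for count, variance in enumerate(ordered, start=1):
        cumulative += variance
        if cumulative / total >= target:
            return count
    return len(ordered)

# Hypothetical per-component variances; cumulative shares are 50%, 80%, 95%, 100%.
n_components = components_needed([5.0, 3.0, 1.5, 0.5], target=0.9)
```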

21
Q

What is logistic regression?

A

A supervised binary classifier which fits a sigmoid to the data, with horizontal asymptotes at 0 and 1 (making it less susceptible to outliers as it mostly focuses on the cut-off point)

The cut-off value can be adjusted away from 0.5 to balance sensitivity and specificity
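
A minimal sketch of prediction with an already-fitted model on a single feature. The `weight` and `bias` values are placeholders rather than fitted parameters, and fitting itself is not shown.

```python
import math

def sigmoid(z):
    """Squash any real number into (0, 1); flattens out at both ends."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(x, weight, bias):
    """Probability of the positive class for a single feature x."""
    return sigmoid(weight * x + bias)

def classify(x, weight, bias, cutoff=0.5):
    """Return 1 if the predicted probability reaches the cut-off, else 0."""
    return 1 if predict_proba(x, weight, bias) >= cutoff else 0
```

Raising the cut-off makes positive predictions rarer, which tends to trade sensitivity for specificity.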

22
Q

What is linear regression?

A

A supervised numeric regressor which fits a straight line to the data, generally by minimising the sum of squared errors (least squares)

It cannot represent interactions unless additional terms are added
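
For a single feature, the least-squares line has a closed form, sketched below. The helper name `fit_line` and the sample data are assumptions for the example.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx                     # minimises the sum of squared residuals
    intercept = mean_y - slope * mean_x   # the line passes through the means
    return slope, intercept

# Toy data lying exactly on y = 2x + 1.
slope, intercept = fit_line([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
```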

23
Q

What are support vector machines?

A

A supervised classification algorithm which finds the hyperplane that divides the classes with the greatest margin; the support vectors are the data points closest to that hyperplane

24
Q

What are decision trees?

A

Supervised algorithms that can be used for binary classification, multi-class classification or numeric regression problems

They start at a root node and find splits to build internal nodes and eventually leaf nodes

25
Q

What is notable about the splits in decision trees?

A

They are typically binary, i.e. have two options (this is always the case in CART-style trees)

26
Q

What are random forests?

A

An ensemble of decision trees with voting used to decide the output.

Each split is based on a random subset of features to ensure diversity

27
Q

What is KNN?

A

A SUPERVISED algorithm where a point is classified based on the labels of its k nearest neighbours

This is a lazy algorithm - there is no training phase, so all of the work happens at prediction time
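
A minimal sketch of the idea, assuming Euclidean distance and a simple majority vote. The function name and toy points are illustrative.

```python
import math
from collections import Counter

def knn_classify(train_points, train_labels, query, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    # No model is built in advance: the training data is scanned for every query.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
labels = ["a", "a", "a", "b", "b", "b"]
```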

28
Q

What is k means?

A

An unsupervised clustering algorithm where k clusters are automatically found

29
Q

What are the steps for k means?

A

(1) Randomly choose k starting centroids and assign each point to the cluster of its nearest centroid
(2) Iteratively improve the results by moving each centroid to the centre of the points in its cluster and reassigning the points
(3) Try multiple random starting points and select the result with the least variation
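
The steps above can be sketched as below. This assumes Euclidean distance, a fixed number of iterations instead of a convergence check, and "variation" measured as the total squared distance of points to their nearest centroid. All names and defaults are illustrative.

```python
import math
import random

def k_means(points, k, n_restarts=5, n_iters=20, seed=0):
    """Return (variation, centroids) for the best of several random restarts."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_restarts):
        centroids = rng.sample(points, k)            # (1) random starting centroids
        for _ in range(n_iters):                     # (2) iterative refinement
            clusters = [[] for _ in range(k)]
            for p in points:
                nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
                clusters[nearest].append(p)
            centroids = [
                # Move each centroid to the mean of its points; keep it if the cluster is empty.
                tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else centroids[i]
                for i, cluster in enumerate(clusters)
            ]
        variation = sum(min(math.dist(p, c) ** 2 for c in centroids) for p in points)
        if best is None or variation < best[0]:      # (3) keep the restart with least variation
            best = (variation, centroids)
    return best

points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
variation, centroids = k_means(points, k=2)
```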

30
Q

How is the quality of clusters in k means assessed?

A

The lowest total variation within the clusters is best (when comparing results for the same k)

31
Q

How should the number of clusters for k means be determined?

A

With an elbow plot - plot the number of clusters on the x axis and the reduction in variation on the y axis

Look for the elbow point where the curve bends from a steep change to a roughly linear one, i.e. there are diminishing returns from adding more clusters

32
Q

What is LDA?

A

Latent Dirichlet Allocation - an unsupervised algorithm used for topic modelling of text (the topics found can then support classification etc.)

33
Q

What assumptions does LDA make?

A

Documents are probability distributions over latent topics

Topics are probability distributions over words

34
Q

What is forward propagation?

A

The weights (including biases) are used to produce an output from the inputs, i.e. perform inference

35
Q

What is back propagation?

A

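The weights are trained based on a loss function measuring how well the output of forward propagation matched the desired output; the gradient of the loss is passed backwards through the network to work out how each weight should change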
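
Forward and back propagation can be sketched on the smallest possible network: a single linear neuron with a squared-error loss. The learning rate, starting weights and training example are arbitrary values chosen for illustration.

```python
def forward(x, weight, bias):
    """Forward propagation: produce an output from the input."""
    return weight * x + bias

def train_step(x, target, weight, bias, learning_rate=0.1):
    """One forward pass, loss computation, backward pass and weight update."""
    prediction = forward(x, weight, bias)
    loss = (prediction - target) ** 2
    # Back propagation: apply the chain rule from the loss back to each parameter.
    d_loss_d_pred = 2 * (prediction - target)
    grad_weight = d_loss_d_pred * x      # d(pred)/d(weight) = x
    grad_bias = d_loss_d_pred            # d(pred)/d(bias) = 1
    weight -= learning_rate * grad_weight
    bias -= learning_rate * grad_bias
    return weight, bias, loss

weight, bias = 0.0, 0.0
losses = []
for _ in range(50):
    weight, bias, loss = train_step(x=1.0, target=2.0, weight=weight, bias=bias)
    losses.append(loss)
```

The loss shrinks on every step here because each update moves the prediction a fixed fraction of the way towards the target.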

36
Q

What is an epoch in the case of neural networks?

A

A full cycle of forward propagation, computing the loss function, back propagation and updating the weights over ALL of the training data

37
Q

What are the main kinds of neural network architectures?

A
  • Dense neural networks
  • Convolutional neural networks
  • Recurrent neural networks have a form of memory, allowing them to be used on sequences

Note: plain RNNs only retain information over short sequences; LSTMs are a kind of RNN designed to retain it for longer

38
Q

How are confusion matrices constructed?

A

The correct class is on the x axis, and the prediction is on the y axis (conventions vary between tools, so always check the axis labels)

39
Q

What is sensitivity also known as?

A

Recall

40
Q

What is recall also known as?

A

Sensitivity

41
Q

What is sensitivity/recall?

A

The True Positive Rate (TPR) - the proportion of actual positives that are correctly classified

TPR = TP/(TP+FN)

42
Q

What is specificity?

A

The True Negative Rate (TNR) - the proportion of actual negatives that are correctly classified

TNR = TN / (TN + FP)

43
Q

What is TPR also known as?

A

Sensitivity or Recall

44
Q

What is TNR also known as?

A

Specificity

45
Q

What is accuracy?

A

The proportion of all predictions that are correct

Acc = (TP + TN) / TOTAL

46
Q

What is precision?

A

The proportion of positive predictions that are actually positive

Pre. = TP / (TP+FP)

Note: this requires that the problem is framed with a clear positive case

47
Q

What is the F1 score?

A

A balanced measure of a BINARY classifier’s performance based on the precision and recall (it is their harmonic mean)

F1 = 2 * Recall * Precision / (Recall + Precision)
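
The metrics on these cards can all be computed from the four confusion-matrix counts, as sketched below. The labels are made up, 1 is assumed to be the positive class, and division by zero (e.g. no positive predictions at all) is deliberately not handled.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    tn = sum(1 for a, p in pairs if a != positive and p != positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    return tp, tn, fp, fn

def metrics(tp, tn, fp, fn):
    """Sketch only: assumes every denominator is non-zero."""
    recall = tp / (tp + fn)               # sensitivity / TPR
    specificity = tn / (tn + fp)          # TNR
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    f1 = 2 * recall * precision / (recall + precision)
    return {"recall": recall, "specificity": specificity,
            "accuracy": accuracy, "precision": precision, "f1": f1}

actual    = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
results = metrics(*confusion_counts(actual, predicted))
```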

48
Q

What are the axes for a ROC curve?

A

FPR on the x-axis, TPR on the y-axis

49
Q

What is Gini impurity?

A

A measure of how often a randomly selected element would be labelled incorrectly if it were labelled randomly according to the distribution of labels in the set

It is used to evaluate splits in decision trees - the split with the lowest average Gini impurity (weighted by instance count) on the leaf nodes is selected
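
A short sketch of the calculation, using the usual formula of 1 minus the sum of the squared class proportions. The helper names are illustrative, and a binary split with non-empty left and right leaves is assumed.

```python
from collections import Counter

def gini_impurity(labels):
    """1 - sum of squared class proportions; 0.0 means the node is pure."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def weighted_gini(left_labels, right_labels):
    """Average impurity of the two leaves, weighted by instance count."""
    n = len(left_labels) + len(right_labels)
    return (len(left_labels) / n) * gini_impurity(left_labels) \
         + (len(right_labels) / n) * gini_impurity(right_labels)

# A split that separates the classes perfectly beats one that does not.
good_split = weighted_gini(["a", "a"], ["b", "b"])
bad_split = weighted_gini(["a", "b"], ["a", "b"])
```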