2019 A2 Flashcards

Question 1

Q

Provide a short description of the K-means algorithm.

Answer

A

K-means is an unsupervised learning algorithm that assigns each observation to one of K clusters by minimising the average squared distance between the observations and their nearest cluster centres.
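
The description above can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the function names `squared_distance` and `kmeans`, the random initialisation from the data points, and the stopping rule (assignments no longer change) are all assumptions made for the example.

```python
import random

def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, n_iters=100, seed=0):
    """Alternate assignment and update steps until assignments stop changing."""
    rng = random.Random(seed)
    centres = rng.sample(points, k)  # assumption: initialise from k random data points
    assignments = None
    for _ in range(n_iters):
        # Assignment step: each point goes to its nearest centre.
        new_assignments = [
            min(range(k), key=lambda j: squared_distance(p, centres[j]))
            for p in points
        ]
        if new_assignments == assignments:
            break
        assignments = new_assignments
        # Update step: each centre moves to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:  # an empty cluster keeps its previous centre
                centres[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centres, assignments

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centres, assignments = kmeans(points, k=2)
```

On this toy data the two well-separated groups of three points end up in different clusters.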

Question 2

Q

Add a data preprocessing step to the k-means algorithm which will allow it to produce the clustering result.

Answer

A

Whitening
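
Whitening transforms the data so that it has zero mean and identity covariance, which matters for k-means because it relies on Euclidean distance. The sketch below uses Cholesky whitening, which is only one of several variants (PCA and ZCA whitening are others); the card does not say which one is intended, so the choice and the function name `whiten` are assumptions.

```python
import math

def whiten(points):
    """Cholesky whitening: return data with zero mean and identity covariance."""
    n, d = len(points), len(points[0])
    mean = [sum(p[i] for p in points) / n for i in range(d)]
    centred = [[p[i] - mean[i] for i in range(d)] for p in points]
    # Sample covariance matrix of the centred data.
    cov = [[sum(x[i] * x[j] for x in centred) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Cholesky factor: cov = L L^T with L lower triangular.
    L = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = cov[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    # Solve L z = x for each point by forward substitution.
    whitened = []
    for x in centred:
        z = []
        for i in range(d):
            z.append((x[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
        whitened.append(z)
    return whitened
```

The whitened points can then be passed to k-means in place of the raw data. The sketch assumes the covariance matrix is positive definite, so the data must not lie on a lower-dimensional subspace.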

Question 3

Q

Which probability distributions are modelled by generative and discriminative models?

Answer

A

Generative - joint probability P(X,Y)
Discriminative - conditional probability P(Y|X)
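
As a reminder of how the two quantities relate, the joint distribution learned by a generative model can be turned into the class posterior by Bayes' rule, whereas a discriminative model fits that posterior directly. Here X denotes the features and Y the label:

```latex
P(X, Y) = P(X \mid Y)\, P(Y),
\qquad
P(Y \mid X) = \frac{P(X, Y)}{\sum_{y'} P(X, y')}
```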

Question 4

Q

Give an example of a generative model and of a discriminative model.

Answer

A

Generative - Naive Bayes
Discriminative - logistic regression

Question 5

Q

What are the three algorithms used in HMMs?

Answer

A

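Forward algorithm - computes the likelihood of an observation sequence.
Viterbi algorithm - finds the most probable sequence of hidden states.
Forward-backward algorithm - computes the state posteriors used to learn the model parameters (Baum-Welch).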
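
The first two algorithms are short enough to sketch directly. The code below is an illustration under assumptions: the two-state model and all of its probabilities (`start`, `trans`, `emit`) are made up for the example, and probabilities are kept in the linear domain for readability, which would underflow on long sequences.

```python
def forward(obs, start, trans, emit):
    """Likelihood of obs under the HMM, summing over all hidden paths."""
    n = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n)]
    for o in obs[1:]:
        alpha = [emit[s][o] * sum(alpha[r] * trans[r][s] for r in range(n))
                 for s in range(n)]
    return sum(alpha)

def viterbi(obs, start, trans, emit):
    """Most probable hidden state path for obs."""
    n = len(start)
    delta = [start[s] * emit[s][obs[0]] for s in range(n)]
    backpointers = []
    for o in obs[1:]:
        # Best predecessor for each state, then extend the best path by one step.
        prev = [max(range(n), key=lambda r: delta[r] * trans[r][s])
                for s in range(n)]
        delta = [delta[prev[s]] * trans[prev[s]][s] * emit[s][o]
                 for s in range(n)]
        backpointers.append(prev)
    # Trace back from the best final state.
    state = max(range(n), key=lambda s: delta[s])
    path = [state]
    for prev in reversed(backpointers):
        state = prev[state]
        path.append(state)
    return path[::-1]

# Hypothetical toy model: 2 hidden states, 3 observation symbols.
start = [0.6, 0.4]
trans = [[0.7, 0.3], [0.4, 0.6]]
emit = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
obs = [0, 1, 2]

likelihood = forward(obs, start, trans, emit)
best_path = viterbi(obs, start, trans, emit)
```

The two functions differ only in replacing the sum over predecessor states with a max, plus the backpointers needed to recover the path.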

Question 6

Q

What do K-means, Gaussian mixtures and HMMs have in common?

Answer

A

They are all unsupervised learning techniques.

Question 7

Q

How do you determine convergence in the EM algorithm?

Answer

A

One calculates the negative log-likelihood after each repetition of the E and M steps and compares it to its previous value. If the difference is less than a specified threshold, the parameters are considered to have converged.
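
The convergence check can be sketched as a generic loop. The names `run_em`, `em_step` and `neg_log_likelihood` are hypothetical placeholders: `em_step` stands for one combined E and M step, and `neg_log_likelihood` for whatever objective the model defines. The default threshold and iteration cap are arbitrary choices.

```python
def run_em(params, em_step, neg_log_likelihood, tol=1e-6, max_iters=1000):
    """Repeat EM steps until the negative log-likelihood changes by less than tol."""
    previous = neg_log_likelihood(params)
    for iteration in range(1, max_iters + 1):
        params = em_step(params)  # one E step followed by one M step
        current = neg_log_likelihood(params)
        if abs(previous - current) < tol:
            return params, iteration  # converged
        previous = current
    return params, max_iters  # stopped by the iteration cap, not by convergence
```

The iteration cap guards against a threshold that is never reached, for example when `tol` is set below the floating-point noise of the likelihood computation.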

Question 8

Q

Why is it preferable for likelihood code to represent quantities in the log domain?

Answer

A

When multiplying long sequences of probabilities, each between 0 and 1, the product shrinks towards zero and eventually underflows the floating-point range.
Working in the log domain avoids this: the log of a product is the sum of the logs, so multiplications become additions of moderately sized negative numbers, which do not underflow. Where probabilities must be added rather than multiplied, the log-sum-exp trick computes the log of the sum stably.
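
A small demonstration of the underflow and of the log-domain fix. The helper name `log_sum_exp` and the example numbers are assumptions chosen for illustration.

```python
import math

def log_sum_exp(log_values):
    """Stable log(sum(exp(v))) obtained by factoring out the largest term."""
    m = max(log_values)
    if m == -math.inf:  # every term has probability zero
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in log_values))

probs = [1e-5] * 100

# The direct product underflows to exactly 0.0 in double precision.
naive_product = math.prod(probs)

# The same quantity in the log domain is an ordinary negative number.
log_product = sum(math.log(p) for p in probs)

# Adding two probabilities that are each far too small to represent directly.
log_total = log_sum_exp([-1000.0, -1000.0])
```

Computing `math.log(math.exp(-1000.0) + math.exp(-1000.0))` directly would fail, because both exponentials underflow to zero before the log is taken.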

Question 9

Q

It is common practice in machine learning to partition data into 3 portions for analysis. Specify the name of these portions and the role of each in obtaining a final fitted classifier as well as an estimate of the quality of the classifier.

Answer

A

Training set - used for training the model.
Validation set - used to evaluate the fitted model on unseen data while tuning hyperparameters, such as the amount of regularisation.
Test set - used to provide an unbiased estimate of the final model's performance. The model should not be tuned any further once it has been evaluated on the test set.
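
A minimal sketch of such a partition in plain Python. The function name `three_way_split` and the 60/20/20 proportions are assumptions; the card does not prescribe particular fractions.

```python
import random

def three_way_split(items, val_fraction=0.2, test_fraction=0.2, seed=0):
    """Shuffle once, then cut into train, validation and test portions."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    n_val = int(len(shuffled) * val_fraction)
    test = shuffled[:n_test]
    validation = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, validation, test

train, validation, test = three_way_split(range(100))
```

Shuffling before cutting avoids a split that follows the original ordering of the data, and the fixed seed makes the partition reproducible.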

Question 10

Q

It is common practice in machine learning to partition data into 3 portions for analysis. Specify the name of these portions and the role of each in obtaining a final fitted classifier as well as an estimate of the quality of the classifier.

Answer

A

Training set - used for training the model.
Validation set - used to evaluate the fitted model on unseen data while tuning hyperparameters, such as the amount of regularisation.
Test set - used to provide an unbiased estimate of the final model's performance. The model should not be tuned any further once it has been evaluated on the test set.

(10 cards)