Week 4 Flashcards
Does linear regression work well for classification?
No.
It doesn't output probabilities: it treats the class labels as numbers and fits the best hyperplane through them.
It interpolates between data points, so its outputs cannot be interpreted as probabilities and there is no meaningful threshold to distinguish the classes.
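A minimal sketch of this problem (my own toy numbers, not from the cards): fitting ordinary least squares to 0/1 class labels produces "predictions" that fall outside [0, 1], so they cannot be read as probabilities.

```python
# Fit a 1-D least-squares line to binary 0/1 labels treated as numbers.
def fit_line(xs, ys):
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) \
        / sum((x - mx) ** 2 for x in xs)
    intercept = my - slope * mx
    return slope, intercept

xs = [0.0, 1.0, 2.0, 3.0, 4.0, 10.0]
ys = [0, 0, 0, 1, 1, 1]              # class labels coded as numbers
slope, intercept = fit_line(xs, ys)
pred = lambda x: slope * x + intercept
# pred(10.0) exceeds 1, which is not a valid probability.
```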
Does a linear model extend to classification problems with multiple classes?
No.
The classes must be encoded as numbers, but they might not have any meaningful order.
The linear model then imposes an artificial structure on the relationship between the features and the class predictions: the higher the value of a feature, the more it contributes to predicting a class with a higher number, even though classes that happen to get similar numbers are no closer to each other than any other pair of classes.
What is logistic regression?
It converts predictions into a probabilistic representation. A simple linear predictive model maps the data xi to a score zi, and the sigmoid function converts that score into a probability.
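The mapping described above can be sketched as follows (the weight and bias values are made-up numbers for illustration):

```python
import math

# Sigmoid squashes any real-valued score z into a probability in (0, 1).
def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Linear score z_i = w * x_i + b, then sigmoid to get a probability.
w, b = 0.8, -0.5          # hypothetical learned parameters
z = w * 2.0 + b           # score for input x_i = 2.0
p = sigmoid(z)            # probability of the positive class
```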
What is the output of logistic regression?
A probability for the positive class, which is thresholded to give a binary prediction.
When does logistic regression perform well?
When the data can be separated by a straight line, i.e., when the classes are linearly separable.
MLP concepts
Instead of applying logistic regression directly to the data, the inputs are first mapped to K latent features; those latent features are then fed to a logistic regression model, which yields a probability for the binary classification.
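A minimal forward-pass sketch of this idea (all weights are made-up numbers): a hidden layer maps the raw inputs to K latent features, and logistic regression then runs on those latent features rather than on the data directly.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def mlp_forward(x, hidden_w, out_w):
    # Hidden layer: each row of weights produces one latent feature.
    latent = [sigmoid(sum(w * xi for w, xi in zip(row, x)))
              for row in hidden_w]
    # Logistic regression on the latent features, not on x itself.
    return sigmoid(sum(w * h for w, h in zip(out_w, latent)))

x = [1.0, -2.0]
hidden_w = [[0.5, -0.3], [-0.8, 0.1]]   # K = 2 latent features
out_w = [1.2, -0.7]
p = mlp_forward(x, hidden_w, out_w)      # probability in (0, 1)
```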
What is Deep Learning?
A form of ML where a model has multiple layers of latent processes.
How do we train MLP weights?
Weights start as random values, so the training process is stochastic. Each weight is then updated with:
w = w + (a × (expected − predicted) × input)
w = w + (a × error × input)
where a is the learning rate and input is the corresponding variable value.
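The update rule above can be sketched in a few lines (weights, inputs, and a = 0.1 are made-up values):

```python
# One weight update: w <- w + a * (expected - predicted) * input,
# applied element-wise to a weight vector.
def update_weights(w, x, expected, predicted, a=0.1):
    error = expected - predicted
    return [wi + a * error * xi for wi, xi in zip(w, x)]

w = [0.2, -0.4]
w = update_weights(w, x=[1.0, 2.0], expected=1.0, predicted=0.3)
# Each weight moves in the direction that reduces the error,
# scaled by the learning rate and by its input value.
```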
What is the learning rate in a neural network?
It dictates how quickly the model's weights change.
The rate of change should slow as the model converges.
Gradient descent lets us adjust the rate of change based on changes in the error.
Weights for features near the decision boundary can change faster.
Non-linear activation functions gain their power from their derivative.
Why does calculating weights have the potential to create unstable models?
When input variables are of different magnitudes, the larger-magnitude variables contribute disproportionately to the weight updates, so the inputs need to be normalised.
How does the learning rate change?
For examples near the decision boundary, the sigmoid output is close to 0.5 and its derivative is at its maximum, whereas for confident predictions (outputs near 0 or 1) the derivative is close to 0. This derivative can therefore be used to scale the effective learning rate, so weights change fastest for borderline examples.
What is gradient descent?
An optimisation procedure used to find the values (coefficients) of the parameters of a function f that minimise a cost function J.
(Imagine changing the y-intercept and graphing the sum of squared errors as it decreases.)
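The y-intercept thought experiment above can be sketched directly (toy data of my own, fixed slope of 1): gradient descent repeatedly nudges the intercept b against the gradient of the sum of squared errors.

```python
# Minimise J(b) = sum((y_i - (x_i + b))^2) over the intercept b alone,
# keeping the slope fixed at 1.
def grad_descent_intercept(xs, ys, lr=0.1, steps=100):
    b = 0.0
    for _ in range(steps):
        # dJ/db = sum of -2 * (y - (x + b)) over all points
        grad = sum(-2.0 * (y - (x + b)) for x, y in zip(xs, ys))
        b -= lr * grad          # step against the gradient
    return b

xs = [0.0, 1.0, 2.0]
ys = [1.0, 2.0, 3.0]            # points lie on y = x + 1
b = grad_descent_intercept(xs, ys)   # converges towards b = 1
```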
What ways do we evaluate model performance?
Accuracy
Precision
Recall
F1-score
ROC curve and area under ROC curve
Difference between TP, TN, FP, and FN?
True positive (TP): a correct positive classification, i.e., a hit.
True negative (TN): a correct negative classification, i.e., a correct rejection.
False positive (FP): the outcome is incorrectly predicted as positive when it is actually negative.
False negative (FN): the outcome is incorrectly predicted as negative when it is actually positive.
What is accuracy?
How often the classifier predicts the class correctly, i.e., the proportion of all predictions that are correct.
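The metrics listed above follow directly from the four confusion-matrix counts; a small sketch with made-up counts:

```python
# Compute accuracy, precision, recall, and F1 from TP/TN/FP/FN counts.
def metrics(tp, tn, fp, fn):
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)          # of predicted positives, how many correct
    recall = tp / (tp + fn)             # of actual positives, how many found
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Hypothetical counts for illustration.
acc, prec, rec, f1 = metrics(tp=40, tn=45, fp=5, fn=10)
```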