Week 4 Flashcards

1
Q

Does linear regression work well for classification?

A

No.
It does not output probabilities: it treats the classes as numbers (0 and 1) and fits the best hyperplane (a line, for a single feature).
It simply interpolates between data points, so its output cannot be interpreted as a probability, and there is no meaningful threshold for distinguishing the classes.
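
A small sketch of the problem, using made-up data and a hand-rolled least-squares fit for one feature (both are illustrative assumptions, not course material). The fitted line returns values below 0 and above 1, which cannot be probabilities:

```python
# Hypothetical toy data: one feature, class labels coded as 0 and 1.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [0, 0, 0, 1, 1, 1]

def fit_line(xs, ys):
    """Ordinary least squares for a single feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

slope, intercept = fit_line(xs, ys)

def predict(x):
    """Evaluate the fitted line; nothing keeps the result inside [0, 1]."""
    return slope * x + intercept

print(predict(0.0))   # a negative value
print(predict(10.0))  # a value greater than 1
```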

2
Q

Does a linear model extend to classification problems with multiple classes?

A

No.
You would need to label each class with a number (0, 1, 2, ...), but the classes might not have any meaningful order.
The linear model would then force a weird structure on the relationship between the features
and the class predictions. The higher the value of a feature with a positive weight, the more
it contributes to the prediction of a class with a higher number, even though classes that
happen to get similar numbers are no closer to each other than to other classes.

3
Q

What is a logistic regression?

A

A model that converts a yes/no prediction (whether an outcome will occur or not) into a probabilistic representation.
Each data point xi is first mapped to a variable zi by a simple linear predictive model.
The sigmoid function then converts zi into a probability between 0 and 1.
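
The two steps can be sketched as follows. The function names, weights and input are hypothetical and chosen only for illustration:

```python
import math

def sigmoid(z):
    """Squash any real number z into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(x, weights, bias):
    """Linear step z = w . x + b, followed by the sigmoid."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return sigmoid(z)

# Hypothetical weights and input, for illustration only.
p = predict_proba([2.0, -1.0], weights=[0.5, 0.25], bias=-0.75)
print(p)  # 0.5, because z = 0.5*2.0 + 0.25*(-1.0) - 0.75 = 0
```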

4
Q

What is the output of logistic regression?

A

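A probability between 0 and 1 that the input belongs to the positive class. Applying a threshold (commonly 0.5) turns it into a binary class label.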

5
Q

When does logistic regression perform well?

A

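When the classes are linearly separable, i.e. the data can be separated by a straight line (or a hyperplane when there are more than two features).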

6
Q

MLP concepts

A

In a multilayer perceptron (MLP), K latent features are sent to a logistic regression model to yield a binary probability for the classification of the data.
Instead of doing logistic regression directly on the data, we apply logistic regression to the K latent features.
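
A minimal forward-pass sketch of this idea. It assumes a single hidden layer with sigmoid activations (the card does not say which activation is used), and all parameter values are hypothetical:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def mlp_forward(x, hidden_weights, hidden_biases, out_weights, out_bias):
    """Compute K latent features, then run logistic regression on them."""
    # Hidden layer: one latent feature per hidden unit.
    latent = [
        sigmoid(sum(w * xi for w, xi in zip(unit_w, x)) + b)
        for unit_w, b in zip(hidden_weights, hidden_biases)
    ]
    # Output layer: logistic regression applied to the latent features.
    z = sum(w * h for w, h in zip(out_weights, latent)) + out_bias
    return latent, sigmoid(z)

# Hypothetical parameters: 2 inputs, K = 3 latent features.
latent, prob = mlp_forward(
    x=[1.0, 2.0],
    hidden_weights=[[0.1, -0.2], [0.4, 0.3], [-0.5, 0.2]],
    hidden_biases=[0.0, -0.1, 0.2],
    out_weights=[0.7, -0.3, 0.5],
    out_bias=0.1,
)
print(len(latent), prob)
```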

7
Q

What is deep learning?

A

A form of ML in which a model has multiple layers of latent processes.

8
Q

How do we train MLP weights?

A

Weights start as random values, so the training process is stochastic.

w = w + (a × (expected - predicted) × input)
w = w + (a × δ × input)

a is the learning rate
δ is the error, (expected - predicted)
input is the value of the input variable
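
The update rule can be written as a short function. This is only a sketch of a single update step; the function name and the example numbers are hypothetical:

```python
def update_weights(weights, inputs, expected, predicted, learning_rate):
    """Apply w = w + a * (expected - predicted) * input to every weight."""
    error = expected - predicted
    return [w + learning_rate * error * x for w, x in zip(weights, inputs)]

# One step on a made-up example where the model predicted 0 but 1 was expected.
new_w = update_weights(
    [0.5, -0.5], inputs=[1.0, 2.0], expected=1, predicted=0, learning_rate=0.1
)
print(new_w)
```

When the prediction is already correct the error is 0, so the weights are left unchanged.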

9
Q

What is the learning rate in a neural network?

A

Dictates how quickly the model's weights change.
The rate of change should slow as the model converges.
Gradient descent lets us adjust the rate of change based on changes in the error.
The rate of change can be higher for features on decision boundaries.
Non-linear activation functions gain their power from the derivative.

10
Q

Why can calculating weights create unstable models?

A

When input variables have different magnitudes, the variables with larger values contribute disproportionately to the weight updates, so the inputs need to be normalised.
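
One common way to normalise is z-score standardisation. The card does not say which method is intended, so treat this as one possible choice, with hypothetical feature values:

```python
def standardise(column):
    """Rescale a list of values to mean 0 and standard deviation 1 (z-scores).

    Assumes the column is not constant, otherwise the standard deviation is 0.
    """
    n = len(column)
    mean = sum(column) / n
    std = (sum((v - mean) ** 2 for v in column) / n) ** 0.5
    return [(v - mean) / std for v in column]

# Hypothetical features on very different scales.
income = [20000.0, 50000.0, 80000.0]
age = [25.0, 40.0, 55.0]

scaled_income = standardise(income)
scaled_age = standardise(age)
print(scaled_income)
print(scaled_age)
```

After scaling, both features cover the same range, so neither dominates the weight updates purely because of its units.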

11
Q

How does the learning rate change?

A

For features on decision boundaries (sigmoid output near 0.5), the derivative of the sigmoid is at its largest (0.25), whereas it approaches 0 for outputs near 0 or 1. We can therefore use the derivative to scale the size of the update, so that these features change faster.
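
A quick numerical check, assuming the standard logistic sigmoid is the activation meant here. Its derivative is sigmoid(z) × (1 - sigmoid(z)), which peaks at z = 0 and shrinks towards 0 as the output saturates:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_derivative(z):
    """d/dz sigmoid(z) = sigmoid(z) * (1 - sigmoid(z))."""
    s = sigmoid(z)
    return s * (1.0 - s)

# Far from the boundary (z = -6 or 6) the derivative is tiny; at z = 0 it peaks.
for z in (-6.0, 0.0, 6.0):
    print(z, sigmoid(z), sigmoid_derivative(z))
```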

12
Q

What is gradient descent?

A

An optimisation algorithm used to find the values of the parameters (coefficients) of a function f that minimise a cost function J.

(Imagine changing the y-intercept and graphing the sum of squared errors as it goes down.)
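
The y-intercept picture can be sketched directly. The data, the fixed slope, the learning rate and the step count are all hypothetical choices made for this illustration:

```python
def sse(intercept, slope, xs, ys):
    """Cost J: sum of squared errors of the line y = slope * x + intercept."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))

def descend_intercept(xs, ys, slope, intercept=0.0, learning_rate=0.1, steps=100):
    """Gradient descent on the intercept only, with the slope held fixed."""
    for _ in range(steps):
        # dJ/d(intercept) = -2 * (sum of residuals)
        gradient = -2.0 * sum(y - (slope * x + intercept) for x, y in zip(xs, ys))
        # Step against the gradient to reduce the cost.
        intercept -= learning_rate * gradient
    return intercept

xs = [1.0, 2.0, 3.0]
ys = [3.0, 5.0, 7.0]  # generated from y = 2x + 1
b = descend_intercept(xs, ys, slope=2.0)
print(b, sse(b, 2.0, xs, ys))
```

The intercept moves from 0 towards 1, and the sum of squared errors falls towards 0 as it does so.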

13
Q

In what ways do we evaluate model performance?

A

Accuracy
Precision
Recall
F1-score
ROC curve and area under ROC curve

14
Q

What is the difference between TP, TN, FP and FN?

A
  • True positive (TP) is a correct classification, i.e., a hit.
  • True negative (TN) is a correct classification, i.e., a correct rejection.
  • False positive (FP) is when the outcome is incorrectly predicted as positive when it is actually negative.
  • False negative (FN) is when the outcome is incorrectly predicted as negative when it is actually positive.
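
A small sketch that counts the four outcomes, assuming labels are coded as 1 (positive) and 0 (negative). The function name and example labels are made up:

```python
def confusion_counts(actual, predicted):
    """Count TP, TN, FP, FN for binary labels coded as 1 and 0."""
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            tp += 1  # hit
        elif a == 0 and p == 0:
            tn += 1  # correct rejection
        elif a == 0 and p == 1:
            fp += 1  # predicted positive, actually negative
        else:
            fn += 1  # predicted negative, actually positive
    return tp, tn, fp, fn

actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0, 1, 0]
print(confusion_counts(actual, predicted))  # (3, 3, 1, 1)
```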
15
Q

What is accuracy?

A

How often the classifier predicts the class correctly: (TP + TN) / (TP + TN + FP + FN).

16
Q

What is precision?

A

A measure of result relevancy: the ratio of true positives to true positives plus false positives, TP / (TP + FP).

17
Q

What is recall?

A

How many of the truly relevant results are returned: the ratio of true positives to true positives plus false negatives, TP / (TP + FN).

18
Q

What is F1 score?

A

Harmonic mean of precision and recall

F1 = 2 * (Precision * Recall) / (Precision + Recall)
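
A worked example of precision, recall and F1 from raw counts. The counts are hypothetical, and the sketch does not guard against zero denominators:

```python
def precision(tp, fp):
    """True positives over everything predicted positive."""
    return tp / (tp + fp)

def recall(tp, fn):
    """True positives over everything that is actually positive."""
    return tp / (tp + fn)

def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall."""
    p = precision(tp, fp)
    r = recall(tp, fn)
    return 2 * (p * r) / (p + r)

# Hypothetical counts: 8 true positives, 2 false positives, 4 false negatives.
tp, fp, fn = 8, 2, 4
print(precision(tp, fp))     # 0.8
print(recall(tp, fn))        # 8 / 12
print(f1_score(tp, fp, fn))  # 8 / 11
```

Because it is a harmonic mean, F1 sits closer to the lower of the two values than a plain average would.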

19
Q

What is a ROC curve?

A

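A ROC (receiver operating characteristic) curve shows the performance of a classifier at all classification thresholds, plotting the true positive rate against the false positive rate.

To plot all the points on a ROC curve, you need to evaluate your classifier many times at different classification thresholds.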
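
The repeated evaluation can be sketched like this. The labels, scores and thresholds are made up, and a score at or above the threshold is treated as a positive prediction (an assumed convention):

```python
def roc_points(labels, scores, thresholds):
    """Return one (false positive rate, true positive rate) pair per threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = []
    for t in thresholds:
        # Re-evaluate the classifier at this threshold.
        predicted = [1 if s >= t else 0 for s in scores]
        tp = sum(1 for y, p in zip(labels, predicted) if y == 1 and p == 1)
        fp = sum(1 for y, p in zip(labels, predicted) if y == 0 and p == 1)
        points.append((fp / negatives, tp / positives))
    return points

labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
points = roc_points(labels, scores, thresholds=[1.1, 0.8, 0.4, 0.35, 0.1])
print(points)
# [(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]
```

Lowering the threshold moves along the curve from (0, 0), where nothing is predicted positive, to (1, 1), where everything is.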

20
Q

What does the area under the ROC curve tell us?

A

The Area Under the ROC Curve (AUC), which can be computed with an efficient sorting-based algorithm, tells us how well our classifier can distinguish between
classes. For a binary classification problem, the higher the AUC, the
better our classifier is at predicting 0 classes as 0 and 1 classes as 1.
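
AUC equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The sketch below uses that pairwise definition for readability. It is not the efficient sorting-based algorithm, and the example data is made up:

```python
def auc_pairwise(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. Runs in O(P * N) time, unlike the
    sorting-based approach.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(auc_pairwise(labels, scores))  # 0.75
```

A value of 1.0 means every positive is scored above every negative, and 0.5 is what random scoring would give on average.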