Week 4 Flashcards
Does linear regression work well for classification?
No.
It doesn't output probabilities: it treats the class labels as numbers and fits the best hyperplane through them.
It interpolates between data points, so its outputs cannot be interpreted as probabilities and there is no meaningful threshold to distinguish the classes.
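A minimal sketch of this problem (my own toy numbers, not from the cards): fitting ordinary least squares to 0/1 class labels produces "predictions" that fall outside [0, 1], so they cannot be read as probabilities.

```python
# Fit a 1-D least-squares line to binary 0/1 labels treated as numbers.
def fit_line(xs, ys):
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) \
        / sum((x - mx) ** 2 for x in xs)
    intercept = my - slope * mx
    return slope, intercept

xs = [0.0, 1.0, 2.0, 3.0, 4.0, 10.0]
ys = [0, 0, 0, 1, 1, 1]              # class labels coded as numbers
slope, intercept = fit_line(xs, ys)
pred = lambda x: slope * x + intercept
# pred(10.0) exceeds 1, which is not a valid probability.
```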
Does a linear model extend to classification problems with multiple classes?
No.
The classes must be encoded as numbers, but they might not have any meaningful order.
The linear model then imposes an artificial structure on the relationship between the features and the class predictions: the higher the value of a feature, the more it contributes to predicting a class with a higher number, even though classes that happen to get similar numbers are no closer to each other than any other pair of classes.
What is logistic regression?
It converts predictions into a probabilistic representation. A simple linear predictive model maps the data xi to a score zi, and the sigmoid function converts that score into a probability.
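The mapping described above can be sketched as follows (the weight and bias values are made-up numbers for illustration):

```python
import math

# Sigmoid squashes any real-valued score z into a probability in (0, 1).
def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Linear score z_i = w * x_i + b, then sigmoid to get a probability.
w, b = 0.8, -0.5          # hypothetical learned parameters
z = w * 2.0 + b           # score for input x_i = 2.0
p = sigmoid(z)            # probability of the positive class
```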
What is the output of logistic regression?
A probability for the positive class, which is thresholded to give a binary prediction.
When does logistic regression perform well?
When the data can be separated by a straight line, i.e., when the classes are linearly separable.
MLP concepts
Instead of applying logistic regression directly to the data, the inputs are first mapped to K latent features; those latent features are then fed to a logistic regression model, which yields a probability for the binary classification.
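A minimal forward-pass sketch of this idea (all weights are made-up numbers): a hidden layer maps the raw inputs to K latent features, and logistic regression then runs on those latent features rather than on the data directly.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def mlp_forward(x, hidden_w, out_w):
    # Hidden layer: each row of weights produces one latent feature.
    latent = [sigmoid(sum(w * xi for w, xi in zip(row, x)))
              for row in hidden_w]
    # Logistic regression on the latent features, not on x itself.
    return sigmoid(sum(w * h for w, h in zip(out_w, latent)))

x = [1.0, -2.0]
hidden_w = [[0.5, -0.3], [-0.8, 0.1]]   # K = 2 latent features
out_w = [1.2, -0.7]
p = mlp_forward(x, hidden_w, out_w)      # probability in (0, 1)
```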
What is Deep Learning?
A form of ML where a model has multiple layers of latent processes.
How do we train MLP weights?
Weights start as random values, so the training process is stochastic. Each weight is then updated with:
w = w + (a × (expected − predicted) × input)
w = w + (a × error × input)
where a is the learning rate and input is the corresponding variable value.
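The update rule above can be sketched in a few lines (weights, inputs, and a = 0.1 are made-up values):

```python
# One weight update: w <- w + a * (expected - predicted) * input,
# applied element-wise to a weight vector.
def update_weights(w, x, expected, predicted, a=0.1):
    error = expected - predicted
    return [wi + a * error * xi for wi, xi in zip(w, x)]

w = [0.2, -0.4]
w = update_weights(w, x=[1.0, 2.0], expected=1.0, predicted=0.3)
# Each weight moves in the direction that reduces the error,
# scaled by the learning rate and by its input value.
```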
What is the learning rate in a neural network?
It dictates how quickly the model's weights change.
The rate of change should slow as the model converges.
Gradient descent lets us adjust the rate of change based on changes in the error.
Weights for features near the decision boundary can change faster.
Non-linear activation functions gain their power from their derivative.
Why does calculating weights have the potential to create unstable models?
When input variables are of different magnitudes, the larger-magnitude variables contribute disproportionately to the weight updates, so the inputs need to be normalised.
How does the learning rate change?
For examples near the decision boundary, the sigmoid output is close to 0.5 and its derivative is at its maximum, whereas for confident predictions (outputs near 0 or 1) the derivative is close to 0. This derivative can therefore be used to scale the effective learning rate, so weights change fastest for borderline examples.
What is gradient descent?
An optimisation procedure used to find the values (coefficients) of the parameters of a function f that minimise a cost function J.
(Imagine changing the y-intercept and graphing the sum of squared errors as it decreases.)
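The y-intercept thought experiment above can be sketched directly (toy data of my own, fixed slope of 1): gradient descent repeatedly nudges the intercept b against the gradient of the sum of squared errors.

```python
# Minimise J(b) = sum((y_i - (x_i + b))^2) over the intercept b alone,
# keeping the slope fixed at 1.
def grad_descent_intercept(xs, ys, lr=0.1, steps=100):
    b = 0.0
    for _ in range(steps):
        # dJ/db = sum of -2 * (y - (x + b)) over all points
        grad = sum(-2.0 * (y - (x + b)) for x, y in zip(xs, ys))
        b -= lr * grad          # step against the gradient
    return b

xs = [0.0, 1.0, 2.0]
ys = [1.0, 2.0, 3.0]            # points lie on y = x + 1
b = grad_descent_intercept(xs, ys)   # converges towards b = 1
```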
What ways do we evaluate model performance?
Accuracy
Precision
Recall
F1-score
ROC curve and area under ROC curve
Difference between TP, TN, FP, and FN?
True positive (TP): a correct positive classification, i.e., a hit.
True negative (TN): a correct negative classification, i.e., a correct rejection.
False positive (FP): the outcome is incorrectly predicted as positive when it is actually negative.
False negative (FN): the outcome is incorrectly predicted as negative when it is actually positive.
What is accuracy?
How often the classifier predicts the class correctly, i.e., the proportion of all predictions that are correct.
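The metrics listed above follow directly from the four confusion-matrix counts; a small sketch with made-up counts:

```python
# Compute accuracy, precision, recall, and F1 from TP/TN/FP/FN counts.
def metrics(tp, tn, fp, fn):
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)          # of predicted positives, how many correct
    recall = tp / (tp + fn)             # of actual positives, how many found
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Hypothetical counts for illustration.
acc, prec, rec, f1 = metrics(tp=40, tn=45, fp=5, fn=10)
```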