Week 2: Discriminant Functions Flashcards
Discriminant Functions
They’re scalar functions that assign a feature vector to a class: a sample is assigned to the class whose discriminant value is largest. They divide the feature space into decision regions separated by decision boundaries.
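A minimal sketch of the idea, assuming one linear discriminant per class (the weights and test point below are hypothetical, not from the cards):

```python
import numpy as np

# Sketch: assign x to the class whose discriminant value g_i(x) is largest.
# The g_i here are assumed linear, g_i(x) = w_i . x + b_i, with made-up weights.
W = np.array([[1.0, -0.5],
              [-0.3, 0.8]])          # one weight row per class (hypothetical)
b = np.array([0.1, -0.2])            # one bias per class (hypothetical)

def classify(x):
    scores = W @ x + b               # evaluate every g_i(x)
    return int(np.argmax(scores))    # decision region = where g_i is the maximum

print(classify(np.array([0.4, 1.2])))   # prints the winning class index
```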
Dichotomiser
A discriminant function that assigns feature vectors to one of exactly two classes.
Linear Discriminant Functions
Discriminant functions that are linear combinations of the features, so their decision boundaries are linear (hyperplanes).
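In the two-class case this is usually written as below (standard notation, assumed rather than quoted from the cards):

```latex
g(\mathbf{x}) = \mathbf{w}^{\mathsf{T}}\mathbf{x} + w_0,
\qquad \text{decide } \omega_1 \text{ if } g(\mathbf{x}) > 0 \text{, else } \omega_2,
\qquad \text{boundary: } g(\mathbf{x}) = 0 .
```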
Augmented Vectors
Vectors rewritten so that all parameters and all features can be handled together: the bias is folded into a single augmented weight vector, and a constant 1 is prepended to the augmented feature vector.
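One common convention (an assumption here; details vary by course) makes the discriminant a single inner product:

```latex
\mathbf{a} = (w_0, w_1, \dots, w_d)^{\mathsf{T}}, \qquad
\mathbf{y} = (1, x_1, \dots, x_d)^{\mathsf{T}}, \qquad
g(\mathbf{x}) = \mathbf{a}^{\mathsf{T}}\mathbf{y}.
```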
Quadratic Discriminant Functions
Discriminant functions that are quadratic in the features, so their decision boundaries are defined by quadratic functions (curved surfaces).
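A typical general form (standard notation, assumed rather than quoted from the cards):

```latex
g(\mathbf{x}) = \mathbf{x}^{\mathsf{T}}\mathbf{W}\mathbf{x}
              + \mathbf{w}^{\mathsf{T}}\mathbf{x} + w_0 .
```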
Linearly Separable Data
When the dataset has classes that can be separated without error by linear decision boundaries.
Sample Normalisation
If there are only two classes, multiply the (augmented) samples in the negatively-labelled class by -1. Correct classification then reduces to a single condition for every sample, so all samples can be treated with the same operations during learning.
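With augmented vectors and the negative-class samples negated, the single condition is (standard perceptron-criterion setup, assumed here):

```latex
\mathbf{a}^{\mathsf{T}}\mathbf{y}_k > 0 \quad \text{for all samples } k .
```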
Gradient Descent
An iterative method of updating weights and parameters by taking a step, scaled by a learning rate, in the direction of the negative gradient of a cost function.
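The generic update, with learning rate η and cost function J (standard form, assumed):

```latex
\mathbf{w}_{t+1} = \mathbf{w}_t - \eta \,\nabla J(\mathbf{w}_t).
```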
Batch Perceptron Learning
Loop through all samples in a batch, collecting only the misclassified examples. The updated weights are then the old weights plus eta times the sum of the misclassified samples; repeat until no samples are misclassified.
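A minimal sketch of one batch update, assuming the samples are already augmented and sample-normalised so "misclassified" means a·y ≤ 0 (the function and variable names are made up):

```python
import numpy as np

# One epoch of batch perceptron learning: sum the misclassified samples,
# then apply a single update to the weight vector a.
def batch_perceptron_epoch(a, Y, eta=1.0):
    scores = Y @ a                               # a . y for every (augmented) sample
    misclassified = Y[scores <= 0]               # wrong side of the boundary
    return a + eta * misclassified.sum(axis=0)   # a_new = a_old + eta * sum of errors
```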
Sequential Perceptron Learning
Go through the samples one at a time; whenever a sample is misclassified, update the weights immediately by adding eta times that sample. Keep iterating over the dataset until the termination criterion is reached (e.g. a full pass with no misclassifications).
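A minimal sketch, again assuming augmented, sample-normalised inputs and using "no mistakes in a full pass" as the termination criterion (names are hypothetical):

```python
import numpy as np

# Sequential (online) perceptron: update the weights immediately after
# every misclassified sample instead of once per batch.
def sequential_perceptron(a, Y, eta=1.0, max_epochs=100):
    for _ in range(max_epochs):
        mistakes = 0
        for y in Y:
            if a @ y <= 0:        # misclassified under sample normalisation
                a = a + eta * y   # nudge the boundary towards this sample
                mistakes += 1
        if mistakes == 0:         # converged: a clean pass through the data
            break
    return a
```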
Multiclass Perceptron Learning Algorithm
Maintain one weight vector per class. If a sample is misclassified, move the weights of its true class towards the sample and move the weights of the other (wrongly competing) classes away from it.
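A minimal sketch of one update step, assuming one augmented weight vector per class (the rows of A). This variant pushes away only the class that was wrongly predicted, which is one common formulation:

```python
import numpy as np

# One multiclass perceptron update for a single sample x with label true_class.
def multiclass_perceptron_update(A, x, true_class, eta=1.0):
    predicted = int(np.argmax(A @ x))    # class with the largest discriminant
    if predicted != true_class:          # only act on misclassified samples
        A[true_class] += eta * x         # pull the true class towards the sample
        A[predicted]  -= eta * x         # push the wrongly chosen class away
    return A
```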
Widrow-Hoff/LMS Learning
A type of gradient descent where the new weights are the old weights plus eta times (a margin value minus the inner product of the weights with the input), all multiplied by the input vector. This way, how accurately the current weights classify each sample is factored into the size of the update.
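Written out, with margin b, learning rate η, and augmented sample y (standard Widrow-Hoff form, assumed):

```latex
\mathbf{a}_{t+1} = \mathbf{a}_t
  + \eta \left( b - \mathbf{a}_t^{\mathsf{T}}\mathbf{y} \right) \mathbf{y}.
```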
k-Nearest Neighbours (k-NN) Classifier
A way of classifying a sample based on the majority class among the k training samples closest to it (a minimal sketch follows the pros and cons below).
Pros:
- No training time
- Classification accuracy high with large sample size
- Works with multi-modal data and non-linearly separable data
- Can use a proportion of neighbours in each class to estimate probability that a sample belongs to each class
Cons:
- Large storage requirements
- Computationally expensive
- Number of training samples required increases exponentially with dimensionality of feature space
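A minimal k-NN sketch using Euclidean distance on NumPy arrays (the metric and function names are assumptions, not from the cards):

```python
import numpy as np
from collections import Counter

# Classify x as the majority label among its k nearest training samples.
def knn_classify(X_train, y_train, x, k=3):
    dists = np.linalg.norm(X_train - x, axis=1)   # distance to every stored sample
    nearest = np.argsort(dists)[:k]               # indices of the k closest samples
    votes = Counter(y_train[nearest].tolist())    # count labels among the neighbours
    return votes.most_common(1)[0][0]             # majority class wins
```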