L6 CLASSIFICATION Summary Flashcards
Features, features space, what makes a good feature?
Good features are measurable quantities that are compact and have good discriminatory power. There is no standard set of features for a given classification problem, but we have shown some typical, commonly used features.
The components of a classification system
Any classification system has the following components: a sensing module, a preprocessing mechanism, a feature extraction mechanism, a classifier, and, if the classifier uses supervised learning, a set of already classified examples (training set). We covered the first components in the previous chapters. In this chapter we show some good-practice methods to extract features and to choose a classifier for image and speech recognition, i.e., to solve the classification problem.
Supervised vs unsupervised learning classifiers
Supervised classification relies on having a set of examples (feature vectors) whose true class is known. The outputs, called targets, are also known. It is like having a teacher who trains us by showing examples. These examples are called training patterns (training feature vectors).
Unsupervised classification does not rely on possession of examples from a known class. The system does not have a teacher with correct answers. Instead it tries to identify similarities between the inputs so that inputs that have something in common are categorised together.
Supervised classification has two phases: training and recognition. We have shown how to use template matching, neural networks, Naïve Bayes, Bayesian networks and HMMs as supervised classifiers.
Features for image classification (just enumerate a few)
For image classification, e.g., Optical Character Recognition (OCR): zoning, single shape descriptors and statistical moments.
● Note: the formulas for statistical moments are only for interested people, you do not have to memorize them.
Features for speech recognition (just enumerate a few)
For speech recognition: the frequency spectrum, spectrograms, frequency of formants for vowels and the most widely used features, the Mel Frequency Cepstrum Coefficients (MFCC).
● Note: The formula for energy spectrum, cepstrum coefficients and mel cepstrum coefficient don’t have to be memorized. You should know that such coefficients exist and are good for classification.
Rule based classifiers (principles, examples)
Principle: based on a set of simple conditional if-then rules that separate the classes. This approach works fine for two or three features; with more features this classifier is not very efficient. Example:
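A minimal sketch of the rule-based principle, using two hypothetical fruit features (weight in grams and color) as the conditions:

```python
# Hypothetical rule-based classifier on two features: weight (grams) and color.
def classify_fruit(weight, color):
    if color == "yellow" and weight < 200:
        return "banana"
    if color == "red" and weight < 300:
        return "apple"
    return "unknown"

print(classify_fruit(150, "yellow"))  # -> banana
```

With many features the number of rules needed to cover all combinations grows quickly, which is why this approach does not scale well.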
Template matching (principles, distance, examples, advantages, disadvantages)
Principles: each class is represented by a single reference pattern or template. Templates for all classes are stored in a database. Template matching searches the database for the reference pattern that is most "similar" to the given test pattern.
Distance: similarity is measured with a distance function between feature vectors, typically the Euclidean distance; the smallest distance indicates the best-matching template.
Examples: Template matching can be used for automatic speech recognition. The system needs a collection of words that it can recognize, called a lexicon or vocabulary. Template matching can also be used for pedestrian recognition in video images for autonomous, smart cars: templates of pedestrian images are created from training data and then matched in street-scene video images.
Advantages: Simple and effective method of classification
Disadvantages: very computationally intensive; less simple when the inputs and the templates do not have the same size or when other distortions of the patterns occur.
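The principle can be sketched in a few lines. The templates and the test pattern below are hypothetical 2-D feature vectors; the distance is the Euclidean distance:

```python
import math

# Hypothetical templates: one stored reference feature vector per class.
templates = {"A": [1.0, 1.0], "B": [5.0, 5.0]}

def euclidean(x, y):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def classify(test_pattern):
    """Return the class whose stored template is closest to the test pattern."""
    return min(templates, key=lambda c: euclidean(test_pattern, templates[c]))

print(classify([1.2, 0.9]))  # nearest to the template of class "A"
```

Searching every template for every test pattern is what makes the method computationally intensive when the database is large.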
Neural networks (what are they, a diagram, how do they work as classifiers, an example)
What are they: A neural network (NN) is a computer learning system, inspired by biology.
A diagram:
How do they work as classifiers:
Example:
Types of neural networks
The simplest neural network is a single-layer one.
- One layer of inputs, and one layer of outputs, called a (single-layer) perceptron.
- The input layer just passes the signals on
- The output layer consists of McCulloch and Pitts neurons, which actually perform the calculations.
The multilayer perceptron (MLP) has the same structure as a single-layer perceptron, but features one or more intermediate hidden layers.
- The hidden layer has no direct connection to the outside world, contrary to the input and output layers.
How perceptron works:
- Combines several opinions at the input and eventually takes a yes/no decision based on them.
- Each opinion at the input is weighted, where the weights denote how important each input is.
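The weighted-opinion decision above can be sketched as a single McCulloch and Pitts style neuron. The weights and threshold below are hypothetical:

```python
# A single neuron: weighted "opinions" at the input -> a yes/no decision.
def perceptron(inputs, weights, threshold):
    """Fire (1) if the weighted sum of the inputs exceeds the threshold, else 0."""
    weighted_sum = sum(w * x for w, x in zip(weights, inputs))
    return 1 if weighted_sum > threshold else 0

# The first opinion counts twice as much as the second one.
print(perceptron([1, 0], weights=[2.0, 1.0], threshold=1.5))  # -> 1 (yes)
print(perceptron([0, 1], weights=[2.0, 1.0], threshold=1.5))  # -> 0 (no)
```

Training a perceptron means adjusting the weights so that the decisions match the targets of the training patterns.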
The essence of Bayes’ approach.
The essence of the Bayesian approach is a mathematical rule explaining how our existing belief P(A) should change in the light of new evidence B. Usually the probabilities P(B|A), P(A) and P(B) are known, while P(A|B) is not known and can be calculated by using Bayes' rule.
● You should know how to calculate P(A|B) using Bayes rule.
P(A|B) = P(B|A) * P(A) / P(B)
A, B = events
P(A|B) = probability of A given B is true
P(B|A) = probability of B given A is true
P(A), P(B) = the marginal (prior) probabilities of A and B
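The rule can be checked numerically. The probabilities below are hypothetical, with A = "has disease" and B = "test is positive":

```python
def bayes(p_b_given_a, p_a, p_b):
    """Bayes' rule: P(A|B) = P(B|A) * P(A) / P(B)."""
    return p_b_given_a * p_a / p_b

# Hypothetical values: P(B|A) = 0.9, P(A) = 0.01, P(B) = 0.05.
print(bayes(0.9, 0.01, 0.05))  # ≈ 0.18
```

Even with a sensitive test (P(B|A) = 0.9), the posterior P(A|B) stays modest because the prior P(A) is small.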
Naïve bayes classifier – principle
Principle: Bayesian classifiers in general try to infer the conditional probability of the class given the observed values of the input variables (feature vector). For example, the probability that an animal is a bird, given that it is yellow and it can fly.
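A minimal sketch of the principle, with hypothetical priors and likelihoods. The "naïve" part is the assumption that the features are conditionally independent given the class, so P(class | features) is proportional to P(class) times the product of the individual P(feature | class) terms:

```python
# Hypothetical model probabilities for the bird/plane example.
priors = {"bird": 0.3, "plane": 0.7}
likelihood = {  # P(feature | class)
    "bird":  {"yellow": 0.6, "flies": 0.9},
    "plane": {"yellow": 0.1, "flies": 0.8},
}

def naive_bayes(features):
    """Pick the class maximizing P(class) * product of P(feature | class)."""
    scores = {}
    for cls, prior in priors.items():
        score = prior
        for f in features:
            score *= likelihood[cls][f]
        scores[cls] = score
    return max(scores, key=scores.get)

# bird: 0.3 * 0.6 * 0.9 = 0.162  vs  plane: 0.7 * 0.1 * 0.8 = 0.056
print(naive_bayes(["yellow", "flies"]))  # -> bird
```

In practice the priors and likelihoods are estimated by counting occurrences in the training set.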
Hidden Markov Models – how do they work as classifiers?
How do they work as classifiers: An HMM is a graphical probabilistic reasoning model that works well on temporal data. At each clock tick, the system moves to a new state, which may or may not be the same as the previous one. The power of the HMM comes from the fact that it deals with situations where you do not know the state you are in (the state is not visible, it is hidden); instead, you see observations that depend on the state. Given the observations, the model helps to deduce the states.
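A common HMM computation is the forward algorithm, which gives the probability of an observation sequence under the model; a classifier can then pick the word (or other class) whose HMM assigns the highest probability. The toy weather model below, with hidden states and umbrella observations, is entirely hypothetical:

```python
# Hypothetical HMM: hidden weather states, observable umbrella usage.
states = ["rainy", "sunny"]
start = {"rainy": 0.6, "sunny": 0.4}
trans = {"rainy": {"rainy": 0.7, "sunny": 0.3},
         "sunny": {"rainy": 0.4, "sunny": 0.6}}
emit = {"rainy": {"umbrella": 0.9, "no_umbrella": 0.1},
        "sunny": {"umbrella": 0.2, "no_umbrella": 0.8}}

def forward(observations):
    """Probability of the observation sequence, summed over all state paths."""
    # alpha[s] = probability of the observations so far, ending in state s
    alpha = {s: start[s] * emit[s][observations[0]] for s in states}
    for obs in observations[1:]:
        alpha = {s: sum(alpha[p] * trans[p][s] for p in states) * emit[s][obs]
                 for s in states}
    return sum(alpha.values())

print(forward(["umbrella", "umbrella"]))  # ≈ 0.411
```

The related Viterbi algorithm uses the same structure with max instead of sum to recover the single most likely hidden state sequence.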
Unsupervised learning – clustering – principle
Clustering:
Principle:
Unsupervised learning is the case where there is no information about the correct outputs available at all, so we need something else to drive the learning. An example of unsupervised learning is the k-means clustering algorithm.
The k-means clustering algorithm attempts to split a given data set into a fixed number (k) of clusters of similar inputs. The question is how to determine which data points belong to which cluster.
The k‐means clustering classification is an iterative process that works as follows:
- Initialization: randomly choose k positions in the input space and assign the cluster centers (centroids) to these positions.
- Learning: repeat
  o For each data point xi:
    ▪ Compute the distance to each cluster center.
    ▪ Assign the data point to the nearest cluster center.
  o For each cluster center:
    ▪ Move the center to the mean of the points in its cluster.
  until the cluster centers stop moving.
- Usage (classification): for each test point, compute the distance to each cluster center and assign the point to the nearest cluster center.
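The steps above can be sketched in a few lines. The 2-D sample points are hypothetical, and because the initialization is random, the resulting centers can differ between runs:

```python
import random

def kmeans(points, k, iters=100):
    """Minimal k-means sketch: random init, then assign/update until stable."""
    centers = random.sample(points, k)  # initialization: k random positions
    for _ in range(iters):
        # Assign each point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            clusters[i].append(p)
        # Move each center to the mean of its cluster (keep it if empty).
        new_centers = [
            tuple(sum(v) / len(cl) for v in zip(*cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centers == centers:  # centers stopped moving
            break
        centers = new_centers
    return centers

points = [(1.0, 1.0), (1.2, 0.8), (5.0, 5.0), (5.1, 4.9)]
print(sorted(kmeans(points, k=2)))
```

Because of the random initialization, k-means can converge to a poor local optimum; in practice it is often restarted several times and the best clustering is kept.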
Reinforcement learning ‐ principle
Principle:
Classifiers evaluation. Confusion matrix – what it is & example, TP, TN, FP, FN. What is an ROC graph?
Classifiers evaluation:
To test the accuracy of classifiers we can use the train-and-test method: take a data set, use a large portion (~70%) for training and the rest for testing. After testing we count the errors and calculate the classification error rate. We can also evaluate in a more refined way, distinguishing four possible outcomes:
- True positives (TP) – an instance of class A is correctly classified as class A
- True negatives (TN) – an instance that is not from class A is correctly classified as not from A
- False positives (FP) – an instance that is not from A is classified as A
- False negatives (FN) ‐ an instance that is from A was classified as not from A
Confusion matrix:
What is it?
A confusion matrix summarizes the results of the classification test. After making a confusion matrix we can define performance metrics:
- Accuracy, defined as the proportion of correct classifications over all test instances: a = (TP+TN)/(TP+TN+FP+FN)
- Precision defined as p=TP/(TP+FP)
- Recall defined as r=TP/(TP+FN)
- True positive rate (hit rate) = TP/P, the proportion of the positive instances (P) that are correctly classified as positive
- False positive rate (false alarm rate) = FP/N, the proportion of the negative instances (N) that are erroneously classified as positive
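The metrics above follow directly from the four counts. The counts below are hypothetical test results over 100 instances:

```python
# Computing the performance metrics from the four confusion-matrix counts.
def metrics(tp, tn, fp, fn):
    total = tp + tn + fp + fn
    return {
        "accuracy":  (tp + tn) / total,
        "precision": tp / (tp + fp),
        "recall":    tp / (tp + fn),            # = true positive rate, TP/P
        "false_positive_rate": fp / (fp + tn),  # = false alarm rate, FP/N
    }

# Hypothetical results: 40 TP, 45 TN, 5 FP, 10 FN.
print(metrics(tp=40, tn=45, fp=5, fn=10))
```

The true positive rate and false positive rate computed here are exactly the two coordinates of a classifier's point on an ROC graph.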
Example: ..
Another way to present the results of a classifier performance measurement, or to compare several classifiers, is the Receiver Operating Characteristic (ROC) graph. ROC graphs have long been used in signal detection theory to depict the tradeoff between hit rates and false alarm rates, and they are a useful technique for organizing classifiers and visualizing their performance. An ROC graph shows a classifier's performance in 2D, with the false positive rate on the x-axis and the true positive rate on the y-axis. Each classifier is represented by one point on the graph.