5. View-Based Recognition Flashcards
Describe the Naive View-Based search/recognition (using templates).
We have a set of objects we want to find in images. For each object we collect a large set of template images taken from different viewpoints and under different lighting conditions. We then scan the given image with a sliding window and compare every window against every template, keeping the location (and the template) that matches best.
If we convert the training images, as well as the current sliding window of the test image, into vectors (using lexicographic ordering of the pixels), we can compare these high-dimensional vectors directly. For example, we can measure the angle between them.
This can be computed as normalized correlation (a convolution-like sliding operation). Plain correlation would be sensitive to overall light intensity; normalizing by the lengths of the two vectors removes this sensitivity.
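A minimal sketch of this naive search, assuming grayscale numpy arrays; the function names (`normalized_corr`, `naive_template_search`) and the toy data are illustrative, not part of the course material:

```python
import numpy as np

def normalized_corr(patch, template):
    """Cosine of the angle between the flattened patch and template vectors."""
    a = patch.ravel().astype(float)
    b = template.ravel().astype(float)
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

def naive_template_search(image, template):
    """Slide the template over the image and return the best-matching position."""
    th, tw = template.shape
    best_score, best_pos = -np.inf, None
    for r in range(image.shape[0] - th + 1):
        for c in range(image.shape[1] - tw + 1):
            score = normalized_corr(image[r:r+th, c:c+tw], template)
            if score > best_score:
                best_score, best_pos = score, (r, c)
    return best_pos, best_score

# Toy usage: plant the template inside a random image and recover its location.
rng = np.random.default_rng(0)
template = rng.random((8, 8))
image = rng.random((64, 64))
image[20:28, 30:38] = template
print(naive_template_search(image, template))  # should recover (20, 30)
```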
How can we calculate the similarity between two vectors?
- SSD: Sum of Squared Differences (a distance). It is sensitive to light levels: if one vector has a larger magnitude, the image may simply be brighter (amplified), yet the SSD becomes large.
- Cosine of the angle between the vectors (not sensitive to light levels).
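A small sketch contrasting the two measures; the helper names and toy vectors are my own. Scaling one vector (a uniformly brighter image) changes the SSD but not the cosine:

```python
import numpy as np

def ssd(a, b):
    # Sum of squared differences: small = similar, but it grows if one
    # image is simply a brighter (scaled) copy of the other.
    return np.sum((a - b) ** 2)

def cosine_similarity(a, b):
    # Cosine of the angle between the vectors: invariant to uniform scaling.
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

a = np.array([1.0, 2.0, 3.0])
b = 2.0 * a                     # same "image", twice as bright
print(ssd(a, b))                # 14.0 -> looks dissimilar
print(cosine_similarity(a, b))  # 1.0  -> same direction, recognized as the same
```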
What is the idea of Subspace methods for recognition?
Images are not random: similar images share similar structure. The idea is that a large corpus of similar images (e.g., images of the digit 3) can be represented in a lower-dimensional subspace that captures most of the variance of those images.
- We find a mean image (the average image) and a small set of basis images. Each image in the class is then described by a low-dimensional vector of coefficients that says how much of each basis image to add to the mean in order to reconstruct it. The hope is that the basis images capture meaningful variations, such as the slant of a digit or other features.
What does it mean when we are restricted to linear filters only?
That any vector a can be represented by a lower-dimensional coefficient vector b via
a = Xb
where a is D-dimensional, b is M-dimensional, and X is a D×M matrix whose columns are the basis vectors.
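A toy numpy illustration of this linear model, assuming the columns of X are orthonormal basis images (the symbols D, M, X, b follow the card above; the random data is purely illustrative):

```python
import numpy as np

D, M = 6, 2                     # image dimension D, subspace dimension M
rng = np.random.default_rng(1)

X, _ = np.linalg.qr(rng.standard_normal((D, M)))  # D x M matrix of orthonormal basis vectors
b = np.array([0.5, -1.2])                         # M-dimensional coefficient vector

a = X @ b                                         # D-dimensional image built from the basis
print(a.shape)                                    # (6,)

# Because the columns of X are orthonormal, the coefficients can be
# recovered by projecting a back onto the basis: b = X^T a.
print(np.allclose(X.T @ a, b))                    # True
```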
How to perform PCA?
- Subtract the mean image from all images to center them around 0 (the mean becomes zero, which simplifies the calculations)
- Compute the covariance matrix of the data
- Find the eigenvalues and eigenvectors of the matrix
- Find the largest eigenvalue; its corresponding eigenvector is the direction of highest variance (the best direction to project onto)
- That eigenvalue is the variance in that direction
If we want B basis images, we take the eigenvectors corresponding to the B largest eigenvalues.
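A minimal numpy sketch of these steps on synthetic data; all variable names and the toy dimensions are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
N, D = 200, 5                        # N data points (images), each D-dimensional
X = rng.standard_normal((D, N)) * np.array([[3.0], [2.0], [1.0], [0.5], [0.1]])

# 1. Center the data: subtract the mean image from every column.
mean = X.mean(axis=1, keepdims=True)
X_hat = X - mean

# 2. Covariance matrix of the centered data (D x D).
C = X_hat @ X_hat.T / N

# 3. Eigendecomposition; eigh is for symmetric matrices, eigenvalues ascending.
eigvals, eigvecs = np.linalg.eigh(C)

# 4. Keep the B eigenvectors with the largest eigenvalues.
B = 2
order = np.argsort(eigvals)[::-1]
basis = eigvecs[:, order[:B]]        # D x B matrix of basis images
variances = eigvals[order[:B]]       # variance captured along each basis vector

print(variances)                     # largest variances first (roughly 9 and 4 here)
```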
What does PCA assume?
- Assuming zero mean (we subtracted the mean from all data points at the start), minimizing the projection error is equivalent to maximizing the variance of the projected data: the total variance splits into the variance kept by the projection plus the error, so keeping more variance means making a smaller error.
In the eigendecomposition of the covariance matrix, we use the fact that the matrix is real, symmetric, and positive semi-definite.
The basis vectors/images have unit norm, ||u|| = 1, and are mutually orthogonal, i.e., they form an orthonormal set.
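A small numerical check of these properties on synthetic data, also illustrating that minimizing projection error is the same as maximizing projected variance (kept variance and error sum to the total variance). Everything here is an illustrative sketch, not course code:

```python
import numpy as np

rng = np.random.default_rng(3)
N, D = 500, 4
X_hat = rng.standard_normal((D, N)) * np.array([[2.0], [1.0], [0.5], [0.1]])
X_hat -= X_hat.mean(axis=1, keepdims=True)          # zero mean, as assumed

C = X_hat @ X_hat.T / N
eigvals, eigvecs = np.linalg.eigh(C)

# Real, symmetric, positive semi-definite covariance; orthonormal eigenvectors.
print(np.allclose(C, C.T))                           # symmetric
print(np.all(eigvals >= -1e-10))                     # no negative variances
print(np.allclose(eigvecs.T @ eigvecs, np.eye(D)))   # orthonormal basis

# Minimizing projection error == maximizing projected variance:
# (variance kept by the projection) + (mean squared error) = total variance.
B = 2
U = eigvecs[:, np.argsort(eigvals)[::-1][:B]]        # top-B basis
proj = U @ (U.T @ X_hat)                             # reconstruction from B coefficients
kept_var = np.sum(np.sort(eigvals)[::-1][:B])
mse = np.mean(np.sum((X_hat - proj) ** 2, axis=0))
print(np.isclose(kept_var + mse, np.trace(C)))       # True (up to numerical error)
```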
How to make PCA more efficient?
Short: I compute the SVD of my data matrix after subtracting the mean from it. From that I get the eigenvalues and eigenvectors of the covariance matrix essentially for free, without ever storing the full covariance matrix. The data matrix is much smaller than the covariance matrix.
Moreover, we only have to consider the first N left-singular vectors, because the remaining singular values are zero (with 1000 images, at most the first 1000 eigenvalues are nonzero). In practice we usually need even fewer, only the B basis vectors we actually keep.
Instead of building the covariance matrix (which is a very large D×D matrix, where D is the dimensionality of a data point / image), I take the SVD of the mean-subtracted data matrix, which is only D×N, where N is the number of data points.
X - data vectors stacked as columns of a matrix: X = [x_1, x_2, …, x_N]
X_hat = X - [x_mean, x_mean, …] (the mean-subtracted data matrix)
The covariance matrix can be written as C = (1/N) * X_hat X_hat^T
SVD: X_hat = U S V^T
- The left singular vectors U give us the eigenvectors of the covariance matrix
- The singular values give us the eigenvalues of the covariance matrix: lambda_i = (1/N) * s_i^2
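A numpy sketch of this SVD route, with the direct covariance computation kept only as a sanity check; the dimensions and names are illustrative:

```python
import numpy as np

rng = np.random.default_rng(4)
D, N = 1000, 50                       # high-dimensional images, few samples
X = rng.standard_normal((D, N))

X_hat = X - X.mean(axis=1, keepdims=True)          # centered data matrix (D x N)

# Economy SVD: U is only D x N, so we never build the D x D covariance matrix.
U, S, Vt = np.linalg.svd(X_hat, full_matrices=False)

eigvals = S**2 / N                                 # eigenvalues of the covariance matrix
eigvecs = U                                        # corresponding eigenvectors (columns)

# Sanity check against the direct (expensive) covariance route.
C = X_hat @ X_hat.T / N
print(np.allclose(C @ eigvecs[:, 0], eigvals[0] * eigvecs[:, 0]))  # True
```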
How to choose the number of basis vectors in PCA?
- We can experiment and find the number of basis vectors that works best for our application.
- We can require that the projection captures at least 0.9 of the total variance of the data. In other words, the sum of the chosen eigenvalues should be at least 90% of the sum of all eigenvalues.
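A small sketch of the second criterion: pick the smallest B whose cumulative eigenvalue fraction reaches 0.9 (the eigenvalues below are made up for illustration):

```python
import numpy as np

# Suppose these are the PCA eigenvalues (variances), largest first.
eigvals = np.array([5.0, 2.5, 1.0, 0.8, 0.4, 0.2, 0.1])

explained = np.cumsum(eigvals) / np.sum(eigvals)   # cumulative fraction of total variance
B = int(np.searchsorted(explained, 0.9) + 1)       # smallest B capturing >= 90%

print(explained)   # [0.5  0.75 0.85 0.93 0.97 0.99 1.  ]
print(B)           # 4 basis vectors are enough here
```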