Week 9 - Object Recognition and Categorisation Flashcards
What is Indexing with Local Features
Each patch/region surrounding a point of interest has a descriptor: a point in some high-dimensional feature space (e.g., SIFT)
Close points in feature space have similar descriptors, indicating similar local content
Important for 3D reconstruction and for retrieving images of similar objects
(try to match target image features to descriptors)
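A minimal sketch of the matching idea, assuming descriptors are stored as rows of a NumPy array and that "close" means a small Euclidean distance. The function name `match_descriptors` and the threshold `max_dist` are hypothetical, chosen only for illustration:

```python
import numpy as np

def match_descriptors(query, database, max_dist=0.5):
    """Return (query_idx, db_idx) pairs whose nearest neighbour is within max_dist."""
    # Pairwise Euclidean distances, shape (n_query, n_database).
    dists = np.linalg.norm(query[:, None, :] - database[None, :, :], axis=2)
    nearest = dists.argmin(axis=1)
    return [(i, int(j)) for i, j in enumerate(nearest) if dists[i, j] <= max_dist]

rng = np.random.default_rng(0)
database = rng.random((5, 128))      # stand-in for stored 128-D SIFT descriptors
query = database[[3, 1]] + 0.001     # slightly perturbed copies of rows 3 and 1
matches = match_descriptors(query, database)
```

This brute-force comparison costs one distance per query/database pair, which is why an index is needed once the database holds millions of descriptors.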
How do we efficiently find the relevant features of a new image
Using the idea of an inverted file index
What is an Inverted File Index
For text documents, an index is used to find the pages where a word occurs
We want to find all images (pages) in which a feature (word) occurs
To do this, we map our features to ‘visual words’
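The text analogy can be sketched as follows, assuming pages are plain strings keyed by page number. `build_inverted_index` and the sample pages are made up for illustration:

```python
from collections import defaultdict

def build_inverted_index(pages):
    """Map each word to the set of page ids on which it occurs."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for word in text.lower().split():
            index[word].add(page_id)
    return index

pages = {
    1: "local features index",
    2: "visual words index",
    3: "local visual features",
}
index = build_inverted_index(pages)
print(sorted(index["index"]))   # [1, 2]
print(sorted(index["visual"]))  # [2, 3]
```

Looking up a word is a single dictionary access, regardless of how many pages exist.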
Visual words: Main idea
Extract some local features from a number of images and map them into the 128-dimensional space (if using SIFT)
Each point in the space is a local descriptor (SIFT vector)
How do we match visual words
When we see close points in feature space, we have similar descriptors (similar content)
Content is close enough to assume it is the same
How do we use clusters for visual words
Cluster the descriptors to reduce the complexity: millions of points become far fewer clusters, and all points in one cluster are treated as the same word
“quantize via clustering”
What are the cluster centres
the prototype “words”
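A rough sketch of "quantize via clustering", assuming plain k-means (Lloyd's algorithm) written directly in NumPy. The helper names `kmeans` and `quantize`, the iteration count, and the synthetic two-blob data are illustrative assumptions, not part of the course material:

```python
import numpy as np

def kmeans(descriptors, k, n_iters=20, seed=0):
    """Basic Lloyd's k-means; returns the k cluster centres (the visual words)."""
    rng = np.random.default_rng(seed)
    # Initialise centres with k distinct descriptors picked at random.
    centres = descriptors[rng.choice(len(descriptors), size=k, replace=False)].copy()
    for _ in range(n_iters):
        dists = np.linalg.norm(descriptors[:, None, :] - centres[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for c in range(k):
            members = descriptors[labels == c]
            if len(members):  # leave an empty cluster's centre where it is
                centres[c] = members.mean(axis=0)
    return centres

def quantize(descriptors, centres):
    """Replace each descriptor by the index of its nearest centre."""
    dists = np.linalg.norm(descriptors[:, None, :] - centres[None, :, :], axis=2)
    return dists.argmin(axis=1)

rng = np.random.default_rng(1)
blob_a = rng.normal(0.0, 0.05, size=(50, 128))   # synthetic "descriptors"
blob_b = rng.normal(10.0, 0.05, size=(50, 128))
descriptors = np.vstack([blob_a, blob_b])
vocabulary = kmeans(descriptors, k=2)
words = quantize(descriptors, vocabulary)
```

After quantization each 128-D descriptor is represented by a single integer word id, which is what makes an inverted index possible.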
How do we create the inverted file for visual words
- Start with a database of images
- Run SIFT: find interest points and encode descriptors
- Cluster the descriptors
- Create our list of visual words (the cluster centres)
- Map every image's descriptors to their nearest visual words
- For each word, store the list of images in which that visual word occurs
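The last two steps could look roughly like this, assuming the cluster centres already exist and each image's descriptors are a NumPy array. The 2-D toy centres, file names, and the function `build_visual_index` are hypothetical stand-ins (real SIFT descriptors are 128-D):

```python
from collections import defaultdict
import numpy as np

def nearest_word(descriptors, centres):
    """Index of the closest cluster centre for each descriptor."""
    dists = np.linalg.norm(descriptors[:, None, :] - centres[None, :, :], axis=2)
    return dists.argmin(axis=1)

def build_visual_index(image_descriptors, centres):
    """Map each visual word id to the set of images that contain it."""
    index = defaultdict(set)
    for image_id, descriptors in image_descriptors.items():
        for word in nearest_word(descriptors, centres):
            index[int(word)].add(image_id)
    return index

centres = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])  # toy vocabulary
image_descriptors = {
    "a.jpg": np.array([[0.1, 0.2], [9.8, 0.1]]),    # falls on words 0 and 1
    "b.jpg": np.array([[0.3, 9.9], [10.1, -0.2]]),  # falls on words 2 and 1
}
visual_index = build_visual_index(image_descriptors, centres)
```

Here word 1 is shared by both images, so a query containing word 1 would retrieve both as candidates.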
How does the inverted file index handle new images
Extract the visual words it contains (SIFT descriptors mapped to words)
Map the image to the relevant words in the index
Find all the other images that contain the same words
Then compare word counts
(similar images will have many visual words in common)
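A small sketch of the query step, assuming each image is already reduced to a list of word ids. Comparing word counts with cosine similarity is an assumption made here for illustration; the notes only say the counts are compared. All names and data are hypothetical:

```python
import math
from collections import Counter, defaultdict

def cosine(a, b):
    """Cosine similarity between two word-count Counters."""
    dot = sum(a[w] * b[w] for w in a)  # a missing key in a Counter counts as 0
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def rank(query_words, image_words, index):
    """Score only the images that share at least one word with the query."""
    found = set()
    for word in set(query_words):
        found |= index.get(word, set())
    q = Counter(query_words)
    scored = [(cosine(q, Counter(image_words[i])), i) for i in found]
    return sorted(scored, reverse=True)

image_words = {"a": [0, 0, 1, 2], "b": [3, 3, 4], "c": [0, 1, 1, 5]}
index = defaultdict(set)
for image_id, words in image_words.items():
    for word in words:
        index[word].add(image_id)

ranking = rank([0, 0, 1], image_words, index)
```

Image "b" shares no words with the query, so it is never scored at all; that skipping is the efficiency gain of the inverted index.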
What is Spatial Verification
Sometimes, dissimilar images will have high visual word similarity (e.g., buildings with lots of similar windows)
Spatial verification is used to check that the images really are the same
only some of the matches are mutually consistent
What is the spatial verification strategy
Use the generalised Hough transform
Let each matched feature cast a vote on location, scale, orientation of the model object
(uses encoded information about the position, scale, and orientation of each feature match)
Verify parameters with enough votes
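A simplified sketch of the voting step, assuming every feature is a tuple `(x, y, scale, angle_deg)` and each match pairs a model feature with an image feature. The bin sizes, the `min_votes` threshold, and the function names are illustrative assumptions; practical systems usually also vote into neighbouring bins to soften bin-boundary effects:

```python
import math
from collections import defaultdict

def hough_votes(matches, loc_bin=20.0, angle_bin=30.0):
    """Each match votes for an object pose: translation, scale, and rotation."""
    bins = defaultdict(list)
    n_angle_bins = round(360.0 / angle_bin)
    for idx, ((mx, my, ms, ma), (ix, iy, isc, ia)) in enumerate(matches):
        scale = isc / ms
        rot = (ia - ma) % 360.0
        r = math.radians(rot)
        # Where this match predicts the model origin lies in the image.
        tx = ix - scale * (mx * math.cos(r) - my * math.sin(r))
        ty = iy - scale * (mx * math.sin(r) + my * math.cos(r))
        key = (round(tx / loc_bin), round(ty / loc_bin),
               round(math.log2(scale)), round(rot / angle_bin) % n_angle_bins)
        bins[key].append(idx)
    return bins

def verified_matches(matches, min_votes=3):
    """Keep the matches in the most-voted pose bin if it has enough votes."""
    best = max(hough_votes(matches).values(), key=len, default=[])
    return best if len(best) >= min_votes else []

matches = [
    ((0, 0, 1, 0), (100, 40, 2, 0)),   # consistent: scale 2, shift (100, 40)
    ((10, 0, 1, 0), (120, 40, 2, 0)),
    ((0, 10, 1, 0), (100, 60, 2, 0)),
    ((5, 5, 1, 0), (110, 50, 2, 0)),
    ((3, 3, 1, 0), (300, 10, 1, 90)),  # outlier: disagrees on pose
]
consistent = verified_matches(matches)
```

The four consistent matches all land in the same pose bin, while the outlier votes alone and is discarded.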
What is the Video Google system
E.g., find all scenes in a film where an actor is wearing a blue tie
1. Collect all words within the query region
2. Use the inverted file index to find relevant frames
3. Compare word counts
4. Spatial verification
What are the issues with visual vocabulary formation
- Sampling strategy: where to extract features, e.g., blobs or corners?
- Clustering / quantization algorithm
- Unsupervised vs. supervised (external labels or annotations are used to guide the clustering process)
- What corpus provides features (universal vocabulary?)
- Vocabulary size, number of words (too small -> may not capture visual content)
What is a good sampling strategy to find specific, textured objects
Sparse sampling at interest points
What is a good sampling strategy for object categorisation
Dense sampling
What is a good sampling strategy for more image coverage
Multiple complementary interest operators
What are 4 main sampling strategies
Randomly
Multiple interest operators
Dense
Sparse
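Two of these strategies are easy to sketch without an interest-point detector: dense sampling on a regular grid and random sampling. The function names, grid step, and image size below are hypothetical; sparse sampling would instead take its locations from an interest operator such as a blob or corner detector:

```python
import random

def dense_grid(width, height, step):
    """Sample locations on a regular grid, one every `step` pixels."""
    return [(x, y)
            for y in range(step // 2, height, step)
            for x in range(step // 2, width, step)]

def random_points(width, height, n, seed=0):
    """Sample n locations uniformly at random inside the image."""
    rng = random.Random(seed)
    return [(rng.randrange(width), rng.randrange(height)) for _ in range(n)]

grid = dense_grid(64, 48, step=16)
print(len(grid))   # 12
print(grid[0])     # (8, 8)
scattered = random_points(64, 48, n=5)
```

A descriptor would then be computed at each sampled location, whichever strategy produced it.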
What are some clustering/quantisation methods
k-means (typical choice)
agglomerative clustering
mean-shift
What is a query region
E.g., we want to find a specific object in images
Pull out only the SIFT descriptors whose positions lie within the relevant polygon
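One possible way to do this filtering, assuming keypoint positions are `(x, y)` tuples and the query region is a list of polygon vertices. The ray-casting test is a standard point-in-polygon method chosen here for illustration, and the names and sample data are hypothetical:

```python
def point_in_polygon(x, y, polygon):
    """Ray casting: count how many polygon edges a ray going right from (x, y) crosses."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def descriptors_in_region(keypoints, descriptors, polygon):
    """Keep only the descriptors whose keypoint lies inside the polygon."""
    return [d for (x, y), d in zip(keypoints, descriptors)
            if point_in_polygon(x, y, polygon)]

region = [(0, 0), (10, 0), (10, 10), (0, 10)]
keypoints = [(5, 5), (15, 5), (2, 8)]
descriptors = ["d0", "d1", "d2"]   # placeholders for SIFT vectors
selected = descriptors_in_region(keypoints, descriptors, region)
```

Only the selected descriptors are then mapped to visual words and looked up in the inverted file index.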