Object Detection - Week 9 Flashcards
What are the advantages of local features?
Critical to find distinctive and repeatable local regions for multi-view matching
Complexity reduction via selection of distinctive points
Describe images, objects, parts without requiring segmentation; robustness to clutter & occlusion
Robustness: similar descriptors in spite of moderate view changes, noise, blur, etc.
What does it mean when two feature descriptors are close in feature space?
The two features have similar local content
What is the idea behind visual words?
Extract local features from a number of images, e.g. SIFT descriptors, each of which can be represented as a point in feature space
Map high-dimensional descriptors to tokens/words by quantising the feature space. Quantise via clustering and let the cluster centres be the prototype “words”
Determine which word to assign to each new image region by finding the closest cluster centre
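The assignment step can be sketched as a nearest-centre lookup. This is a minimal illustration, assuming the cluster centres already exist; `assign_words` is a hypothetical helper and the 2-D points stand in for real 128-D SIFT descriptors.

```python
import numpy as np

def assign_words(descriptors, centres):
    """Return, for each descriptor, the index of its nearest cluster centre."""
    # Pairwise squared Euclidean distances, shape (n_descriptors, n_centres).
    diffs = descriptors[:, None, :] - centres[None, :, :]
    dists = np.sum(diffs ** 2, axis=2)
    # The visual word is the index of the closest centre.
    return np.argmin(dists, axis=1)

# Toy 2-D "descriptors" (assumption: real descriptors would be 128-D SIFT vectors).
centres = np.array([[0.0, 0.0], [10.0, 10.0], [0.0, 10.0]])
descriptors = np.array([[1.0, 1.0], [9.0, 11.0], [0.5, 8.0]])
words = assign_words(descriptors, centres)
print(words)  # [0 1 2]
```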
How do inverted file indexes work?
Detect words in the database images. An inverted index is a dictionary where the key is the word number and the value is the list of images that contain that word
A new query image is mapped to the indices of database images that share a word with it. The database images are then ranked by how many words they share with the query image
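A minimal sketch of such an index, assuming each image has already been reduced to a list of word numbers. The function names and image ids are hypothetical, and ranking by the count of distinct shared words is one simple scoring choice among several.

```python
from collections import Counter, defaultdict

def build_inverted_index(image_words):
    """Map each word number to the set of image ids that contain it."""
    index = defaultdict(set)
    for image_id, words in image_words.items():
        for word in words:
            index[word].add(image_id)
    return index

def query(index, query_words):
    """Rank database images by the number of distinct words shared with the query."""
    votes = Counter()
    for word in set(query_words):
        # index.get avoids creating empty entries for unseen words.
        for image_id in index.get(word, ()):
            votes[image_id] += 1
    return votes.most_common()

# Hypothetical database: image id -> word numbers detected in that image.
database = {"img_a": [1, 4, 7], "img_b": [2, 4, 9], "img_c": [7, 8, 4]}
index = build_inverted_index(database)
ranking = query(index, [1, 4, 7, 5])
print(ranking)  # [('img_a', 3), ('img_c', 2), ('img_b', 1)]
```

Only images that share at least one word with the query are ever touched, which is why the lookup stays fast on a large database.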
What is spatial verification?
Check that the matched features agree on a single pose of the object. Can use the generalised Hough transform:
- Let each matched feature cast a vote on the location, scale and orientation of the model object
- Verify parameters with enough votes
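The voting idea can be sketched as follows. This is a coarse illustration, not a full generalised Hough transform: the feature format `(x, y, scale, angle_deg)`, the bin sizes, and the vote threshold are all assumptions, and there is no voting into neighbouring bins or handling of angle wrap-around at bin edges.

```python
import math
from collections import Counter

def pose_bin(model_feat, image_feat, loc_bin=20.0, angle_bin=30.0):
    """Coarse pose bin (x, y, log2 scale, rotation) implied by one match."""
    mx, my, m_scale, m_angle = model_feat
    ix, iy, i_scale, i_angle = image_feat
    scale = i_scale / m_scale
    rotation = (i_angle - m_angle) % 360.0
    t = math.radians(rotation)
    # Where the model origin would land in the image under this match.
    ox = ix - scale * (mx * math.cos(t) - my * math.sin(t))
    oy = iy - scale * (mx * math.sin(t) + my * math.cos(t))
    return (math.floor(ox / loc_bin), math.floor(oy / loc_bin),
            round(math.log2(scale)), math.floor(rotation / angle_bin))

def verify(matches, min_votes=3):
    """Keep only the pose bins that collect at least min_votes votes."""
    votes = Counter(pose_bin(m, q) for m, q in matches)
    return {pose: n for pose, n in votes.items() if n >= min_votes}

# Three matches consistent with a shift of (100, 50), plus one outlier.
matches = [
    ((0, 0, 2.0, 15.0), (100, 50, 2.0, 15.0)),
    ((10, 0, 2.0, 15.0), (110, 50, 2.0, 15.0)),
    ((0, 10, 2.0, 15.0), (100, 60, 2.0, 15.0)),
    ((5, 5, 2.0, 0.0), (300, 20, 4.0, 90.0)),
]
verified = verify(matches)
print(verified)  # {(5, 2, 0, 0): 3}
```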
What are the steps of the video google system?
- Collect all words within query region
- Inverted file index to find relevant frames
- Compare word counts
- Spatial verification
What sampling strategies exist for visual vocabulary formation?
Sparse, at interest points
- Better for finding specific, textured objects
Dense, uniformly sampled
- Better for object categorisation
Randomly sampled
Multiple interest operators
What is the typical clustering method for visual words?
K-means clustering
Also used: agglomerative clustering, mean-shift
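A minimal sketch of k-means (Lloyd's algorithm) for building a vocabulary, assuming the descriptors fit in memory. The random initialisation, fixed iteration count and toy 2-D data are assumptions for illustration; the result depends on the initial centres, so practical systems usually run several restarts or use a library implementation.

```python
import numpy as np

def nearest_centre(points, centres):
    """Index of the closest centre for every point."""
    dists = ((points[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)

def kmeans(points, k, n_iters=20, seed=0):
    """Alternate between assigning points to centres and re-estimating centres."""
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialise with k distinct data points chosen at random.
    centres = points[rng.choice(len(points), size=k, replace=False)].copy()
    for _ in range(n_iters):
        labels = nearest_centre(points, centres)
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # keep the old centre if a cluster is empty
                centres[j] = members.mean(axis=0)
    # Final assignment against the final centres.
    return centres, nearest_centre(points, centres)

# Toy descriptors forming two groups; the centres become the "visual words".
points = np.array([[0, 0], [0, 1], [1, 0], [1, 1],
                   [10, 10], [10, 11], [11, 10], [11, 11]], dtype=float)
centres, labels = kmeans(points, k=2)
```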
How are words collected in a query region?
Pull out only the SIFT descriptors whose positions are within the polygon
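One way to sketch the region filter is a ray-casting point-in-polygon test. The helper names and data layout (keypoints as `(x, y)` pairs, descriptors as a parallel list) are assumptions, and points lying exactly on the boundary are not treated specially.

```python
def point_in_polygon(x, y, polygon):
    """Ray casting: count how many polygon edges a ray going right from (x, y) crosses."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def descriptors_in_region(keypoints, descriptors, polygon):
    """Keep only the descriptors whose keypoint position lies inside the polygon."""
    return [d for (x, y), d in zip(keypoints, descriptors)
            if point_in_polygon(x, y, polygon)]

# Hypothetical query region (a square) and three keypoints with placeholder descriptors.
region = [(0, 0), (10, 0), (10, 10), (0, 10)]
keypoints = [(5, 5), (15, 5), (2, 8)]
descriptors = ["d0", "d1", "d2"]
selected = descriptors_in_region(keypoints, descriptors, region)
print(selected)  # ['d0', 'd2']
```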
What is object categorisation?
Not identification (find this particular object) but recognising any member of a category:
- Recognise any car
- Recognise any cow
Given a small number of training images of a category, recognise a priori unknown instances of that category and assign the correct category label
What is evidence for how humans categorise?
Evidence that humans (usually) start with basic-level categorisation before doing identification
- Easier and faster for humans to do basic-level categorisation than object identification
- Most promising starting point for visual classification
How many object categories are there?
~10,000 to 30,000
What types of categories are there?
Functional categories
- Chairs = “something you can sit on”
Ad-hoc categories
- “Something you can find in an office environment”
What are the challenges for object categorisation?
Robustness to:
- Illumination
- Object pose
- Clutter
- Occlusions
- Intra-class appearance variation
- Viewpoint
What is the idea of bag of words?
Represent a whole image as a bag of its features, treated as “independent features”
Stricter definition:
- Independent features
- Histogram representation: x-axis is the features (words), y-axis is how many times each feature appears
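The histogram representation can be sketched in a few lines, assuming each local feature in the image has already been mapped to a word number. `bag_of_words` is a hypothetical helper, and the normalisation step is an optional extra so that images with different numbers of features can be compared.

```python
import numpy as np

def bag_of_words(word_ids, vocab_size):
    """Count how often each visual word occurs in one image."""
    return np.bincount(word_ids, minlength=vocab_size)

# Word numbers of the features found in one image (toy vocabulary of 6 words).
word_ids = np.array([0, 2, 2, 5, 2, 0])
hist = bag_of_words(word_ids, vocab_size=6)
print(hist)  # [2 0 3 0 0 1]

# Optional: normalise so the histogram sums to 1.
normalised = hist / hist.sum()
```

The order and position of the features are discarded; only the counts remain, which is what makes this a "bag".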