Object Detection - Week 9 Flashcards

Question 1

Q

What are the advantages of local features?

Answer

A

Provide distinctive, repeatable local regions, which are critical for multi-view matching

Complexity reduction via selection of distinctive points

Describe images, objects, parts without requiring segmentation; robustness to clutter & occlusion

Robustness: descriptors stay similar in spite of moderate view changes, noise, blur, etc.

Question 2

Q

What does it mean when two feature descriptors are close in feature space?

Answer

A

The two features have similar local content
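
As a rough illustration of "close in feature space", the sketch below compares toy descriptors with Euclidean distance. The 4-D vectors and the names `patch_a`, `patch_b` and `patch_c` are made up for this example (real SIFT descriptors are 128-D), and Euclidean distance is only one possible choice of metric.

```python
import math


def euclidean_distance(a, b):
    """Euclidean (L2) distance between two equal-length descriptors."""
    if len(a) != len(b):
        raise ValueError("descriptors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Toy 4-D descriptors; the values are invented for illustration only.
patch_a = [0.9, 0.1, 0.0, 0.3]
patch_b = [0.8, 0.2, 0.1, 0.3]  # similar local content to patch_a
patch_c = [0.0, 0.9, 0.7, 0.1]  # different local content

d_ab = euclidean_distance(patch_a, patch_b)
d_ac = euclidean_distance(patch_a, patch_c)
print(d_ab < d_ac)  # the similar pair is the closer pair
```

A small distance suggests the two patches look alike; a large one suggests they do not.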

Question 3

Q

What is the idea behind visual words?

Answer

A

Extract local features from a number of images, e.g. SIFT descriptors, which can be represented as points in feature space

Map high-dimensional descriptors to tokens/words by quantising the feature space. Quantisation can be done via clustering, letting the cluster centres be the prototype “words”

Determine which word to assign to each new image region by finding the closest cluster centre
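
The assignment step can be sketched as a nearest-centre lookup. This is a minimal sketch assuming the vocabulary has already been built by clustering; `nearest_word`, `vocabulary` and the 2-D toy values are hypothetical names and data chosen for this example.

```python
def nearest_word(descriptor, centres):
    """Return the index (word id) of the cluster centre closest to the descriptor."""
    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return min(range(len(centres)),
               key=lambda i: squared_distance(descriptor, centres[i]))


# Assumed vocabulary of three prototype "words" in a toy 2-D feature space.
vocabulary = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]

# Descriptors from a new image region, each mapped to its closest word.
new_descriptors = [[0.1, 0.2], [0.9, 0.8], [0.2, 0.9]]
word_ids = [nearest_word(d, vocabulary) for d in new_descriptors]
print(word_ids)  # [0, 1, 2]
```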

Question 4

Q

How do inverted file indexes work?

Answer

A

Detect words in images. An inverted index is a dictionary where the key is the word number and the values are the images that contain that word

New query images are mapped to the indices of database images that share a word. Database images are then ranked by how many words they share with the query image
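
A minimal sketch of this idea with plain dictionaries is shown below. The image ids, word numbers and function names are invented for the example, and ranking by the raw count of shared words is a simplification of what a real retrieval system would do.

```python
from collections import Counter, defaultdict


def build_inverted_index(image_words):
    """Map each word id to the set of image ids that contain it."""
    index = defaultdict(set)
    for image_id, words in image_words.items():
        for word in words:
            index[word].add(image_id)
    return index


def rank_candidates(index, query_words):
    """Rank database images by the number of distinct words shared with the query."""
    votes = Counter()
    for word in set(query_words):
        for image_id in index.get(word, ()):
            votes[image_id] += 1
    return votes.most_common()


# Hypothetical database: image id -> visual words detected in that image.
database = {"img_a": [3, 7, 7, 12], "img_b": [7, 20], "img_c": [1, 2]}
index = build_inverted_index(database)
ranking = rank_candidates(index, [3, 7, 12, 99])
print(ranking)  # [('img_a', 3), ('img_b', 1)]
```

Only images that share at least one word with the query are ever touched, which is the point of the index: `img_c` never appears in the ranking.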

Question 5

Q

What is spatial verification?

Answer

A

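Check that the matched features are also geometrically consistent with each other, not just similar in appearance

Can use the generalised Hough transform:
- Let each matched feature cast a vote on the location, scale and orientation of the model object
- Verify parameters with enough votes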
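
The voting step could look roughly like the sketch below. It assumes each keypoint is a tuple `(x, y, scale, angle_deg)` and that matches come as `(model_keypoint, image_keypoint)` pairs; the bin sizes, the `min_votes` threshold and the function name are arbitrary choices for illustration. Hard binning like this can split votes that fall near a bin edge, so treat it as a sketch only.

```python
import math
from collections import Counter


def hough_verify(matches, min_votes=3, loc_bin=20.0, angle_bin=30.0):
    """Let each match vote for an object pose; keep pose bins with enough votes."""
    votes = Counter()
    for (mx, my, m_scale, m_angle), (ix, iy, i_scale, i_angle) in matches:
        scale = i_scale / m_scale
        rotation = (i_angle - m_angle) % 360.0
        r = math.radians(rotation)
        # Where the model origin would land in the image under this match.
        ox = ix - scale * (mx * math.cos(r) - my * math.sin(r))
        oy = iy - scale * (mx * math.sin(r) + my * math.cos(r))
        key = (math.floor(ox / loc_bin), math.floor(oy / loc_bin),
               round(math.log2(scale)), math.floor(rotation / angle_bin))
        votes[key] += 1
    return {pose: n for pose, n in votes.items() if n >= min_votes}


# Three matches agree on "scale x2, no rotation, origin at (100, 50)";
# the fourth is an outlier that votes for a different pose.
matches = [
    ((10, 10, 1, 0), (120, 70, 2, 0)),
    ((20, 5, 1, 0), (140, 60, 2, 0)),
    ((0, 30, 2, 90), (100, 110, 4, 90)),
    ((5, 5, 1, 0), (300, 10, 1, 45)),
]
verified = hough_verify(matches)
print(verified)  # {(5, 2, 1, 0): 3}
```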

Question 6

Q

What are the steps of the Video Google system?

Answer

A

Collect all words within query region
Inverted file index to find relevant frames
Compare word counts
Spatial verification
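
For the "compare word counts" step, one possible similarity measure is the cosine similarity between word-count histograms, sketched below. The card does not say which measure is used, so this choice, the function name and the toy word lists are assumptions made for illustration.

```python
import math
from collections import Counter


def cosine_similarity(words_a, words_b):
    """Cosine similarity between the word-count histograms of two word lists."""
    counts_a, counts_b = Counter(words_a), Counter(words_b)
    dot = sum(counts_a[w] * counts_b[w] for w in counts_a.keys() & counts_b.keys())
    norm_a = math.sqrt(sum(v * v for v in counts_a.values()))
    norm_b = math.sqrt(sum(v * v for v in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical words from the query region and from two candidate frames.
query = [3, 7, 7, 12]
frame_1 = [3, 7, 7, 12, 40]
frame_2 = [7, 20, 21, 22]

score_1 = cosine_similarity(query, frame_1)
score_2 = cosine_similarity(query, frame_2)
print(score_1 > score_2)  # frame_1 shares more of the query's words
```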

Question 7

Q

What sampling strategies exist for visual vocabulary formation?

Answer

A

Sparse, at interest points
- Better for finding specific, textured objects

Dense, uniformly sampled
- Better for object categorisation

Randomly sampled
Multiple interest operators
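
Two of the strategies above, dense and random sampling, are easy to sketch because they do not need an interest point detector. The function names, image size, step and seed below are placeholders chosen for this example.

```python
import random


def dense_grid(width, height, step):
    """Uniformly spaced (x, y) sample centres, starting half a step in from the border."""
    offset = step // 2
    return [(x, y)
            for y in range(offset, height, step)
            for x in range(offset, width, step)]


def random_points(width, height, count, seed=0):
    """Randomly sampled (x, y) positions inside the image (seeded for repeatability)."""
    rng = random.Random(seed)
    return [(rng.randrange(width), rng.randrange(height)) for _ in range(count)]


grid = dense_grid(64, 48, 16)
print(len(grid), grid[0], grid[-1])  # 12 (8, 8) (56, 40)

samples = random_points(64, 48, 5)
```

A descriptor would then be computed at every sampled position rather than only at detected interest points.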

Question 8

Q

What is the typical clustering method for visual words?

Answer

A

K-means clustering

Also used: agglomerative clustering, mean-shift
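
A bare-bones version of k-means (Lloyd's algorithm) is sketched below to show how cluster centres, and hence words, could be obtained. It is a teaching sketch with assumed defaults (random initialisation from the data, a fixed number of iterations) and toy 2-D points, not an efficient implementation for real descriptor sets.

```python
import random


def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means: alternate nearest-centre assignment and centre update."""
    rng = random.Random(seed)
    centres = [list(p) for p in rng.sample(points, k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centres[i])),
            )
            clusters[nearest].append(p)
        for j, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous centre
                centres[j] = [sum(dim) / len(members) for dim in zip(*members)]
    return centres


# Two well-separated toy blobs standing in for descriptors.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11)]
centres = sorted(kmeans(points, k=2))
print(centres)  # [[0.5, 0.5], [10.5, 10.5]]
```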

Question 9

Q

How are words collected in a query region?

Answer

A

Pull out only the SIFT descriptors whose positions are within the polygon
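
Selecting the features inside the query polygon can be sketched with a standard ray-casting point-in-polygon test. The feature layout `(x, y, word_id)` and the function names are assumptions for this example, and points lying exactly on the polygon boundary are not handled specially.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: count how many edges a ray going right from (x, y) crosses."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge spans the horizontal line through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def words_in_region(features, polygon):
    """Keep the word ids of features whose position lies inside the polygon."""
    return [word for x, y, word in features if point_in_polygon(x, y, polygon)]


# Hypothetical query region (a square) and features given as (x, y, word_id).
region = [(0, 0), (10, 0), (10, 10), (0, 10)]
features = [(5, 5, 42), (15, 5, 7), (1, 9, 3), (-1, -1, 8)]
print(words_in_region(features, region))  # [42, 3]
```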

Question 10

Q

What is object categorisation?

Answer

A

Identification: find this particular object
Categorisation: recognise any car, recognise any cow

Given a small number of training images of a category, recognise a-priori unknown instances of that category and assign the correct category label

Question 11

Q

What is evidence for how humans categorise?

Answer

A

Evidence that humans (usually) start with basic-level categorisation before doing identification
- Easier and faster for humans to do basic-level categorisation than object identification
- Most promising starting point for visual classification

Question 12

Q

How many object categories are there?

Answer

A

~10,000 to 30,000

Question 13

Q

What types of categories are there?

Answer

A

Functional categories
- Chairs = “something you can sit on”

Ad-hoc categories
- “Something you can find in an office environment”

Question 14

Q

What are the challenges for object categorisation?

Answer

A

Robustness to:
- Illumination
- Object pose
- Clutter
- Occlusions
- Intra-class appearance variation
- Viewpoint

Question 15

Q

What is the idea of bag of words?

Answer

A

Represent a whole image as a bag of its features, treated as “independent features”

Stricter definition:
- Independent features
- Histogram representation: the x-axis is the features, the y-axis is how many times each feature appears
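
The histogram representation can be sketched as a simple count per word id. The function name, vocabulary size and word ids below are placeholders for this example.

```python
from collections import Counter


def bow_histogram(word_ids, vocabulary_size):
    """Count how often each word id in 0..vocabulary_size-1 occurs in the image."""
    counts = Counter(word_ids)
    return [counts.get(word, 0) for word in range(vocabulary_size)]


# Hypothetical word ids detected in one image, with a vocabulary of 5 words.
image_words = [2, 0, 2, 3, 2]
histogram = bow_histogram(image_words, 5)
print(histogram)  # [1, 0, 3, 1, 0]

# The order (and position) of the features does not matter: it is a "bag".
print(bow_histogram([3, 2, 2, 0, 2], 5) == histogram)  # True
```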

Question 16

Q

What are the steps for bag of words learning?

Answer

A

Feature detection & representation
Codewords dictionary
Image representation
