Week 9 - Object Recognition and Categorisation Flashcards

1
Q

What is Indexing with Local Features

A

Each patch/region surrounding a point of interest has a descriptor: a point in some high-dimensional feature space (e.g., a SIFT vector)
Close points in feature space have similar descriptors, indicating similar local content
Important for 3D reconstruction and for retrieving images of similar objects
(try to match target image features to database descriptors)

2
Q

How do we efficiently find the relevant features of a new image

A

Using the idea of an inverted file index

3
Q

What is Inverted File Index

A

For text documents, an index maps each word to the pages where it occurs
We want to find all images (pages) in which a feature (word) occurs
To do this, we map our features to ‘visual words’

4
Q

Visual words: Main idea

A

Extract local features from a number of images and map them into the 128-dimensional feature space (if using SIFT)
Each point in the space is a local descriptor (SIFT vector)

5
Q

How do we match visual words

A

When points are close in feature space, they have similar descriptors (similar content)
If the content is close enough, we assume it is the same

6
Q

How do we use clusters for visual words

A

We cluster the descriptors to reduce the complexity: millions of points become far fewer clusters, and descriptors in the same cluster are considered the same
“quantize via clustering”

7
Q

What are the cluster centres

A

The prototype “words” of the visual vocabulary

8
Q

How do we create the inverted file for visual words

A

- Database of images
- Run SIFT: find interest points and encode descriptors
- Cluster the descriptors
- Create our list of visual words (the cluster centres)
- Pass all images through the visual words
- For each word, keep the list of images in which that visual word occurs (sketched in code below)
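
A minimal sketch of this pipeline, assuming OpenCV's SIFT and scikit-learn's k-means (function name build_vocabulary and parameter n_words are illustrative, not from the lecture):

import cv2
import numpy as np
from sklearn.cluster import KMeans
from collections import defaultdict

def build_vocabulary(image_paths, n_words=1000):
    sift = cv2.SIFT_create()
    per_image = []                            # descriptors kept per image for indexing
    for path in image_paths:
        img = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        _, desc = sift.detectAndCompute(img, None)
        per_image.append(desc if desc is not None else np.empty((0, 128), np.float32))
    # Cluster all descriptors; cluster centres are the visual words
    kmeans = KMeans(n_clusters=n_words).fit(np.vstack(per_image))
    # Inverted file: visual word id -> set of image ids containing that word
    inverted = defaultdict(set)
    for img_id, desc in enumerate(per_image):
        if len(desc):
            for word in kmeans.predict(desc):
                inverted[word].add(img_id)
    return kmeans, inverted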

9
Q

How does the inverted file index handle new images

A

Extract the visual words it contains (run SIFT, assign each descriptor to its nearest word)
Map the image to the relevant words in the index
Find all the other images that contain the same words
Then compare word counts
(similar images will have many visual words in common)
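
A minimal sketch of querying, reusing the kmeans vocabulary and inverted index from the previous sketch (query_index is an illustrative name):

import cv2
from collections import Counter

def query_index(image_path, kmeans, inverted):
    sift = cv2.SIFT_create()
    img = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    _, desc = sift.detectAndCompute(img, None)
    words = kmeans.predict(desc)              # visual words present in the query image
    votes = Counter()                         # candidate image id -> number of shared words
    for word in set(words):
        for img_id in inverted.get(word, ()): # only touch images that share a word
            votes[img_id] += 1
    return votes.most_common()                # most similar database images first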

10
Q

What is Spatial Verification

A

Sometimes, dissimilar images will have high visual word similarity (e.g. buildings with lots of similar windows)

Spatial verification is used to check that the images actually show the same thing:
only some of the matches are mutually (spatially) consistent

11
Q

What is the spatial verification strategy

A

Use the generalised Hough transform
Let each matched feature cast a vote on the location, scale, and orientation of the model object
(uses the position, scale, and orientation encoded with each feature match)
Verify parameters that receive enough votes
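
A toy sketch of the voting idea (not the full generalised Hough transform: the pose parametrisation and bin sizes are illustrative, and rotation is ignored when estimating translation):

import math
from collections import Counter

def hough_verify(matches, min_votes=3):
    # matches: list of ((x_q, y_q, scale_q, angle_q), (x_d, y_d, scale_d, angle_d)) pairs
    votes = Counter()
    for (xq, yq, sq, aq), (xd, yd, sd, ad) in matches:
        scale = sd / sq                               # relative scale of the model object
        angle = (ad - aq) % (2 * math.pi)             # relative orientation
        dx, dy = xd - xq * scale, yd - yq * scale     # rough translation (rotation ignored)
        bin_key = (round(dx / 20), round(dy / 20),    # coarse bins: 20 px translation,
                   round(math.log2(scale)),           # one octave in scale,
                   round(angle / (math.pi / 6)))      # 30-degree angle steps
        votes[bin_key] += 1
    # Keep only pose hypotheses supported by enough mutually consistent matches
    return [pose for pose, v in votes.items() if v >= min_votes]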

12
Q

What is the Video Google system

A

E.g. find all scenes in a film where an actor is wearing a blue tie
1. Collect all visual words within the query region
2. Use the inverted file index to find relevant frames
3. Compare word counts
4. Spatial verification

13
Q

What are the issues with visual vocabulary formation

A
  • Sampling strategy: where to extract features, e.g. blobs or corners…?
  • Clustering / quantization algorithm
  • Unsupervised vs. supervised (external labels or annotations are used to guide the clustering process)
  • What corpus provides features (universal vocabulary?)
  • Vocabulary size / number of words (too small -> may not capture visual content)
14
Q

What is a good sampling strategy to find specific, textured objects

A

Sparse sampling at interest points

15
Q

What is a good sampling strategy for object categorisation

A

Dense sampling

16
Q

What is a good sampling strategy for more image coverage

A

Multiple complementary interest operators

17
Q

What are 4 main sampling strategies

A

Randomly
Multiple interest operators
Dense
Sparse

18
Q

What are some clustering/quantisation methods

A

k-means (typical choice)
agglomerative clustering
mean-shift

19
Q

What is a query region

A

E.g. when we want to find a specific object in images,
we pull out only the SIFT descriptors whose positions fall within the relevant polygon (the query region)

20
Q

What is object categorisation

A

Instead of recognising a specific dress
how can we recognise any dress
(a category)

21
Q

What is the task description of object categorisation

A

Given a small number of training images of a category, recognize a-priori unknown instances of that category and assign the correct category label

22
Q

What are Visual object categories

A

humans tend to find ‘basic-level categories’
EG
abstract level = animal, mammal
basic level = dog, cat, cow
individual level = dobermann, “Gary”

23
Q

What is a-priori unknown

A

not known or seen by the system during the training phase

24
Q

What is a functional category

A

“Something I can sit on”
“Something I can eat”

25
Q

what is an ad-hoc category

A

“something you can find in an office environment”

26
Q

What are challenges to robustness of object categorisation

A

illumination
object pose
clutter
occlusions
intra-class appearance variation
viewpoint

27
Q

what is the scale of supervision we can use

A

Less: unlabelled, multiple objects
Medium: classes labelled, some clutter
More: cropped to object, parts within object are labelled

28
Q

What features must our visual word representation have for object categorisation

A

robust to intra-category variation
robust to deformation, articulation

still discriminative

29
Q

What is the loose v strict BoW definition

A

Looser: independent features
Stricter: independent features with histogram representation (how frequently each word appears in image)

30
Q

BoW overview

A
  1. Feature detection and representation (SIFT)
  2. Create codewords dictionary (visual word index)
  3. Image representation via bag of codewords
31
Q

BoW: Feature detection and representation

A

Creating a regular grid:
- e.g. dense sampling of patches, later pooled into a histogram
Interest point detector:
- Use a state-of-the-art interest point detector
- Represent features using SIFT

32
Q

BoW: Image representation

A

For each image, we have a frequency histogram over the visual vocabulary:
each visual word and the count of how many times it appears in the image
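
A minimal sketch, continuing the earlier vocabulary example: turn one image's descriptors into a fixed-length word-frequency histogram (bow_histogram is an illustrative name):

import numpy as np

def bow_histogram(descriptors, kmeans, n_words):
    words = kmeans.predict(descriptors)             # assign each descriptor to a visual word
    hist = np.bincount(words, minlength=n_words)    # count how often each word occurs
    return hist.astype(float)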

33
Q

How do we compare BoWs

A

We can use many methods that work on histogram frequency data,
e.g. Euclidean distance or the normalised scalar product (cosine similarity)
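
A minimal sketch with toy histograms (the 5-word vocabulary is made up for illustration):

import numpy as np

def euclidean_distance(h1, h2):
    return np.linalg.norm(h1 - h2)

def normalised_scalar_product(h1, h2):
    # Cosine similarity: 1.0 means identical word distributions
    return np.dot(h1, h2) / (np.linalg.norm(h1) * np.linalg.norm(h2))

h_query = np.array([3, 0, 2, 5, 0], dtype=float)
h_db    = np.array([2, 1, 2, 4, 0], dtype=float)
print(euclidean_distance(h_query, h_db))
print(normalised_scalar_product(h_query, h_db))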

34
Q

Recognition with BoW histograms

A

The BoW representation means we can describe
the unordered set of points with a single vector (of fixed dimension across image examples)
This provides an easy way to use the distribution of feature types

35
Q

what are the two method types for recognition

A

generative methods
discriminative methods

36
Q

What is a discriminative method for recognition

A

Learn a decision boundary / rule (a classifier) assigning the bag-of-features representations of images to different classes
E.g. zebra / non-zebra

37
Q

What is a generative method for recognition

A

Use probabilities:
p(image | zebra)
p(image | no zebra)
and compare the likelihood values

38
Q

Example discriminative: knn classification

A

Map each training histogram to a point in feature space, with boundaries between classes
When we have a new image, build its histogram, which maps to a point in the same space
Find the k nearest training histograms
If the majority of those neighbours are positive (e.g. zebra), label the image positive (negative -> negative)
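
A minimal sketch with made-up data: k-NN classification of BoW histograms using scikit-learn.

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Toy training set: 4 images, 5-word vocabulary, labels 1 = zebra, 0 = non-zebra
X_train = np.array([[5, 0, 1, 2, 0],
                    [4, 1, 0, 3, 0],
                    [0, 3, 4, 0, 2],
                    [1, 2, 5, 0, 3]], dtype=float)
y_train = np.array([1, 1, 0, 0])

knn = KNeighborsClassifier(n_neighbors=3)   # majority vote over 3 nearest histograms
knn.fit(X_train, y_train)

h_new = np.array([[4, 0, 1, 2, 1]], dtype=float)
print(knn.predict(h_new))                   # predicted class label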

39
Q

nearest neighbour classification pros

A

Simple to implement
Flexible to feature / distance choices
Naturally handles multi-class cases
Can do well in practice with enough representative data

40
Q

nearest neighbour classification cons

A

Large search problem to find nearest neighbors
Storage of data
Must know we have a meaningful distance function

41
Q

what are some other types of discriminative classifiers

A

boosting
SVMs

42
Q

Example generative: the naive bayes model

A

Assume that each feature (visual word) is conditionally independent given the class:
p(w1, …, wN | c) = ∏i p(wi | c)
We want to maximise:
c* = argmaxc p(c) ∏i p(wi | c)
If we know nothing about the data, assume a uniform prior p(c)
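
A minimal sketch with made-up data: a naive Bayes classifier over BoW histograms. Log-probabilities are summed instead of multiplying raw probabilities, which is numerically safer but equivalent to argmaxc p(c) ∏i p(wi | c); fit_word_likelihoods is an illustrative helper, not lecture code.

import numpy as np

def fit_word_likelihoods(histograms, smoothing=1.0):
    # p(word | class) from summed word counts of one class, with Laplace smoothing
    counts = histograms.sum(axis=0) + smoothing
    return counts / counts.sum()

# Toy training histograms (5-word vocabulary) for two classes
zebra_hists    = np.array([[5, 0, 1, 2, 0], [4, 1, 0, 3, 0]], dtype=float)
nonzebra_hists = np.array([[0, 3, 4, 0, 2], [1, 2, 5, 0, 3]], dtype=float)

likelihoods = {"zebra": fit_word_likelihoods(zebra_hists),
               "non-zebra": fit_word_likelihoods(nonzebra_hists)}
log_prior = np.log(0.5)                       # uniform prior p(c)

h_new = np.array([4, 0, 1, 2, 1], dtype=float)
scores = {c: log_prior + np.dot(h_new, np.log(p)) for c, p in likelihoods.items()}
print(max(scores, key=scores.get))            # class with the highest posterior score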

43
Q

What is p(c) in naive bayes

A

Prior prob. of the object classes

44
Q

What is p(wi|c) in naive bayes

A

Likelihood of i-th visual word given the class
Estimated by empirical frequencies of visual words in images from a given class

45
Q

what is ∏

A

The product operator: multiply the terms together

47
Q

How can we improve spatial information of BoW model

A

- Visual phrases: frequently co-occurring words
- Semi-local features: describe configuration / neighbourhood
- Let position be part of each feature

48
Q

BoW pros

A
  • Flexible to geometry / deformations / viewpoint
  • Compact summary of image content
  • Provides vector representation for sets
  • Empirically good recognition results in practice
49
Q

BoW cons

A
  • Basic model ignores geometry – must verify afterwards, or encode via features.
  • Background and foreground mixed when bag covers whole image
  • Interest points or sampling: no guarantee to capture object-level parts.
  • Optimal vocabulary formation remains unclear.