Mid-Level Vision Flashcards
What are the 5 steps of Canny edge detection?
1) Gaussian Blurring: reduce noise by applying a Gaussian filter
2) Gradient Calculation: apply a Sobel filter in the horizontal and vertical directions to get the first derivatives in x and y. Use these to calculate the gradient magnitude and direction at each pixel
3) Non-Maximum suppression: for each pixel, check if it’s the local maximum along the gradient direction. If it is, retain it as an edge, otherwise suppress it (set it to 0)
4) Double Thresholding: choose two thresholds. Pixels with gradient magnitudes above the higher threshold are strong edges, pixels with magnitudes between the two thresholds are weak edges, and pixels with magnitudes below the lower threshold are discarded.
5) Edge tracking by hysteresis: only retain weak edges if they’re connected to strong edges.
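A minimal OpenCV sketch of the pipeline above (the filename, kernel size, sigma and thresholds here are illustrative, not prescribed values):
```python
import cv2

img = cv2.imread("input.png", cv2.IMREAD_GRAYSCALE)  # hypothetical filename
blurred = cv2.GaussianBlur(img, (5, 5), 1.4)         # step 1: Gaussian blurring
edges = cv2.Canny(blurred, 100, 200)                 # steps 2-5: Sobel gradients, NMS,
                                                     # double thresholding, hysteresis
cv2.imwrite("edges.png", edges)
```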
How do computers find similarities/the same feature in two images?
They find similarities by calculating the distances between the feature points (their descriptors) in the two images
What does invariance mean?
Resistant to variation: a property that stays the same regardless of changes in conditions or transformations.
What quality do features have if a computer is able to find the same feature in two images?
The feature should be rotation or translation invariant (if it’s the same feature in two images)
Is it possible to change a histogram back into the original image?
No, for two reasons:
- histograms don't store any pixel location information
- multiple different images can produce the same histogram.
How do we calculate a normalised histogram?
We divide each bin count by the number of pixels in the image: e.g. if there are 86 zeros in an 8-bit 8x16 image (128 pixels), the normalised value for bin 0 is 86/128
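A short NumPy sketch of the same idea (the image here is random, purely for illustration):
```python
import numpy as np

img = np.random.randint(0, 256, (8, 16), dtype=np.uint8)  # illustrative 8-bit 8x16 image
hist = np.bincount(img.ravel(), minlength=256)            # count of each intensity value
normalised = hist / img.size                              # divide by 128 pixels; sums to 1
```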
How do you create a histogram from a coloured image?
You create 3 separate histograms, one for each colour channel (red, green and blue)
What are histograms used for?
They are used as features for image classification or recognition
Which geometric operations are histograms invariant to?
- Rotation
- Scaling
- Mirroring
What are LBP descriptors and how do they work?
Local binary pattern descriptors describe the surroundings of a pixel.
- They generate a bit-code:
- The central pixel's value acts as the threshold: moving in a clockwise direction, each neighbouring pixel is compared to this threshold.
- If the neighbour's value is equal to or above the threshold it's set to 1. If it's below the threshold it's set to 0
What is the LBP code of an image?
How many values will this code always hold
It is the vector of 1s and 0s created by moving clockwise around a pixel and comparing each neighbouring pixel's value to the central pixel's value. Since a pixel has 8 neighbours, the code always holds 8 values.
e.g. = [0 1 1 1 0 1 0 1]
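A minimal sketch of generating the bit-code for a 3x3 region (starting the clockwise pass at the top-left neighbour is an assumed convention):
```python
import numpy as np

def lbp_code(region):
    centre = region[1, 1]
    # neighbours in clockwise order, starting at the top-left (assumed convention)
    neighbours = [region[0, 0], region[0, 1], region[0, 2], region[1, 2],
                  region[2, 2], region[2, 1], region[2, 0], region[1, 0]]
    return [1 if n >= centre else 0 for n in neighbours]

region = np.array([[10, 210, 30], [220, 100, 90], [50, 160, 240]])
print(lbp_code(region))  # [0, 1, 0, 0, 1, 1, 0, 1]
```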
What is the LBP index?
What is the LBP index of the following LBP code? [0, 1, 1, 1, 0, 1, 0, 1]
- The LBP index is found by reading the bit-code as a binary number and converting it to decimal (summing the powers of 2 wherever the code holds a 1):
- [0 1 1 1 0 1 0 1] = 64 + 32 + 16 + 4 + 1 = 117
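The same conversion as a small sketch, treating the first element as the most significant bit:
```python
def lbp_index(code):
    # read the bit-code as a binary number, most significant bit first
    return sum(bit << (len(code) - 1 - i) for i, bit in enumerate(code))

print(lbp_index([0, 1, 1, 1, 0, 1, 0, 1]))  # 117
```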
What is image classification?
given an input image, assign a label to it from a fixed set of categories
How do we perform image classification?
what is the output?
We extract features from the image and train a classifier to categorise the image.
The output is a vector of probabilities, one per category: the probability that the image belongs to each category.
What are the challenges of image classification?
- viewpoint variation
- illumination variation
- scale variation
- deformations
- occlusion
- background clutter
- intra-class variation
What is a traditional approach to training an image classifier?
1) find the edges of the image
2) find the corners in the edges
3) classify the object
Describe the steps of the machine learning image classification method?
1) construct a training dataset of images and predefined labels
2) train the classifier to predict, for each image, the probability that each label applies
3) Evaluate new images that have no predefined labels
What does a training dataset used by a ML image classifier consist of?
A collection of images of each of the objects that there are labels for
What does the image classifier consist of in a ML image classification method?
A neural network can be used, or a traditional classifier (e.g. the edge-based method above)
What are the steps of the K-nearest neighbour image classification method?
1) we have a number (e.g. 3) of existing categories and a new point
2) calculate the distance between the new point and all the existing points
3) rank the points by increasing order of distance
4) choose the k nearest neighbours and assign the new point to the category that appears most often among them
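A minimal NumPy sketch of these steps, using L2 distance and a majority vote (the data shapes and default k are assumptions):
```python
import numpy as np

def knn_predict(train_X, train_y, x, k=3):
    dists = np.sqrt(((train_X - x) ** 2).sum(axis=1))  # step 2: distance to every point
    nearest = np.argsort(dists)[:k]                    # steps 3-4: rank, take k closest
    labels, counts = np.unique(train_y[nearest], return_counts=True)
    return labels[np.argmax(counts)]                   # majority vote among neighbours
```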
How does the 1NN method work?
the classifier takes a test image, compares it to all the training images and predicts the label of the single closest training image.
What are the two types of distance measures?
L1 and L2
Describe how L1 distance is calculated for two images?
L1 is the sum of the absolute differences between the two images' pixel values
Describe how L2 distance is calculated for two images?
AKA Euclidean distance: the square root of the sum of the squared differences between the two images' pixel values
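A small worked sketch on two tiny illustrative "images":
```python
import numpy as np

a = np.array([[1, 2], [3, 4]], dtype=float)  # illustrative 2x2 images
b = np.array([[2, 2], [1, 0]], dtype=float)
l1 = np.abs(a - b).sum()                     # 1 + 0 + 2 + 4 = 7
l2 = np.sqrt(((a - b) ** 2).sum())           # sqrt(1 + 0 + 4 + 16) = sqrt(21) ~ 4.58
```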
Which L distance do we usually use for KNN?
- L2 distance
Why may varying the value of k change which category it belongs to?
- If k = 1 and its nearest neighbour is in class A, it will be assigned class A
- If k = 3 and its nearest neighbours are 1 from class A and 2 from class B, it will be assigned class B
What is image segmentation?
Given an image we need to predict the label for each pixel.
What are the 3 types of supervision?
- Supervised: every image is labelled
- Unsupervised: no images are labelled
- Semi-supervised: some images are labelled
What does weakly supervised mean?
the dataset consists of images that have one label but the pixels aren’t labelled.
e.g. if we give the image the label cat, the model knows that somewhere in the image there is a cat, but it needs to figure out which pixels are the cat
What are some image segmentation problems?
Over and under segmentation can occur
Name a traditional segmentation method:
Binary image segmentation
Describe how binary image segmentation works:
1) Choose a threshold
2) if a pixel's value is above the threshold set it to 1; if it's below the threshold set it to 0
3) This gives us two segments
4) we can adapt/experiment with threshold values to find the best threshold
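A one-line NumPy sketch of the thresholding step (the default threshold here is illustrative):
```python
import numpy as np

def binary_segment(img, threshold=128):
    # 1 where the pixel is above the threshold, 0 elsewhere: two segments
    return (img > threshold).astype(np.uint8)
```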
How can we find the best threshold?
- We can create a histogram of the image's pixel values and set the threshold in the valley between the two highest peaks
What are some limitations of binary image segmentation?
- If you choose a bad threshold, the result will be poor
- it doesn't take spatial information into account
- it doesn't work accurately on more complex, detailed images
What is region based image segmentation?
- Describe two methods of doing it:
- it takes into account the location of a pixel as well as its value
- 4-connectivity: a centre pixel has 4 neighbours (above, below, left and right)
- 8-connectivity: a centre pixel has 8 neighbours (all surrounding pixels)
- it then performs region growing in order to create the segmentations
Describe how region growing is used for segmentation using region segmentation:
3 main steps
1) start with one pixel chosen arbitrarily, give the starting pixel a label, then examine all of its unlabelled neighbours; if they are within the similarity threshold, give them the same label
2) repeat this until the region stops growing, then choose another starting pixel and repeat the process
3) do the steps above until all pixels have been assigned a region.
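A sketch of growing one region from a single seed, using 4-connectivity and an absolute-difference similarity test (the threshold value is an assumption); repeating it from new seeds labels the whole image:
```python
import numpy as np
from collections import deque

def grow_region(img, seed, label, labels, thresh=10):
    q = deque([seed])
    labels[seed] = label
    while q:
        y, x = q.popleft()
        for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):           # 4 neighbours
            ny, nx = y + dy, x + dx
            if (0 <= ny < img.shape[0] and 0 <= nx < img.shape[1]
                    and labels[ny, nx] == 0                         # still unlabelled
                    and abs(int(img[ny, nx]) - int(img[y, x])) <= thresh):
                labels[ny, nx] = label                              # within similarity threshold
                q.append((ny, nx))
    return labels
```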
How is clustering segmentation done?
By minimising intra-cluster distances
and maximising inter-cluster distances
Describe the steps of K-means clustering
1) randomly choose k points to act as cluster centres
2) allocate the other points to the closest cluster centre
3) compute new cluster centres as the mean position of the elements in each cluster
4) keep redoing the process from step 2, until the centers are unchanged in step 3
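A compact NumPy sketch of these four steps (no empty-cluster handling; the seeded random initialisation is just for repeatability):
```python
import numpy as np

def kmeans(X, k, iters=100):
    rng = np.random.default_rng(0)
    centres = X[rng.choice(len(X), k, replace=False)]  # 1) random initial centres
    for _ in range(iters):
        d = np.linalg.norm(X[:, None] - centres[None], axis=2)
        assign = d.argmin(axis=1)                      # 2) assign to closest centre
        new = np.array([X[assign == j].mean(axis=0) for j in range(k)])  # 3) new centres
        if np.allclose(new, centres):                  # 4) stop when centres are unchanged
            break
        centres = new
    return centres, assign
```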
What are the limitations of k means clustering?
- we need to choose k
- if the data doesn't suit k-means' assumptions (e.g. non-spherical or unevenly sized clusters), it will not provide good results
What are local descriptors?
- They describe features within specific regions in an image.
- They capture distinctive patterns around key points like edges, corners or blobs
Give 3 examples of local descriptors:
- SIFT
- SURF
- ORB
Describe the SIFT local descriptor:
it detects and describes local key points, invariant to scale and rotation changes
Describe the SURF local descriptor:
A faster alternative to SIFT, used to detect and describe points of interest in images
Describe the ORB local descriptor:
A combination of the FAST keypoint detector and BRIEF descriptor, designed for real-time application
What is a feature?
A distinctive attribute or aspect of something
What is a descriptor?
- A numerical representation that captures essential features in an image
- summarises key visual information for algorithms to identify key similarities/differences between images
What are descriptors often designed to be?
Invariant to changes in rotation, scaling and lighting, ensuring robustness in matching images under different conditions
How can a computer check whether two images contain the same feature seen from different perspectives?
Computers can only see numbers, so given a local 3x3 region, how can we find that feature in another image?
- We can search for the same region of numbers (an exact match), or
- we can look for areas where the absolute pixel values differ but the differences between neighbouring pixels are the same
What are the 5 things we want invariance to?
- illumination
- scaling
- rotation
- translation
- perspective projection
How can we make an algorithm robust to illumination changes in images?
- Could extract the edges or
- Normalise the pixel values by calculating the average of the image and subtracting this from each pixel value
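A tiny sketch of the second option:
```python
import numpy as np

def normalise_illumination(img):
    # subtract the mean intensity, so two images differing by a constant offset match
    return img.astype(float) - img.mean()
```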
How can we make an algorithm robust to scale changes in images?
We can rescale the image into different scales and see if they have the same features
How can we make an algorithm robust to rotation changes in images?
We can rotate the image and see if the two images have the same features
How can we make an algorithm robust to perspective changes in images?
We can perform a combination of rotation and scale changes and compare to see if the features are the same
How do we represent a descriptor?
We focus on a small subregion of the image and represent it as a vector
What are LBP descriptors?
Descriptors calculated by finding the LBP index of a region.
What is a simple normalised descriptor?
Given a region, subtract the centre pixel's value from every pixel in the region (including the centre itself) and take the absolute value, giving normalised pixel values
get the simple normalised descriptor for the following region with point of interest 201:
45 46 200
46 201 200
85 101 105
156 155 1
155 0 1
116 100 96
= [156 155 1 155 0 1 116 100 96]
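The same example as a NumPy sketch:
```python
import numpy as np

region = np.array([[45, 46, 200],
                   [46, 201, 200],
                   [85, 101, 105]])
descriptor = np.abs(region - region[1, 1]).ravel()  # absolute difference from the centre (201)
print(descriptor)  # [156 155 1 155 0 1 116 100 96]
```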
What properties does the simple normalised descriptor have?
- It’s translation invariant
- it’s somewhat illumination invariant
How can we design a descriptor to be rotation invariant?
- Rotate the region to its dominant orientation
- both regions then face the same way, so the same feature produces the same descriptor
How can we design a descriptor to be translation invariant?
calculate histogram of oriented gradients
- histograms are invariant to translation
What can we use to match two regions where one has been translated?
Use K nearest neighbour to match the same region that’s been translated
What does SIFT stand for, describe it:
Scale Invariant Feature Transform:
- invariant to rotation and scaling
- partially invariant to illumination and projections
What are the 4 steps to extract SIFT features?
1) Determine approximate location and scale of keypoints
2) Refine keypoints (by rejecting edges, low contrast and noisy points)
3) Determine orientation(s) for each keypoint
4) Determine descriptor for each keypoint
Describe in words how we achieve scale invariance when computing SIFT features?
By detecting features in a scale-space and identifying those that are consistent across different scales.
What are the advantages of SIFT
- invariant (robust) to scale and rotation
- somewhat robust to illumination
- generates highly distinctive descriptors
What are the disadvantages of SIFT
- Not as computationally efficient or fast as SURF or ORB
- struggles with significant illumination and perspective changes
What are octaves:
An octave is a set of images at the same resolution, each convolved with a Gaussian of a different scale; the image is downsampled between octaves
Why do we use the difference of Gaussians to approximate the LoG?
The LoG is complex to compute so DoG is used as it provides a close approximation but is simpler to compute.
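A minimal OpenCV sketch of one DoG layer (the filename and sigma values are illustrative; passing a (0, 0) kernel size lets OpenCV derive it from sigma):
```python
import cv2

img = cv2.imread("input.png", cv2.IMREAD_GRAYSCALE).astype("float32")  # hypothetical file
g1 = cv2.GaussianBlur(img, (0, 0), sigmaX=1.0)  # smaller scale
g2 = cv2.GaussianBlur(img, (0, 0), sigmaX=1.6)  # larger scale
dog = g2 - g1                                   # difference of Gaussians ~ LoG response
```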
How do we detect keypoints of feature maps?
By identifying local extrema (minima or maxima) in the 3D DoG space.
How do we identify if a keypoint is a local extrema (minima or maxima) in the 3D DoG space.
We compare the pixel's value to its 26 neighbours (9 in the scale above, 9 in the scale below, and 8 in its own scale): it's a minimum if it's less than all of its neighbours' values and a maximum if it's greater than all of them
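A short sketch of that test on a 3x3x3 DoG neighbourhood (scale below, same scale, scale above):
```python
import numpy as np

def is_extremum(cube):                    # cube: 3x3x3 array around the candidate
    centre = cube[1, 1, 1]
    others = np.delete(cube.ravel(), 13)  # the 26 neighbours (index 13 is the centre)
    return centre < others.min() or centre > others.max()
```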
How can we detect noisy keypoints to remove?
We define a contrast threshold: if the absolute DoG value at the keypoint is above the threshold, keep it; if it's equal or below, remove it.
What is the main orientation and why may there be a second main orientation?
When we create the histogram of orientations by summing the gradient magnitudes for each orientation bin, the main orientation is the bin with the largest magnitude.
- if another bin's magnitude is larger than 80% of the main orientation's magnitude, it is counted as a second main orientation
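A sketch of that rule with a 36-bin orientation histogram (the 10-degree bin width is the common SIFT choice; inputs are the gradient magnitudes and angles of the keypoint's region):
```python
import numpy as np

def main_orientations(magnitudes, angles_deg):
    hist, _ = np.histogram(angles_deg % 360, bins=36,
                           range=(0, 360), weights=magnitudes)  # sum magnitudes per bin
    peak = hist.max()                                           # main orientation's magnitude
    return np.where(hist >= 0.8 * peak)[0] * 10                 # degrees: main + second peaks
```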