Mid-Level Vision Flashcards
What are the 5 steps of Canny edge detection?
1) Gaussian Blurring: reduce noise by applying a Gaussian filter
2) Gradient Calculation: apply a Sobel filter in the horizontal and vertical directions to get the first derivatives in x and y. Use these to calculate the gradient magnitude and direction at each pixel
3) Non-Maximum suppression: for each pixel, check if it’s the local maximum along the gradient direction. If it is, retain it as an edge, otherwise suppress it (set it to 0)
4) Double Thresholding: choose two thresholds. Pixels with gradient magnitudes above the higher threshold are strong edges, pixels with magnitudes between the two thresholds are weak edges, and pixels with magnitudes below the lower threshold are discarded.
5) Edge tracking by hysteresis: only retain weak edges if they’re connected to strong edges.
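A minimal OpenCV sketch of the pipeline above (the filename, kernel size, sigma and thresholds here are illustrative, not prescribed values):
```python
import cv2

img = cv2.imread("input.png", cv2.IMREAD_GRAYSCALE)  # hypothetical filename
blurred = cv2.GaussianBlur(img, (5, 5), 1.4)         # step 1: Gaussian blurring
edges = cv2.Canny(blurred, 100, 200)                 # steps 2-5: Sobel gradients, NMS,
                                                     # double thresholding, hysteresis
cv2.imwrite("edges.png", edges)
```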
How do computers find similarities/the same feature in two images?
They find similarities by calculating the distances between the feature points (their descriptors) in the two images
What does invariance mean?
Resistant to variation: a property that stays the same regardless of changes in conditions or transformations.
What quality do features have if a computer is able to find the same feature in two images?
The feature should be rotation or translation invariant (if it’s the same feature in two images)
Is it possible to change a histogram back into the original image?
No, for two reasons:
- histograms don't store any pixel location information
- multiple different images can produce the same histogram.
How do we calculate a normalised histogram?
We divide each bin count by the number of pixels in the image: e.g. if there are 86 zeros in an 8-bit 8x16 image (128 pixels), the normalised value for bin 0 is 86/128
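A short NumPy sketch of the same idea (the image here is random, purely for illustration):
```python
import numpy as np

img = np.random.randint(0, 256, (8, 16), dtype=np.uint8)  # illustrative 8-bit 8x16 image
hist = np.bincount(img.ravel(), minlength=256)            # count of each intensity value
normalised = hist / img.size                              # divide by 128 pixels; sums to 1
```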
How do you create a histogram from a coloured image?
You create 3 separate histograms, one for each colour channel (red, green and blue)
What are histograms used for?
They are used as features for image classification or recognition
Which geometric operations are histograms invariant to?
- Rotation
- Scaling
- Mirroring
What are LBP descriptors and how do they work?
Local binary pattern descriptors describe the surroundings of a pixel.
- They generate a bit-code:
- The central pixel's value acts as the threshold: moving in a clockwise direction, each neighbouring pixel is compared to this threshold.
- If the neighbour's value is equal to or above the threshold it's set to 1. If it's below the threshold it's set to 0
What is the LBP code of an image?
How many values will this code always hold
It is the vector of 1s and 0s created by moving clockwise around a pixel and comparing each neighbouring pixel's value to the central pixel's value. Since a pixel has 8 neighbours, the code always holds 8 values.
e.g. = [0 1 1 1 0 1 0 1]
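A minimal sketch of generating the bit-code for a 3x3 region (starting the clockwise pass at the top-left neighbour is an assumed convention):
```python
import numpy as np

def lbp_code(region):
    centre = region[1, 1]
    # neighbours in clockwise order, starting at the top-left (assumed convention)
    neighbours = [region[0, 0], region[0, 1], region[0, 2], region[1, 2],
                  region[2, 2], region[2, 1], region[2, 0], region[1, 0]]
    return [1 if n >= centre else 0 for n in neighbours]

region = np.array([[10, 210, 30], [220, 100, 90], [50, 160, 240]])
print(lbp_code(region))  # [0, 1, 0, 0, 1, 1, 0, 1]
```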
What is the LBP index?
What is the LBP index of the following LBP code? [0, 1, 1, 1, 0, 1, 0, 1]
- The LBP index is found by reading the bit-code as a binary number and converting it to decimal (summing the powers of 2 wherever the code holds a 1):
- [0 1 1 1 0 1 0 1] = 64 + 32 + 16 + 4 + 1 = 117
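The same conversion as a small sketch, treating the first element as the most significant bit:
```python
def lbp_index(code):
    # read the bit-code as a binary number, most significant bit first
    return sum(bit << (len(code) - 1 - i) for i, bit in enumerate(code))

print(lbp_index([0, 1, 1, 1, 0, 1, 0, 1]))  # 117
```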
What is image classification?
given an input image, assign a label to it from a fixed set of categories
How do we perform image classification?
what is the output?
We extract features from the image and train a classifier to categorise the image.
The output is a vector of probabilities, one per category: the probability that the image belongs to each category.
What are the challenges of image classification?
- viewpoint variation
- illumination variation
- scale variation
- deformations
- occlusion
- background clutter
- intra-class variation
What is a traditional approach to training an image classifier?
1) find the edges of the image
2) find the corners in the edges
3) classify the object
Describe the steps of the machine learning image classification method?
1) construct a training dataset of images and predefined labels
2) train the classifier to predict, for each image, the probability that each label applies
3) Evaluate new images that have no predefined labels
What does a training dataset used by a ML image classifier consist of?
A collection of images of each of the objects that there are labels for
What does the image classifier consist of in a ML image classification method?
A neural network can be used, or a traditional classifier (e.g. the edge-based method above)
What are the steps of the K-nearest neighbour image classification method?
1) we have a number (e.g. 3) of existing categories and a new point
2) calculate the distance between the new point and all the existing points
3) rank the points by increasing order of distance
4) choose the k nearest neighbours and assign the new point to the category that appears most often among them
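A minimal NumPy sketch of these steps, using L2 distance and a majority vote (the data shapes and default k are assumptions):
```python
import numpy as np

def knn_predict(train_X, train_y, x, k=3):
    dists = np.sqrt(((train_X - x) ** 2).sum(axis=1))  # step 2: distance to every point
    nearest = np.argsort(dists)[:k]                    # steps 3-4: rank, take k closest
    labels, counts = np.unique(train_y[nearest], return_counts=True)
    return labels[np.argmax(counts)]                   # majority vote among neighbours
```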
How does the 1NN method work?
the classifier takes a test image, compares it to all the training images and predicts the label of the single closest training image.
What are the two types of distance measures?
L1 and L2
Describe how L1 distance is calculated for two images?
L1 is the sum of the absolute differences between the two images' pixel values
Describe how L2 distance is calculated for two images?
AKA Euclidean distance: the square root of the sum of the squared differences between the two images' pixel values
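A small worked sketch on two tiny illustrative "images":
```python
import numpy as np

a = np.array([[1, 2], [3, 4]], dtype=float)  # illustrative 2x2 images
b = np.array([[2, 2], [1, 0]], dtype=float)
l1 = np.abs(a - b).sum()                     # 1 + 0 + 2 + 4 = 7
l2 = np.sqrt(((a - b) ** 2).sum())           # sqrt(1 + 0 + 4 + 16) = sqrt(21) ~ 4.58
```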
Which L distance do we usually use for KNN?
- L2 distance
Why may varying the value of k change which category it belongs to?
- If k = 1 and its nearest neighbour is in class A, it will be assigned class A
- If k = 3 and its nearest neighbours are 1 from class A and 2 from class B, it will be assigned class B
What is image segmentation?
Given an image we need to predict the label for each pixel.
What are the 3 types of supervision?
- Supervised: every image is labelled
- Unsupervised: no images are labelled
- Semi-supervised: some images are labelled
What does weakly supervised mean?
the dataset consists of images that have one label but the pixels aren’t labelled.
e.g. if we give the image the label cat, the model knows that somewhere in the image there is a cat, but it needs to figure out which pixels are the cat
What are some image segmentation problems?
Over and under segmentation can occur
Name a traditional segmentation method:
Binary image segmentation
Describe how binary image segmentation works:
1) Choose a threshold
2) if a pixel's value is above the threshold set it to 1; if it's below the threshold set it to 0
3) This gives us two segments
4) we can adapt/experiment with threshold values to find the best threshold
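A one-line NumPy sketch of the thresholding step (the default threshold here is illustrative):
```python
import numpy as np

def binary_segment(img, threshold=128):
    # 1 where the pixel is above the threshold, 0 elsewhere: two segments
    return (img > threshold).astype(np.uint8)
```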
How can we find the best threshold?
- We can create a histogram of the image's pixel values and set the threshold in the valley between the two highest peaks
What are some limitations of binary image segmentation?
- If you choose a bad threshold, the result will be poor
- it doesn't take spatial information into account
- it doesn't work accurately on more complex, detailed images
What is region based image segmentation?
- Describe two methods of doing it:
- it takes into account the location of a pixel as well as its value
- 4-connectivity: a centre pixel has 4 neighbours (above, below, left and right)
- 8-connectivity: a centre pixel has 8 neighbours (all surrounding pixels)
- it then performs region growing in order to create the segmentations
Describe how region growing is used for segmentation using region segmentation:
3 main steps
1) start with one pixel chosen arbitrarily, give the starting pixel a label, then examine all of its unlabelled neighbours; if they are within the similarity threshold, give them the same label
2) repeat this until the region stops growing, then choose another starting pixel and repeat the process
3) do the steps above until all pixels have been assigned a region.
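A sketch of growing one region from a single seed, using 4-connectivity and an absolute-difference similarity test (the threshold value is an assumption); repeating it from new seeds labels the whole image:
```python
import numpy as np
from collections import deque

def grow_region(img, seed, label, labels, thresh=10):
    q = deque([seed])
    labels[seed] = label
    while q:
        y, x = q.popleft()
        for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):           # 4 neighbours
            ny, nx = y + dy, x + dx
            if (0 <= ny < img.shape[0] and 0 <= nx < img.shape[1]
                    and labels[ny, nx] == 0                         # still unlabelled
                    and abs(int(img[ny, nx]) - int(img[y, x])) <= thresh):
                labels[ny, nx] = label                              # within similarity threshold
                q.append((ny, nx))
    return labels
```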
How is clustering segmentation done?
By minimising intra-cluster distances
and maximising inter-cluster distances
Describe the steps of K-means clustering
1) randomly choose k points to act as cluster centres
2) allocate the other points to the closest cluster centre
3) compute new cluster centres as the mean position of the elements in each cluster
4) keep redoing the process from step 2, until the centers are unchanged in step 3
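A compact NumPy sketch of these four steps (no empty-cluster handling; the seeded random initialisation is just for repeatability):
```python
import numpy as np

def kmeans(X, k, iters=100):
    rng = np.random.default_rng(0)
    centres = X[rng.choice(len(X), k, replace=False)]  # 1) random initial centres
    for _ in range(iters):
        d = np.linalg.norm(X[:, None] - centres[None], axis=2)
        assign = d.argmin(axis=1)                      # 2) assign to closest centre
        new = np.array([X[assign == j].mean(axis=0) for j in range(k)])  # 3) new centres
        if np.allclose(new, centres):                  # 4) stop when centres are unchanged
            break
        centres = new
    return centres, assign
```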
What are the limitations of k means clustering?
- we need to choose k
- if the data doesn't suit k-means' assumptions (e.g. non-spherical or unevenly sized clusters), it will not provide good results
What are local descriptors?
- They describe features within specific regions in an image.
- They capture distinctive patterns around key points like edges, corners or blobs
Give 3 examples of local descriptors:
- SIFT
- SURF
- ORB
Describe the SIFT local descriptor:
it detects and describes local key points, invariant to scale and rotation changes
Describe the SURF local descriptor:
A faster alternative to SIFT, used to detect and describe points of interest in images
Describe the ORB local descriptor:
A combination of the FAST keypoint detector and BRIEF descriptor, designed for real-time application
What is a feature?
A distinctive attribute or aspect of something
What is a descriptor?
- A numerical representation that captures essential features in an image
- summarises key visual information for algorithms to identify key similarities/differences between images
What are descriptors often designed to be?
Invariant to changes in rotation, scaling and lighting, ensuring robustness in matching images under different conditions
How can a computer check whether two images contain the same feature seen from different perspectives?
Computers can only see numbers, so given a local 3x3 region, how can we find that feature in another image?
- We can search for the same region of numbers (an exact match), or
- we can look for areas where the absolute pixel values differ but the differences between neighbouring pixels are the same
What are the 5 things we want invariance to?
- illumination
- scaling
- rotation
- translation
- perspective projection
How can we make an algorithm robust to illumination changes in images?
- Could extract the edges or
- Normalise the pixel values by calculating the average of the image and subtracting this from each pixel value
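A tiny sketch of the second option:
```python
import numpy as np

def normalise_illumination(img):
    # subtract the mean intensity, so two images differing by a constant offset match
    return img.astype(float) - img.mean()
```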
How can we make an algorithm robust to scale changes in images?
We can rescale the image into different scales and see if they have the same features
How can we make an algorithm robust to rotation changes in images?
We can rotate the image and see if the two images have the same features
How can we make an algorithm robust to perspective changes in images?
We can perform a combination of rotation and scale changes and compare to see if the features are the same
How do we represent a descriptor?
We focus on a small subregion of the image and represent it as a vector
What are LBP descriptors?
Descriptors calculated by finding the LBP index of a region.
What is a simple normalised descriptor?
Given a region, subtract the centre pixel's value from every pixel in the region (including the centre itself) and take the absolute value, giving normalised pixel values
get the simple normalised descriptor for the following region with point of interest 201:
45 46 200
46 201 200
85 101 105
156 155 1
155 0 1
116 100 96
= [156 155 1 155 0 1 116 100 96]
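The same example as a NumPy sketch:
```python
import numpy as np

region = np.array([[45, 46, 200],
                   [46, 201, 200],
                   [85, 101, 105]])
descriptor = np.abs(region - region[1, 1]).ravel()  # absolute difference from the centre (201)
print(descriptor)  # [156 155 1 155 0 1 116 100 96]
```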
What properties does the simple normalised descriptor have?
- It’s translation invariant
- it’s somewhat illumination invariant
How can we design a descriptor to be rotation invariant?
- Rotate the region to its dominant orientation
- both regions then face the same way, so the same feature produces the same descriptor
How can we design a descriptor to be translation invariant?
calculate histogram of oriented gradients
- histograms are invariant to translation
What can we use to match two regions where one has been translated?
Use K nearest neighbour to match the same region that’s been translated
What does SIFT stand for, describe it:
Scale Invariant Feature Transform:
- invariant to rotation and scaling
- partially invariant to illumination and projections
What are the 4 steps to extract SIFT features?
1) Determine approximate location and scale of keypoints
2) Refine keypoints (by rejecting edges, low contrast and noisy points)
3) Determine orientation(s) for each keypoint
4) Determine descriptor for each keypoint
Describe in words how we achieve scale invariance when computing SIFT features?
By detecting features in a scale-space and identifying those that are consistent across different scales.
What are the advantages of SIFT
- invariant (robust) to scale and rotation
- somewhat robust to illumination
- generates highly distinctive descriptors
What are the disadvantages of SIFT
- Not as computationally efficient or fast as SURF or ORB
- struggles with significant illumination and perspective changes
What are octaves:
An octave is a set of images at the same resolution, each convolved with a Gaussian of a different scale; the image is downsampled between octaves
Why do we use the difference of Gaussians to approximate the LoG?
The LoG is complex to compute so DoG is used as it provides a close approximation but is simpler to compute.
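A minimal OpenCV sketch of one DoG layer (the filename and sigma values are illustrative; passing a (0, 0) kernel size lets OpenCV derive it from sigma):
```python
import cv2

img = cv2.imread("input.png", cv2.IMREAD_GRAYSCALE).astype("float32")  # hypothetical file
g1 = cv2.GaussianBlur(img, (0, 0), sigmaX=1.0)  # smaller scale
g2 = cv2.GaussianBlur(img, (0, 0), sigmaX=1.6)  # larger scale
dog = g2 - g1                                   # difference of Gaussians ~ LoG response
```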
How do we detect keypoints of feature maps?
By identifying local extrema (minima or maxima) in the 3D DoG space.
How do we identify if a keypoint is a local extrema (minima or maxima) in the 3D DoG space.
We compare the pixel's value to its 26 neighbours (9 in the scale above, 9 in the scale below, and 8 in its own scale): it's a minimum if it's less than all of its neighbours' values and a maximum if it's greater than all of them
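A short sketch of that test on a 3x3x3 DoG neighbourhood (scale below, same scale, scale above):
```python
import numpy as np

def is_extremum(cube):                    # cube: 3x3x3 array around the candidate
    centre = cube[1, 1, 1]
    others = np.delete(cube.ravel(), 13)  # the 26 neighbours (index 13 is the centre)
    return centre < others.min() or centre > others.max()
```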
How can we detect noisy keypoints to remove?
We define a contrast threshold: if the absolute DoG value at the keypoint is above the threshold, keep it; if it's equal or below, remove it.
What is the main orientation and why may there be a second main orientation?
When we create the histogram of orientations by summing the gradient magnitudes for each orientation bin, the main orientation is the bin with the largest magnitude.
- if another bin's magnitude is larger than 80% of the main orientation's magnitude, it is counted as a second main orientation
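A sketch of that rule with a 36-bin orientation histogram (the 10-degree bin width is the common SIFT choice; inputs are the gradient magnitudes and angles of the keypoint's region):
```python
import numpy as np

def main_orientations(magnitudes, angles_deg):
    hist, _ = np.histogram(angles_deg % 360, bins=36,
                           range=(0, 360), weights=magnitudes)  # sum magnitudes per bin
    peak = hist.max()                                           # main orientation's magnitude
    return np.where(hist >= 0.8 * peak)[0] * 10                 # degrees: main + second peaks
```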