High Level Vision Flashcards

Question 1

Q

Describe the steps for image classification:

Answer

A

1) In the training stage, pass labelled images through the classifier to extract features
2) The model produces a prediction (score) for each category
3) Calculate the loss between the predictions and the ground truth
4) Backpropagate the loss and update the parameters accordingly
5) Repeat for many iterations until the model converges
6) In the test stage, the model is fixed. Pass the test image through the model to get the prediction

Question 2

Q

What is binary classification?

Answer

A

Classifying whether or not an image shows one specific object. There are two possible results, e.g. “tiger” and “non-tiger”.

Question 3

Q

What is multiclass classification?

Answer

A

The dataset contains multiple categories. Given an image, the classifier assigns exactly one of those labels to it.

Question 4

Q

What is multi-label classification?

Answer

A

Images contain multiple objects. The aim is to predict a probability for each object the image may contain, so one image can receive several labels.

Question 5

Q

What is hierarchical image classification?

Answer

A

First the classifier predicts the broad category the image belongs to, then it tries to label it more specifically with subcategories
e.g. Fruit, apple, ladywell apple

Question 6

Q

What is a basic approach to image classification?

Answer

A

Given an RGB image with 32x32x3 pixel values and 10 categories, predict 10 numbers, each representing the probability that the category is in the image.
- The probabilities sum to 1
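
One common way to turn 10 raw scores into probabilities that sum to 1 is the softmax function. The sketch below assumes softmax is the intended normalisation (the card itself does not say which one is used), and the score values are made up for illustration.

```python
import math

def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    # Subtracting the largest score first keeps exp() numerically stable.
    largest = max(scores)
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores for the 10 categories.
scores = [2.0, 1.0, 0.1, -1.2, 0.5, 3.3, 0.0, -0.7, 1.8, 0.9]
probs = softmax(scores)

print(len(probs))            # one probability per category
print(round(sum(probs), 6))  # the probabilities sum to 1
```

The category with the highest score also gets the highest probability, so the predicted label is unchanged by the normalisation.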

Question 7

Q

What does f(x, W) equal?

Answer

A

f(x, W) = (W*x) + b
where x is the image
W is the weights/parameters
b is the bias

Question 8

Q

How would we get the predictions for this: given an RGB image with 32x32x3 pixel values and 10 categories?

Answer

A

f(x, W) = (10x3072 matrix)(3072x1 vector) + (10x1 vector)
where 3072 = 32*32*3 is the length of the flattened image

Question 9

Q

How would you calculate the score for an image with 4 pixels and 3 classes:

Answer

A

1) Flatten image into 1D vector (4x1)
2) perform matrix multiplication with weights for each category (3x4)
(3x4) . (4x1) = (3x1)
3) add bias to (3x1) vector to get final prediction for each category
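
The three steps above can be sketched in plain Python. The pixel values, weights and biases below are illustrative numbers, not taken from the card, and `linear_scores` is a hypothetical helper name.

```python
def linear_scores(x, W, b):
    """Compute f(x, W) = W.x + b, giving one score per class."""
    return [
        sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
        for row, b_i in zip(W, b)
    ]

# Step 1: flatten a 2x2 image (4 pixels) into a 4x1 vector.
image = [[56, 231],
         [24, 2]]
x = [pixel for row in image for pixel in row]

# Step 2: one row of 4 weights per class gives a 3x4 weight matrix.
W = [[0.2, -0.5, 0.1, 2.0],
     [1.5, 1.3, 2.1, 0.0],
     [0.0, 0.25, 0.2, -0.3]]

# Step 3: one bias per class (3x1), added after the multiplication.
b = [1.1, 3.2, -1.2]

scores = linear_scores(x, W, b)
print([round(s, 2) for s in scores])  # [-96.8, 437.9, 60.75]
```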

Question 10

Q

How does the process above change if you use a multi-layer perceptron?

Answer

A

Each category has its own weights and bias. So for 3 categories you would multiply the (4x1) vector by a different (1x4) row of weights for each category, then add that category's bias

Question 11

Q

How do we find good values for W and b?

Answer

A

Start with random values, then iteratively update W and b until they converge to values that minimise the loss

Question 12

Q

What is a loss function and what methods could we use to calculate loss?

Answer

A

A loss function measures how far the classifier's predictions are from the ground-truth labels.
A large loss indicates a poorly trained classifier.

Options include L1 loss, L2 loss, SVM loss, cross-entropy loss, MSE loss and softmax loss.

Question 13

Q

What would be the formula for calculating L1 loss:

Answer

A

The loss over the dataset is the average of the losses over the individual images:
L = 1/N * sum over i of L_i, where L_i is the loss for the predictions on image i and N is the number of images

Question 14

Q

How is SVM loss calculated?

Answer

A

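For each incorrect class, take the max of 0 and (score of the incorrect class - score of the correct class + a margin delta, usually 1), then sum these values over all incorrect classes:
L_i = sum over j != y_i of max(0, s_j - s_yi + delta)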

Question 15

Q

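The predicted scores for 3 classes are given in the following table. Each column is one image, and the correct classes of the three images are cat, dog and person respectively. Compute the multiclass SVM loss for each image, then compute the overall (average) loss over all images. Delta = 1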

cat: 3.1 1.5 5.2
dog: 0.7 2.4 1.2
person: 1.5 5.1 -1.4

Answer

A

Image 1 (cat): max(0, 0.7 - 3.1 + 1) + max(0, 1.5 - 3.1 + 1) = 0 + 0 = 0
Image 2 (dog): max(0, 1.5 - 2.4 + 1) + max(0, 5.1 - 2.4 + 1) = 0.1 + 3.7 = 3.8
Image 3 (person): max(0, 5.2 - (-1.4) + 1) + max(0, 1.2 - (-1.4) + 1) = 7.6 + 3.6 = 11.2

Average: (0 + 3.8 + 11.2)/3 = 5
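
The worked answer can be checked with a short script. This is a minimal sketch: `svm_loss` is a hypothetical helper name, and each column of the table is treated as one image whose correct class is cat, dog and person in turn.

```python
def svm_loss(scores, true_index, delta=1.0):
    """Multiclass SVM (hinge) loss for one image."""
    true_score = scores[true_index]
    return sum(
        max(0.0, s - true_score + delta)
        for j, s in enumerate(scores)
        if j != true_index  # only incorrect classes contribute
    )

# Scores are ordered [cat, dog, person]; the second item is the correct class index.
images = [
    ([3.1, 0.7, 1.5], 0),   # column 1, correct class: cat
    ([1.5, 2.4, 5.1], 1),   # column 2, correct class: dog
    ([5.2, 1.2, -1.4], 2),  # column 3, correct class: person
]

losses = [svm_loss(scores, true_index) for scores, true_index in images]
mean_loss = sum(losses) / len(losses)

print([round(loss, 2) for loss in losses])  # [0.0, 3.8, 11.2]
print(round(mean_loss, 2))                  # 5.0
```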

Question 16

Q

What is the difference between a deep learning neural network and a simple neural network?

Answer

A

A simple neural network has only one or a few hidden layers, whereas a deep learning network may have many (even hundreds of) hidden layers between the input and the output

Question 17

Q

What is a convolutional neural network?

Answer

A

A type of deep learning model whose convolutional layers apply filters to the input data to capture image features.
Given an image and a filter, a convolutional layer calculates the output. CNNs are commonly used for image classification.

Question 18

Q

What are recurrent networks?

Answer

A

Networks in which the output of one step is fed back in as input to the next step.
- they have connections that form directed cycles
- this allows them to retain a memory of previous inputs through hidden states

Question 19

Q

How do artificial neural networks work?

Answer

A

Neurons receive multiple inputs, each with an adjustable weight.
A threshold decides whether or not a neuron is active.

Question 20

Q

What is the input signal formula for a neuron?

Answer

A

(sum of (weights*inputs)) + bias

Question 21

Q

What is the output signal formula for a neuron?

Answer

A

y = f(input)
where f is the activation function and input is the neuron's input signal (weighted sum of inputs + bias)
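
The input and output signals of a single neuron can be sketched as two small functions. The function names, the example numbers and the step activation used here are illustrative assumptions, not part of the card.

```python
def neuron_input(inputs, weights, bias):
    """Input signal: (sum of weights * inputs) + bias."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias

def step(z, threshold=0.0):
    """A simple threshold activation: 1 if z reaches the threshold, else 0."""
    return 1 if z >= threshold else 0

def neuron_output(inputs, weights, bias, activation):
    """Output signal: the activation function applied to the input signal."""
    return activation(neuron_input(inputs, weights, bias))

# Hypothetical example values.
inputs = [1.0, 2.0]
weights = [0.5, -0.25]
bias = 0.1

z = neuron_input(inputs, weights, bias)         # 0.5 - 0.5 + 0.1
y = neuron_output(inputs, weights, bias, step)  # active, since z >= 0
print(round(z, 2), y)
```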

Question 22

Q

What is an activation function?

Describe the threshold activation function:

Answer

A

It determines if a neuron is active or not

Threshold activation: choose a threshold. If the weighted sum of inputs + bias meets or exceeds the threshold, the neuron is active; otherwise it is inactive

Question 23

Q

What is the sigmoid function?

Answer

A

It’s an activation function
formula: σ(x) = 1 / (1 + e^(-x))
output is always between 0 and 1

Question 24

Q

What are the two parameters that can be introduced to the sigmoid function?

How does it fit into the formula?

Answer

A

c1: controls slope of sigmoid
c2: controls horizontal offset

σ(x) = 1 / (1 + e^(-c1 * (x - c2)))
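
A small sketch of how the two parameters change the curve. The function name `sigmoid_param` and the sample values are assumptions chosen for illustration.

```python
import math

def sigmoid_param(x, c1=1.0, c2=0.0):
    """Sigmoid with slope parameter c1 and horizontal offset c2."""
    return 1.0 / (1.0 + math.exp(-c1 * (x - c2)))

# c2 shifts the midpoint: the output is 0.5 exactly where x == c2.
print(sigmoid_param(2.0, c1=1.0, c2=2.0))

# A larger c1 makes the curve steeper around the midpoint.
gentle = sigmoid_param(3.0, c1=1.0, c2=2.0)
steep = sigmoid_param(3.0, c1=5.0, c2=2.0)
print(gentle < steep)
```

With c1 = 1 and c2 = 0 this reduces to the standard sigmoid.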

Question 25

Q

How do different values of x affect the output of the sigmoid function?

Answer

A

for large negative inputs (x), the output approaches 0
for large positive inputs (x), the output approaches 1.
for inputs near 0, the output is around 0.5
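
These three behaviours can be checked numerically. This is a minimal sketch using the standard sigmoid; the inputs -10 and 10 stand in for "large negative" and "large positive".

```python
import math

def sigmoid(x):
    """Standard sigmoid: 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + math.exp(-x))

low = sigmoid(-10.0)  # large negative input: close to 0
mid = sigmoid(0.0)    # input of 0: exactly 0.5
high = sigmoid(10.0)  # large positive input: close to 1

print(low < 0.001, mid, high > 0.999)
```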

Question 26

Q

What are the properties of the sigmoid function?

Answer

A

its domain is (−∞, ∞)
its range is (0, 1), excluding both endpoints
when the input is 0, the output is 0.5

Question 27

Q

What is the derivative of the sigmoid function?

Answer

A

σ′(x)=σ(x)(1−σ(x))

include derivation process on 2D paper
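
One way to write out the derivation, using the chain rule on the standard sigmoid σ(x) = 1 / (1 + e^(-x)). This is a sketch of the usual textbook route, not necessarily the one intended for the paper version.

```latex
\begin{aligned}
\sigma(x)  &= \frac{1}{1+e^{-x}} = \left(1+e^{-x}\right)^{-1} \\
\sigma'(x) &= -\left(1+e^{-x}\right)^{-2} \cdot \frac{d}{dx}\left(1+e^{-x}\right) \\
           &= -\left(1+e^{-x}\right)^{-2} \cdot \left(-e^{-x}\right) \\
           &= \frac{e^{-x}}{\left(1+e^{-x}\right)^{2}} \\
           &= \frac{1}{1+e^{-x}} \cdot \frac{e^{-x}}{1+e^{-x}} \\
           &= \sigma(x)\,\bigl(1-\sigma(x)\bigr)
\end{aligned}
```

The last step uses 1 − σ(x) = e^(-x) / (1 + e^(-x)).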

Question 28

Q

Name some other activation functions:

Answer

A

Leaky ReLU: max(0.1x, x)
tanh(x)
ReLU: max(0, x)
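
The three functions are short enough to sketch directly. The leaky slope of 0.1 follows the card; other sources use smaller slopes.

```python
import math

def relu(x):
    """ReLU: max(0, x)."""
    return max(0.0, x)

def leaky_relu(x, slope=0.1):
    """Leaky ReLU: max(slope * x, x), with the slope of 0.1 used on the card."""
    return max(slope * x, x)

def tanh(x):
    """Hyperbolic tangent, with outputs in (-1, 1)."""
    return math.tanh(x)

for x in (-2.0, 0.0, 3.0):
    print(x, relu(x), round(leaky_relu(x), 2), round(tanh(x), 4))
```

Unlike ReLU, the leaky version keeps a small non-zero output for negative inputs.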

Question 29

Q

What are TLUs?

Answer

A

Threshold logic units
- simplified version of threshold neuron model
- they only accept binary inputs and the weights are usually = 1

Question 30

Q

Describe the AND TLU node:

Answer

A

2 inputs are either 0 or 1, weights = 1.
range of the weighted sum is [0,2]
threshold has to be a value > 1 and <= 2, i.e. in (1,2]
if inputs = 0, 0, weighted sum = 0, output = 0 (not active as < threshold)
if inputs = 0, 1 or 1, 0, weighted sum = 1, output = 0 (not active as < threshold)
if inputs = 1, 1, weighted sum = 2, output = 1 (active as >= threshold)
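
The AND node can be sketched as a tiny function. `tlu` is a hypothetical helper name, and the threshold of 2 is one valid choice; any value greater than 1 and at most 2 gives the same truth table.

```python
def tlu(inputs, threshold, weights=None):
    """Threshold logic unit: active (1) if the weighted sum reaches the threshold."""
    if weights is None:
        weights = [1] * len(inputs)  # weights default to 1
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total >= threshold else 0

AND_THRESHOLD = 2  # assumed choice; any value in (1, 2] works

and_table = {(a, b): tlu([a, b], AND_THRESHOLD) for a in (0, 1) for b in (0, 1)}
print(and_table)
```

A threshold of 1 or lower would also fire for a single active input, which is OR behaviour rather than AND.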

Question 31

Q

Describe the OR TLU node:

Answer

A

2 inputs are either 0 or 1, weights = 1.
range of the weighted sum is [0,2]
threshold has to be a value > 0 and <= 1, i.e. in (0,1]
if inputs = 0, 0, weighted sum = 0, output = 0 (not active as < threshold)
if inputs = 0, 1 or 1, 0, weighted sum = 1, output = 1 (active as >= threshold)
if inputs = 1, 1, weighted sum = 2, output = 1 (active as >= threshold)

Question 32

Q

Why is it impossible to implement XOR logic with a single TLU neuron?

Answer

A

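Because XOR requires a restricted output region: the neuron must be active only when the weighted sum is 1, and inactive when it is 0 or 2. The active value lies between the two inactive values (0 < 1 < 2), so a single threshold cannot separate them; it would need two different thresholds.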

Question 33

Q

How can we implement XOR logic using TLUs?

Answer

A

Combine multiple TLUs
- XOR Logic = (OR) AND (NAND)
- 3 TLUs combined

OR:
0 or 0 = 0
0 or 1 = 1
1 or 0 = 1
1 or 1 = 1

NAND:
0 and 0 = not(0) = 1
0 and 1 = not(0) = 1
1 and 0 = not(0) = 1
1 and 1 = not(1) = 0

(OR) AND (NAND), applied row by row to the OR and NAND outputs above:
0 and 1 = 0
1 and 1 = 1
1 and 1 = 1
1 and 0 = 0
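
The three-TLU combination can be sketched as follows. The NAND unit here uses weights of -1 and a threshold of -1, which is an assumption made for this sketch; the cards only describe TLUs with weights of 1.

```python
def tlu(inputs, weights, threshold):
    """Threshold logic unit: active (1) if the weighted sum reaches the threshold."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total >= threshold else 0

def or_gate(a, b):
    return tlu([a, b], [1, 1], 1)

def and_gate(a, b):
    return tlu([a, b], [1, 1], 2)

def nand_gate(a, b):
    # Assumed construction: negative weights, active unless both inputs are 1.
    return tlu([a, b], [-1, -1], -1)

def xor_gate(a, b):
    # XOR = (a OR b) AND (a NAND b), built from 3 TLUs.
    return and_gate(or_gate(a, b), nand_gate(a, b))

xor_table = {(a, b): xor_gate(a, b) for a in (0, 1) for b in (0, 1)}
print(xor_table)
```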

Question 34

Q

What does a model generalising well mean?

Answer

A

That the model's weights/parameters don’t become too specific to the features of the training data, so the model also performs well on new, unseen images.

Question 35

Q

What happens if we fit a function with too small a degree, a moderate degree, and a very high degree?

Answer

A

straight line (degree 1), small margin between data points; too simple, so it tends to underfit
polynomial curve (degree 2), we get accuracy and gradient stability
polynomial curve (degree 9), we get accuracy on the training points but gradient instability; it tends to overfit (not a good option)

Question 36

Q

What can happen if you train a model on training data too much?

Answer

A

You can achieve very high training accuracy (e.g. 99%) and very low training loss, but the model will be overfitted to the training data and will not generalise/perform well on the test data

Question 37

Q

What is the top-5 metric?

Answer

A

Accuracy is calculated as (number of correctly labelled images) / (number of images). Under the top-5 metric, a prediction counts as correctly labelled if the correct label is among the model's top 5 predictions, not just the top 1 prediction
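
A minimal sketch of the metric, using made-up scores for 3 images over 6 classes. `top_k_accuracy` is a hypothetical helper name.

```python
def top_k_accuracy(score_rows, true_labels, k=5):
    """Fraction of images whose true label is among the k highest-scoring classes."""
    correct = 0
    for scores, label in zip(score_rows, true_labels):
        ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        if label in ranked[:k]:
            correct += 1
    return correct / len(true_labels)

# Hypothetical predicted scores (one row per image, one column per class).
score_rows = [
    [0.30, 0.25, 0.20, 0.15, 0.07, 0.03],  # true class 4 is ranked 5th
    [0.03, 0.07, 0.15, 0.20, 0.25, 0.30],  # true class 0 is ranked 6th
    [0.40, 0.05, 0.12, 0.15, 0.20, 0.08],  # true class 0 is ranked 1st
]
true_labels = [4, 0, 0]

top1 = top_k_accuracy(score_rows, true_labels, k=1)
top5 = top_k_accuracy(score_rows, true_labels, k=5)
print(round(top1, 3), round(top5, 3))
```

Top-5 accuracy is always at least as high as top-1 accuracy on the same predictions.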

Question 38

Q

How does back propagation work?

Answer

A

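- it assigns blame for the output error to individual weights, where the error is identified by calculating the loss
- the error is propagated backwards through the layers, so a weight is adjusted together with the connected weights that contributed to the bad output
- the gradient of the loss with respect to each weight is calculated and used to update the network's weights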

Question 39

Q

Describe MSE (Mean Squared Error)

Answer

A

Find the sum of all the squared errors (using L2 loss), then average it over the number of input images:

MSE = 1/P * sum of (prediction - ground truth)^2
where P is the number of input images
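
The formula translates directly into a few lines of Python. The predictions and ground-truth values below are made up for illustration.

```python
def mse(predictions, targets):
    """Mean squared error: average of (prediction - ground truth)^2."""
    squared_errors = [(p - t) ** 2 for p, t in zip(predictions, targets)]
    return sum(squared_errors) / len(squared_errors)

# Hypothetical predictions and ground-truth values for 3 inputs.
predictions = [2.5, 0.0, 2.0]
targets = [3.0, -0.5, 2.0]

# Squared errors are 0.25, 0.25 and 0.0, so the mean is 0.5 / 3.
print(round(mse(predictions, targets), 4))
```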

Question 40

Q

How do we minimise the MSE (mean squared error) loss?

Answer

A

To minimise the MSE we use the gradient descent method.
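* Gradient descent iteratively moves towards a minimum of a function. It is only guaranteed to reach the absolute (global) minimum when the function is convex; otherwise it may settle in a local minimum.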
* It is especially useful for high-dimensional functions.
* It iteratively minimises the neuron’s error by finding the gradient of the error surface in weight-space and adjusting the weights in the opposite direction.
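
A minimal sketch of gradient descent on the MSE of a single linear neuron y = w*x + b. The data, learning rate and number of steps are assumptions chosen so the example converges, and the weights start at 0 rather than at random values to keep the run reproducible.

```python
def mse(w, b, xs, ys):
    """Mean squared error of the neuron y = w*x + b on the data."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def gradient_descent(xs, ys, learning_rate=0.05, steps=2000):
    """Repeatedly step w and b in the direction opposite to the MSE gradient."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))  # dMSE/dw
        grad_b = 2.0 / n * sum(errors)                             # dMSE/db
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

# Hypothetical data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

w, b = gradient_descent(xs, ys)
print(round(w, 3), round(b, 3))  # 2.0 1.0
```

The MSE of a linear neuron is convex in w and b, so in this case the minimum reached is the global one; a learning rate that is too large would make the updates diverge instead.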