Computer Vision Flashcards

1
Q

What is representation learning in a CNN?

A

The network not only predicts the class but also learns how the image is composed: the relationships between its parts that make it an object.

2
Q

Self-supervised learning: what is it?

A

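Learning a representation from unlabeled data by deriving the training signal from the data itself (a pretext task), for example splitting the image into patches and learning how they are arranged.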

3
Q

What is perception?

A

Ability to capture and process information from our senses

4
Q

What is the perceptron?

A

A simple model that defines a linear decision boundary between two classes (output sign(x·w), i.e. -1 or +1; with a step function instead, 0 or 1). It does not work if the observations are not linearly separable. You can see the perceptron as a single neuron.
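A minimal sketch of the classic perceptron learning rule in Python with NumPy, assuming labels in {-1, +1} and the bias folded into w as an extra constant input. The helper names (`add_bias`, `train_perceptron`, `predict`) are illustrative. It fits AND (linearly separable) but cannot fit XOR.

```python
import numpy as np

def add_bias(X):
    # Append a constant 1 feature so the bias is learned as part of w
    return np.hstack([X, np.ones((X.shape[0], 1))])

def train_perceptron(X, y, epochs=100):
    """Perceptron rule: on each mistake, move w towards y_i * x_i. Labels must be -1 or +1."""
    Xb = add_bias(np.asarray(X, dtype=float))
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        mistakes = 0
        for xi, yi in zip(Xb, y):
            if yi * (xi @ w) <= 0:       # wrong side of the boundary (or exactly on it)
                w += yi * xi
                mistakes += 1
        if mistakes == 0:                # every point classified correctly: stop
            break
    return w

def predict(X, w):
    return np.sign(add_bias(np.asarray(X, dtype=float)) @ w)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y_and = np.array([-1, -1, -1, 1])   # linearly separable
y_xor = np.array([-1, 1, 1, -1])    # not linearly separable
w_and = train_perceptron(X, y_and)
w_xor = train_perceptron(X, y_xor)
```

On XOR the loop simply runs out of epochs, because no line separates the two classes.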

5
Q

Learning algorithm

A
An algorithm that is able to learn from data.
Ingredients:
- Task
- Performance measure
- Experience
6
Q

Unsupervised Learning

A

Find patterns in data without giving the algorithm any direction (no labels or expert input). Mainly used for clustering.

7
Q

Reinforcement learning

A

An agent performs a task and learns it through a feedback loop: it receives a reward when it takes a correct action. Used a lot for games.

The agent is in a particular state s_t and performs an action, which brings it to the next state s_{t+1}. Each such move is associated with a positive, negative or neutral feedback (reward).
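To make the state, action, reward loop concrete, here is a sketch using tabular Q-learning. This is one common reinforcement learning algorithm, not something the card specifies. The 5-cell corridor environment and every name in it (`env_step`, `q_learning`) are hypothetical. The agent starts at cell 0, and only reaching cell 4 gives a reward of +1.

```python
import numpy as np

N_STATES = 5          # corridor cells 0..4; cell 4 is the goal
MOVES = (-1, +1)      # action 0 = left, action 1 = right

def env_step(state, action):
    """Hypothetical toy environment: +1 reward for reaching the goal, 0 otherwise."""
    nxt = min(max(state + MOVES[action], 0), N_STATES - 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done

def q_learning(episodes=300, alpha=0.5, gamma=0.9, eps=0.3, seed=0):
    rng = np.random.default_rng(seed)
    Q = np.zeros((N_STATES, len(MOVES)))       # estimated return of each (state, action)
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # epsilon-greedy: mostly exploit the current estimate, sometimes explore
            a = int(rng.integers(len(MOVES))) if rng.random() < eps else int(np.argmax(Q[s]))
            s_next, r, done = env_step(s, a)
            target = r if done else r + gamma * Q[s_next].max()
            Q[s, a] += alpha * (target - Q[s, a])   # move the estimate towards the feedback
            s = s_next
    return Q

Q = q_learning()
policy = np.argmax(Q[:-1], axis=1)   # greedy action in every non-goal state
```

After training, the greedy policy moves right in every cell. The reward only appears at the goal and is propagated backwards through the discounted targets.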

8
Q

List the types of learning

A
Active Learning
Online Learning and Incremental Learning
Weakly supervised Learning
Self-supervised Learning
Deep Learning
Federated Learning
9
Q

What is inductive bias?

A

Inductive bias is the bias introduced by the choice of hypotheses (i.e. which assumptions I make in the modeling phase).

There are two types of inductive bias:

  • Restriction: limit the hypothesis space
  • Preference: impose an ordering on the hypotheses (priorities)
10
Q

Bias and Variance trade-off

A

When you are modeling you risk overfitting or underfitting, i.e. introducing a lot of variance or a lot of bias in your predictions.
Bias means that your assumptions are too strict and you are not able to fully explain the phenomenon (underfitting).
Variance means that you are explaining not only the phenomenon but also the noise, such as measurement error, which does not help when generalizing the results (overfitting).

With high variance you see strong performance on the training set and poor performance on the test set. With high bias you see poor performance on both datasets.

11
Q

What is algorithmic bias?

A

An algorithm has a bias when it produces unfair outcomes, for example a system that achieves better results for one ethnic group than for others.

12
Q

Cost Function

A

It is your tool to measure performance and feed this information back into the model. Your goal is to minimize it.

13
Q

Gradient Descent

A

An iterative algorithm for estimating the parameters of a model by minimizing the cost function.

You are basically descending the loss function towards low values, using the gradient to tell you which way is down.

Update the weights in the direction of the negative gradient, scaled by a learning rate: w ← w - η·∇L(w)
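A minimal NumPy sketch of batch gradient descent. It assumes a linear model and the mean squared error as the cost function; both are illustrative choices, since the card does not fix a model. The update line is exactly w ← w - η·∇L(w).

```python
import numpy as np

def mse_cost(w, X, y):
    return np.mean((X @ w - y) ** 2)

def gradient_descent(X, y, lr=0.1, steps=1000):
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        grad = (2.0 / len(y)) * X.T @ (X @ w - y)   # gradient of the mean squared error
        w -= lr * grad                              # step against the gradient
    return w

x = np.linspace(0.0, 1.0, 20)
X = np.column_stack([x, np.ones_like(x)])   # slope and intercept columns
y = 2.0 * x + 1.0                           # data generated with slope 2, intercept 1
w = gradient_descent(X, y)
```

A learning rate that is too large makes the updates overshoot and diverge. One that is too small needs many more steps.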

14
Q

Challenges in vision

A
  • Illumination
  • Shadow
  • Scale
  • Perspective view
  • Viewpoint
  • Deformation
  • Occlusion
  • Clutter
  • For classification: intra-class variation (objects of the same class look different) and inter-class similarity (objects of different classes look alike)
15
Q

Image representation

A

Binary: a black-and-white image can be represented as a matrix of 0s and 1s

Greyscale: one channel with values from 0 to 255 (8 bits per pixel)

Color images: 3 channels with values between 0 and 255 (red, green and blue; OpenCV stores them in blue, green, red order)
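Below, the three representations are sketched as NumPy arrays. The colour array is assumed to be in R, G, B order, and the greyscale conversion uses the ITU-R BT.601 luma weights, which are one common convention rather than the only one.

```python
import numpy as np

h, w = 4, 6
binary = np.zeros((h, w), dtype=np.uint8)     # binary image: 0 = black, 1 = white
binary[1:3, 2:5] = 1

grey = np.random.default_rng(0).integers(0, 256, size=(h, w)).astype(np.uint8)  # one 8-bit channel

color = np.zeros((h, w, 3), dtype=np.uint8)   # three 8-bit channels, stored here as R, G, B
color[..., 0] = 255                           # a pure red image

# Greyscale from colour with the BT.601 luma weights
luma = 0.299 * color[..., 0] + 0.587 * color[..., 1] + 0.114 * color[..., 2]
grey_from_color = luma.astype(np.uint8)
```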

16
Q

What is color constancy?

A

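The ability to perceive the color of an object as (roughly) the same under different illumination. A related effect: the same color can look different depending on the colors next to it.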

17
Q

What is the issue due to sampling?

A

When the sampling resolution is too low, the image cannot represent fine detail and borders are not well defined (they look jagged). To reduce this problem you can increase the resolution (dpi).

18
Q

What does quantization mean?

A

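Each pixel's continuous intensity is mapped to one of a finite set of discrete values (e.g. 256 levels for 8 bits), so the image becomes a matrix with an integer value for each pixel.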
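A minimal sketch of uniform quantization. It assumes intensities normalised to [0, 1]; the function name `quantize` is illustrative.

```python
import numpy as np

def quantize(intensity, levels=256):
    """Map continuous intensities in [0, 1] to integer levels 0 .. levels - 1."""
    q = np.floor(np.clip(intensity, 0.0, 1.0) * levels).astype(int)
    return np.minimum(q, levels - 1)   # intensity 1.0 would otherwise land on `levels`

signal = np.array([0.0, 0.3, 0.5, 0.99, 1.0])
q8 = quantize(signal)            # 8-bit: 256 levels
q2 = quantize(signal, levels=4)  # only 4 grey levels: much coarser
```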

19
Q

Image histogram

A

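A way of visualizing an image as a histogram: since every pixel has a value between 0 and 255, you can plot their distribution. It can be done for greyscale and color images (one histogram per channel).

It is not really useful for image comparison, because it discards where the pixels are: very different images can have the same histogram.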
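A small NumPy sketch of a greyscale histogram. It also shows why histograms compare poorly: two different 2×2 images give exactly the same counts. The name `grey_histogram` is illustrative.

```python
import numpy as np

def grey_histogram(img):
    """Count how many pixels take each value 0..255 of an 8-bit greyscale image."""
    return np.bincount(img.ravel(), minlength=256)

a = np.array([[0, 255], [0, 255]], dtype=np.uint8)   # left half black, right half white
b = np.array([[255, 0], [0, 255]], dtype=np.uint8)   # a checkerboard: a different image
ha = grey_histogram(a)
hb = grey_histogram(b)
```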

20
Q

What does it mean that you can see the image as a function?

A

You can see an image as a matrix of values, with x indexing the rows and y the columns, so that the function f(x, y) maps these two coordinates to an intensity value between 0 and 255.

21
Q

What is a filter?

A

It is used to transform the pixels of an image: to extract information, or to transform the image to simplify it or add information. It is basically a function that defines how to process the pixels.

Use case:

  • Extract Info
  • Detect patterns
  • De-noising

Filters are typically Convolution filters

22
Q

Smoothing filter

A

Smoothing filters are applied to images to remove sharp features (e.g. noise).

They have some properties:

  • Their values are positive
  • They sum up to one
  • The amount of smoothing depends on the kernel size
  • They remove high frequencies (rapid intensity changes between nearby pixels)

Moving average: a k×k filter whose weights are all 1/k² (so they sum to 1), applied to each region of the image. It replaces each pixel with the average of its neighbourhood, with the goal of removing sharp features.
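The moving average can be sketched in plain NumPy. The sketch assumes a greyscale image as a 2D float array and keeps only the "valid" region, with no padding. `box_filter` is an illustrative name, and the loops favour clarity over speed.

```python
import numpy as np

def box_filter(img, k=3):
    """Moving average: each output pixel is the mean of a k x k window ('valid' region only)."""
    kernel = np.full((k, k), 1.0 / (k * k))   # positive weights that sum to 1
    h, w = img.shape
    out = np.empty((h - k + 1, w - k + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + k, j:j + k] * kernel)
    return out

flat = np.full((5, 5), 7.0)
spike = np.zeros((5, 5))
spike[2, 2] = 9.0                 # a single sharp feature
```

Because the weights sum to 1, a constant region is left unchanged. A single bright pixel is spread evenly over its neighbourhood instead.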

23
Q

Boundary issue

A

Every time you apply a filter, the output is smaller than the input unless you use padding, because the filter does not fit entirely inside the image at the borders.
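The sketch below shows the shrinkage and three common padding choices with `np.pad`. It assumes a 3×3 filter; which padding mode works best depends on the application.

```python
import numpy as np

img = np.arange(16, dtype=float).reshape(4, 4)
k = 3                       # 3x3 filter
p = k // 2                  # padding that keeps the size for an odd k

valid_shape = (img.shape[0] - k + 1, img.shape[1] - k + 1)   # no padding: the output shrinks

zero_pad = np.pad(img, p, mode="constant")    # border filled with zeros
edge_pad = np.pad(img, p, mode="edge")        # border repeats the nearest pixel
reflect_pad = np.pad(img, p, mode="reflect")  # border mirrors the image

padded_out_shape = (zero_pad.shape[0] - k + 1, zero_pad.shape[1] - k + 1)  # same size as img
```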

24
Q

Gaussian Filter

A

The Gaussian filter is a type of convolution filter where the pixels are weighted depending on their distance from the filter center.

There are two parameters that control the filter:

  • Size of the filter
  • Scale (standard deviation σ) of the Gaussian distribution
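A minimal sketch that builds a Gaussian kernel from the two parameters, assuming an odd size and normalising the weights to sum to 1. `gaussian_kernel` is an illustrative name.

```python
import numpy as np

def gaussian_kernel(size=5, sigma=1.0):
    """2D Gaussian smoothing kernel built from the filter size and the scale sigma."""
    ax = np.arange(size) - (size - 1) / 2.0     # coordinates centred on the filter center
    g1 = np.exp(-ax ** 2 / (2.0 * sigma ** 2))  # 1D weights fall off with distance
    kernel = np.outer(g1, g1)                   # 2D Gaussian = product of two 1D Gaussians
    return kernel / kernel.sum()                # weights sum to 1, like any smoothing filter

narrow = gaussian_kernel(5, sigma=0.5)
wide = gaussian_kernel(5, sigma=2.0)
```

With a small σ most of the weight sits on the center pixel. With a large σ the weights flatten out, so more smoothing occurs.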
25
Q

What is a convolution?

A

Given an n-dimensional input, a convolution is the operation that applies a kernel sequentially (sliding it) over each region of the input.

To call it convolution you need to flip the kernel, vertically and horizontally, before applying it. Otherwise you are computing cross-correlation. Which one is appropriate depends on the causality of the signal, i.e. what depends on what.

It is used in signal processing
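The flip can be checked in 1D with NumPy's `np.convolve` and `np.correlate`. This uses an asymmetric kernel, since for a symmetric kernel the two operations coincide.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # a 1D signal (an intensity ramp)
k = np.array([1.0, 0.0, -1.0])            # an asymmetric kernel

conv = np.convolve(x, k, mode="valid")                  # convolution flips the kernel: [2. 2. 2.]
corr = np.correlate(x, k, mode="valid")                 # cross-correlation, no flip: [-2. -2. -2.]
corr_flipped = np.correlate(x, k[::-1], mode="valid")   # flip by hand: equals conv
```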

26
Q

Properties of convolution

A
  • Commutative: f*g = g*f
  • Associative: (f*g)*h = f*(g*h)
  • Homogeneity: (kf)*g = f*(kg) = k(f*g)
  • Distributive: f*(g+h) = f*g + f*h
  • Shift invariant: shifting the input shifts the output by the same amount (the filter behaves the same everywhere in the image)
  • Separability: some 2D filters (e.g. Gaussian, box) can be split into two 1D filters, a column and a row, whose outer product is the 2D kernel
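Separability can be verified numerically. A 2D kernel is separable exactly when it has rank 1 as a matrix. The sketch assumes a naive "valid" convolution helper (`conv2d_valid`, illustrative) and uses the 1D binomial filter [1, 4, 6, 4, 1]/16 as a Gaussian approximation. Filtering the rows and then the columns gives the same result as the full 2D filter.

```python
import numpy as np

def conv2d_valid(img, kernel):
    """Naive 2D convolution ('valid' region): flip the kernel, then slide it."""
    k = kernel[::-1, ::-1]
    kh, kw = k.shape
    out = np.empty((img.shape[0] - kh + 1, img.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * k)
    return out

g = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0   # 1D binomial approximation of a Gaussian
kernel_2d = np.outer(g, g)                       # separable 5x5 kernel

img = np.random.default_rng(0).random((8, 9))
direct = conv2d_valid(img, kernel_2d)                                          # 25 products per pixel
rows_then_cols = conv2d_valid(conv2d_valid(img, g[np.newaxis, :]), g[:, np.newaxis])  # 5 + 5
```

For a k×k kernel this reduces the cost per pixel from k² to 2k multiplications.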
27
Q

Sharpening filter

A

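You can sharpen with a convolution built from two filters: subtract a smoothed (blurred) version from the original to get the details (high frequencies), then add the details back: sharpened = f + α(f - f*g). With α = 1 and a box filter g, this is the single kernel 2·identity - box.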
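A minimal sketch of the 2·identity - box sharpening kernel, assuming a 3×3 box filter and a synthetic step image. `filter_valid` is an illustrative helper. The kernel is symmetric, so flipping it makes no difference.

```python
import numpy as np

identity = np.zeros((3, 3))
identity[1, 1] = 1.0
box = np.full((3, 3), 1.0 / 9.0)
sharpen = 2 * identity - box          # f + (f - blurred f); the weights still sum to 1

def filter_valid(img, kernel):
    # The kernel is symmetric, so correlation and convolution give the same result
    kh, kw = kernel.shape
    out = np.empty((img.shape[0] - kh + 1, img.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

step = np.hstack([np.zeros((5, 4)), np.ones((5, 4))])   # dark | bright edge
out = filter_valid(step, sharpen)
```

Flat regions keep their value, while the dark side of the edge dips below 0 and the bright side overshoots above 1. That extra contrast at the edge is what makes the image look sharper.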

28
Q

How can you use filters?

A

Extract information
Detect patterns
De-noising

29
Q

Definition of edge

A

A sudden change (discontinuity) in the image intensity. Information about shape can be encoded in the edges.

Edges are important because they let us recognize objects in an image and also recover their geometry.

Edges in an image are caused by:

  • Different illumination
  • Changes in surface orientation
  • Different depth
  • Color changes
30
Q

How can you spot edges?

A

Since edges are discontinuities, i.e. rapid changes in the image, you can spot them by looking at the image derivative (we can view an image as a function).

The edge is at the extrema of the derivative.

For images you implement derivatives by applying filters (finite differences), since an image is a discrete function, not a continuous one.

31
Q

Describe the forward, backward and central derivatives in 1D

A

Backward: f(i) - f(i-1)
Forward: f(i+1) - f(i)
Central: (f(i+1) - f(i-1)) / 2
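The three finite differences can be evaluated on a small 1D intensity profile with a single edge, all at the same interior positions i = 1 .. n-2. The profile is made up for illustration.

```python
import numpy as np

f = np.array([0.0, 0.0, 0.0, 10.0, 10.0, 10.0])   # intensity profile with one edge
i = np.arange(1, len(f) - 1)                       # interior positions 1 .. n-2

forward = f[i + 1] - f[i]              # [ 0. 10.  0.  0.]
backward = f[i] - f[i - 1]             # [ 0.  0. 10.  0.]
central = (f[i + 1] - f[i - 1]) / 2.0  # [ 0.  5.  5.  0.]
```

Forward and backward place the peak one pixel apart. The central difference is symmetric around the edge.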

32
Q

What is the image gradient?

A

It is the gradient computed on the image. It has two components: the derivative moving from left to right (x) and the one moving from top to bottom (y).

The gradient vector points in the direction of the most rapid increase in intensity, and the strength (magnitude) of the gradient is the square root of the sum of the two derivatives squared: sqrt((∂f/∂x)² + (∂f/∂y)²)

You can use the gradient for image editing, e.g. to smooth out strong edges
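A sketch of the gradient magnitude and direction using NumPy's `np.gradient`, which uses central differences in the interior and one-sided differences at the borders. The test image is a made-up vertical edge.

```python
import numpy as np

img = np.zeros((5, 6))
img[:, 3:] = 10.0                    # dark left part, bright right part: a vertical edge

gy, gx = np.gradient(img)            # derivative along rows (y) and along columns (x)
magnitude = np.sqrt(gx ** 2 + gy ** 2)
direction = np.arctan2(gy, gx)       # angle of the gradient, in radians
```

The magnitude is non-zero only in the two columns next to the edge. The direction there is 0, pointing along +x towards the brighter side.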

33
Q

What is the intensity profile?

A

Taking one row or column of the image, you can plot the pixel intensities along it, and their derivative, as a 1D signal.

34
Q

What is the effect of noise on the intensity profile?

A

Adding noise to the image also adds noise to the signal: if every pixel differs a lot from its neighbours, the derivative has many peaks and you might think there are many edges while there are actually only a few.

The derivative of random noise is also random noise.

35
Q

How can you overcome the effect of noise in the intensity profile?

A

You can use a smoothing filter to force nearby pixels to be similar. Of course this has a cost: too much smoothing also blurs the real edges.

For instance, you can use a Gaussian filter.

We can also directly convolve the image with the derivative of the filter to find where the peak of the derivative is, because differentiation commutes with convolution: d/dx(f*g) = f*(dg/dx). This saves one step.
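A 1D sketch of both routes on a synthetic noisy step. It assumes Gaussian noise with σ = 0.1, a Gaussian kernel with σ = 3, and [1, -1] as the discrete derivative. Smoothing then differentiating gives the same signal as one convolution with the derivative of the Gaussian.

```python
import numpy as np

rng = np.random.default_rng(0)
f = np.where(np.arange(100) >= 50, 1.0, 0.0) + rng.normal(0.0, 0.1, 100)  # noisy step at index 50

radius, sigma = 9, 3.0
t = np.arange(-radius, radius + 1)
g = np.exp(-t ** 2 / (2 * sigma ** 2))
g /= g.sum()                                   # 1D Gaussian smoothing kernel
d = np.array([1.0, -1.0])                      # convolving with d gives f[n] - f[n-1]

raw = np.convolve(f, d)                        # derivative of the noisy signal: many spurious peaks
two_steps = np.convolve(np.convolve(f, g), d)  # smooth first, then differentiate
one_step = np.convolve(f, np.convolve(g, d))   # a single derivative-of-Gaussian filter

edge = int(np.argmax(one_step)) - radius       # full-mode output is shifted by the kernel radius
```

Associativity of convolution is what makes the two routes equal. The derivative-of-Gaussian kernel can therefore be precomputed once and reused for every image.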

36
Q

Derivative of Gaussian filters

A

As with derivative filters, the derivative-of-Gaussian filter has an x-direction and a y-direction version.

You can express the 2D Gaussian as the product of two 1D functions (Gaussians with mean 0).

The scale σ is the parameter of the Gaussian kernel that controls how much smoothing there will be (the larger σ, the more influence far-away pixels have).

The larger σ, the more blurred the edges and the more you focus on large-scale features.

Depending on the type of task we should choose a particular σ.

Really important to remember: the weights of a derivative filter must sum to 0, so that constant regions give a response of 0.

37
Q

Sobel filter

A

The Sobel filter is a derivative filter with some smoothing built in.

It comes from the product (outer product) of a 1D Gaussian-like smoothing filter [1, 2, 1] and a 1D derivative filter [-1, 0, 1].

Its x version detects vertical edges; its transpose (the y version) detects horizontal edges.
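A sketch that builds the Sobel kernels as outer products and applies them by correlation, which is the convention OpenCV uses for its kernels. `correlate_valid` is an illustrative helper, and the image is a synthetic vertical edge.

```python
import numpy as np

smooth = np.array([1.0, 2.0, 1.0])     # 1D Gaussian-like smoothing
deriv = np.array([-1.0, 0.0, 1.0])     # 1D central derivative (correlation form)
sobel_x = np.outer(smooth, deriv)      # smooth along y, differentiate along x
sobel_y = sobel_x.T                    # smooth along x, differentiate along y

def correlate_valid(img, kernel):
    kh, kw = kernel.shape
    out = np.empty((img.shape[0] - kh + 1, img.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

img = np.zeros((6, 6))
img[:, 3:] = 1.0                       # dark left half, bright right half: a vertical edge
gx = correlate_valid(img, sobel_x)
gy = correlate_valid(img, sobel_y)
```

Only the x kernel responds to this vertical edge. The y kernel gives 0 everywhere because the image does not change from top to bottom.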

38
Q

Which are the common derivative filters?

A
  • Sobel
  • Scharr
  • Prewitt
  • Roberts
39
Q

Definition of corner

A

A region where there are significant changes in all directions. If the change is in only one direction, it is an edge.

Corners can be thought of as keypoints because they identify a certain geometry of the object.

40
Q

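How can you detect whether there is a corner?

A

Shift a small window by (u, v) and measure how much the image changes: E(u, v) = Σ w(x, y) [I(x+u, y+v) - I(x, y)]². At a corner E is large for every shift direction; on an edge it stays small along the edge; in a flat region it is small everywhere.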

41
Q

Why do we need keypoints?

A

Keypoints are reference points of the image that should have certain characteristics:

  • Repeatability: can be found again even after geometric transformations
  • Saliency: each keypoint is distinctive
  • Compactness and efficiency: few keypoints (far fewer than pixels)
  • Locality: each occupies a small area of the picture

Keypoints allow us to match images.

42
Q

Harris corner

A

Harris corner detection is an algorithm that detects corners.

  • Translation invariant
  • Rotation invariant
  • Not scale invariant

The problem is that it is not scale invariant; we need a scale-invariant algorithm because we would like to extract features that stay the same whatever the image scale.
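A compact sketch of the Harris response R = det(M) - k·trace(M)², where M sums the gradient products over a small window. It assumes k = 0.05 (values around 0.04 to 0.06 are typical), a 3×3 unweighted window, and `np.gradient` for the derivatives. `window_sum` and `harris_response` are illustrative names.

```python
import numpy as np

def window_sum(a, r=1):
    """Sum of `a` over a (2r+1)x(2r+1) window around each pixel (zero padding at the borders)."""
    padded = np.pad(a, r, mode="constant")
    out = np.zeros_like(a)
    for di in range(2 * r + 1):
        for dj in range(2 * r + 1):
            out += padded[di:di + a.shape[0], dj:dj + a.shape[1]]
    return out

def harris_response(img, k=0.05, r=1):
    """Harris measure R = det(M) - k * trace(M)^2 from windowed gradient products."""
    Iy, Ix = np.gradient(img.astype(float))     # derivatives along rows (y) and columns (x)
    Sxx = window_sum(Ix * Ix, r)
    Syy = window_sum(Iy * Iy, r)
    Sxy = window_sum(Ix * Iy, r)
    det = Sxx * Syy - Sxy ** 2
    trace = Sxx + Syy
    return det - k * trace ** 2

img = np.zeros((20, 20))
img[5:15, 5:15] = 1.0          # a bright square: 4 corners, 4 straight edges
R = harris_response(img)
```

R is positive at a corner of the square, negative along its sides and zero in flat regions. Rotating the image by 90 degrees rotates R in the same way. Scaling the image, however, changes which window size catches the corner.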

43
Q

Scale-invariant selection

A

Given two images with very different scales, you need to find regions of corresponding size that appear in both pictures.

The problem is that it is difficult to define the size of the region automatically, because it depends on the image.

We need functions that are scale invariant per se, i.e. that give similar results on corresponding regions independently of the image scale; we evaluate such a function on regions of different sizes and pick the size where it responds most strongly.

A good function has one sharp peak as a function of region size.

To do that we can use the second derivative of a Gaussian (Laplacian of Gaussian).

44
Q

Blob detection

A

Blob detection finds keypoints by applying a second-order derivative of a Gaussian to an image at multiple scales and looking for extrema across space and scale.

The resulting detector is invariant to scale and rotation
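A sketch of scale selection with the scale-normalised Laplacian of Gaussian (σ²∇²G). It uses a synthetic bright disk and evaluates the response only at the disk center. For a disk of radius r the strongest response is expected near σ = r/√2, which follows from integrating ∇²G over the disk. `blob_response` is an illustrative name.

```python
import numpy as np

def blob_response(radius, sigma, half=40):
    """Scale-normalised LoG response at the centre of a bright disk of the given radius."""
    ax = np.arange(-half, half + 1)
    xx, yy = np.meshgrid(ax, ax)
    r2 = xx ** 2 + yy ** 2
    # Laplacian of a unit-mass 2D Gaussian with standard deviation sigma
    lap = (r2 - 2 * sigma ** 2) / (2 * np.pi * sigma ** 6) * np.exp(-r2 / (2 * sigma ** 2))
    disk = r2 <= radius ** 2             # image: 1 inside the disk, 0 outside
    return sigma ** 2 * lap[disk].sum()  # multiplying by sigma^2 makes scales comparable

radius = 8
sigmas = np.arange(1.0, 12.01, 0.25)
responses = np.abs([blob_response(radius, s) for s in sigmas])
best_sigma = sigmas[np.argmax(responses)]
```

A disk twice as large would peak at a σ twice as large. This is why the selected scale follows the size of the blob.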

45
Q

Blob Kernels

A
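  • Laplacian (of Gaussian)
  • Difference of Gaussians (an approximation of the Laplacian)

Both are rotationally symmetric and, applied across scales, give scale and rotation invariance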

46
Q

Harris-Laplacian

A

A scale-invariant detector that uses the Harris corner measure to detect keypoints in space and the Laplacian to select their scale.

47
Q

SIFT

A

Another scale-invariant detector, which uses the DoG in space and in scale. The difference of Gaussians is more efficient than the Laplacian because it does not need to compute second-order derivatives: it is just the difference of two Gaussian-blurred images.

You choose as keypoints the extrema in scale and space: a point is kept if it is larger (or smaller) than all its neighbours in the same scale and in the scales just above and below.

48
Q

Rotational ambiguity

A

If you need to match two pictures, one may be rotated relative to the other, which makes them difficult to match.

That is why you need rotation-invariant local features.

49
Q

Desired properties of a feature descriptor

A

It should be invariant to:

  • Translation
  • Rotation
  • Scale
  • Changes in brightness
  • Perspective
50
Q

SIFT descriptors

A

You can overcome the rotation ambiguity by building a histogram of local gradient directions and choosing the most prominent direction. All features in the patch are then described relative to that orientation.
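A simplified sketch of dominant-orientation assignment. It assumes a 36-bin histogram of gradient angles weighted by gradient magnitude. SIFT additionally applies a Gaussian weighting and interpolates the peak, and both are left out here. `dominant_orientation` is an illustrative name.

```python
import numpy as np

def dominant_orientation(patch, bins=36):
    """Centre (in degrees) of the strongest bin of a magnitude-weighted gradient-angle histogram."""
    gy, gx = np.gradient(patch.astype(float))
    magnitude = np.hypot(gx, gy)
    angle = np.degrees(np.arctan2(gy, gx)) % 360.0
    hist, edges = np.histogram(angle, bins=bins, range=(0.0, 360.0), weights=magnitude)
    k = np.argmax(hist)
    return (edges[k] + edges[k + 1]) / 2.0

ramp_x = np.tile(np.arange(9.0), (9, 1))            # intensity grows to the right
diagonal = np.add.outer(np.arange(9.0), np.arange(9.0))   # intensity grows along the diagonal
```

Rotating the patch by minus this angle brings every keypoint to a common reference orientation before the descriptor is computed.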

Each SIFT descriptor is a 128-dimensional vector (a 4×4 grid of cells, each with an 8-bin orientation histogram).

You can match SIFT descriptors to check keypoint similarity (e.g. with the Euclidean distance).
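A sketch of nearest-neighbour matching with the Euclidean distance. The random 128-dimensional vectors stand in for real SIFT descriptors. The ratio test, which keeps a match only if it is clearly better than the second-best candidate, comes from Lowe's SIFT paper and is an addition to what the card describes. `match_descriptors` is an illustrative name.

```python
import numpy as np

def match_descriptors(d1, d2, ratio=0.8):
    """For each row of d1, return (i, j) with j its nearest row of d2, if it passes the ratio test."""
    dists = np.linalg.norm(d1[:, None, :] - d2[None, :, :], axis=2)   # pairwise Euclidean distances
    matches = []
    for i, row in enumerate(dists):
        best, second = np.argsort(row)[:2]
        if row[best] < ratio * row[second]:      # unambiguous match only
            matches.append((i, int(best)))
    return matches

rng = np.random.default_rng(0)
d1 = rng.random((5, 128))                         # stand-ins for SIFT descriptors of image 1
perm = np.array([3, 0, 4, 1, 2])
d2 = d1[perm] + rng.normal(0.0, 0.01, (5, 128))   # same keypoints, shuffled and slightly perturbed
matches = match_descriptors(d1, d2)
```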

51
Q

What is visual recognition?

A

It means identifying the content of an image. To do this you can use a data-driven approach (or a more rule-driven one), for example k-nearest neighbours (KNN: O(1) at training time, since it only stores the data; O(n) at prediction time, since it compares the query with all n training examples).

52
Q

How do you represent images for image classification?

A
  • As raw pixels
  • As a bag of (visual) words

53
Q

Define bag of words

A

Instead of raw pixels, you count the number of occurrences of visual primitives (visual words). It originated from text categorization.

There are 4 steps:

  • Feature detection: extract local features from an image (DoG or Harris-Laplacian, or random sampling)
  • Create an image codebook, i.e. a dictionary of visual primitives
  • For each feature, find the closest visual word in the dictionary and build a histogram of word counts
  • Use a learning algorithm to decide which class the image belongs to, based on the histogram

So basically you are representing the image as frequencies of visual words.

This method has been really useful for:

  • Image classification
  • Large-scale image search
  • Discovering visual themes
54
Q

How do you learn the visual vocabulary?

A

You take many diverse local features and cluster them. The centroid of each cluster is a visual word (because it summarizes a certain kind of feature).

You can use k-means:

  • Randomly initialize the centroids
  • Assign each feature to its closest centroid
  • Re-compute each centroid as the mean of its features, and repeat until the assignments stop changing

You can learn the codewords on a separate training set, since the codewords should be universal.

You want a vocabulary that is neither too small nor too large.

To make the vocabulary really efficient you can build a vocabulary tree (hierarchical clustering), which also makes the comparison faster.
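A plain k-means sketch following the three steps above. The toy 2D points stand in for real local descriptors such as 128-D SIFT vectors. A real vocabulary would use many more features and k in the hundreds or thousands.

```python
import numpy as np

def kmeans(features, k, iters=100, seed=0):
    """Plain k-means: the k centroids returned are the visual words."""
    rng = np.random.default_rng(seed)
    centroids = features[rng.choice(len(features), size=k, replace=False)]  # random initialisation
    for _ in range(iters):
        dists = np.linalg.norm(features[:, None, :] - centroids[None, :, :], axis=2)
        labels = np.argmin(dists, axis=1)                     # closest centroid for each feature
        new_centroids = np.array([
            features[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])                                                    # mean of the assigned features
        if np.allclose(new_centroids, centroids):
            break                                             # assignments have stabilised
        centroids = new_centroids
    return centroids

rng = np.random.default_rng(1)
descriptors = np.vstack([
    rng.normal([0.0, 0.0], 0.5, size=(50, 2)),     # toy "descriptors" around (0, 0)
    rng.normal([10.0, 10.0], 0.5, size=(50, 2)),   # and around (10, 10)
])
vocabulary = kmeans(descriptors, k=2)
```

k-means only finds a local optimum, so in practice it is often restarted from several initialisations.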

55
Q

What is a vector quantizer?

A

A function that takes a feature and maps it to the closest codevector (visual word).
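A sketch of the vector quantizer and the resulting bag-of-words histogram. It assumes a toy 3-word codebook and Euclidean distance, and the function names are illustrative.

```python
import numpy as np

def vector_quantize(features, codebook):
    """Index of the closest codevector (Euclidean distance) for every feature."""
    dists = np.linalg.norm(features[:, None, :] - codebook[None, :, :], axis=2)
    return np.argmin(dists, axis=1)

def bow_histogram(features, codebook):
    words = vector_quantize(features, codebook)
    return np.bincount(words, minlength=len(codebook))   # how often each visual word occurs

codebook = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])   # toy 3-word vocabulary
features = np.array([[1.0, 0.0], [0.0, 1.0], [9.0, 1.0], [0.0, 11.0], [1.0, 1.0]])
```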

56
Q

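What is the TF-IDF normalization?

A

TF-IDF weighting gives more weight to visual words that do not appear in every image (rare words are more discriminative). The inverse document frequency of a word is log(n_docs / n_docs_with_word), and it multiplies the word's term frequency in the image.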
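A sketch of TF-IDF weighting on a matrix of visual-word counts, with one row per image. It assumes the term frequency is the count divided by the total number of words in that image, and it does not handle images that contain no words at all.

```python
import numpy as np

def tfidf(counts):
    """counts: (n_images, n_words) matrix of visual-word counts."""
    counts = np.asarray(counts, dtype=float)
    tf = counts / counts.sum(axis=1, keepdims=True)        # term frequency within each image
    n_images = counts.shape[0]
    n_with_word = np.count_nonzero(counts > 0, axis=0)     # images containing each word
    idf = np.log(n_images / np.maximum(n_with_word, 1))    # rare words get a large weight
    return tf * idf

counts = np.array([[2, 1, 0],
                   [3, 0, 0],
                   [1, 0, 1]])
weights = tfidf(counts)
```

Word 0 appears in every image, so it gets weight 0 everywhere. It carries no information for telling the images apart.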

57
Q

What is a problem of bag of words and how is it solved?

A

With bag of words you lose the spatial information: the relations between nearby visual features.

To solve it you can use the spatial pyramid. Here are the steps:

  • Take the image and compute its bag of words
  • Divide the image into 4 and compute the bag of words on each subimage
  • Keep dividing until you reach a satisfactory level, then concatenate all the histograms

It is a really good image representation.
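A sketch of an unweighted spatial pyramid. It assumes each keypoint comes with an (x, y) position and an already-quantized visual-word index. The original spatial pyramid match also weights the levels, which is omitted here, and `spatial_pyramid` is an illustrative name.

```python
import numpy as np

def spatial_pyramid(xy, words, width, height, n_words, levels=2):
    """Level l splits the image into 2**l x 2**l cells; all cell histograms are concatenated."""
    feats = []
    for level in range(levels + 1):
        cells = 2 ** level
        cx = np.minimum((xy[:, 0] * cells // width).astype(int), cells - 1)
        cy = np.minimum((xy[:, 1] * cells // height).astype(int), cells - 1)
        for i in range(cells):
            for j in range(cells):
                in_cell = (cy == i) & (cx == j)
                feats.append(np.bincount(words[in_cell], minlength=n_words))
    return np.concatenate(feats)

words = np.array([0, 1])
xy_a = np.array([[10.0, 10.0], [90.0, 90.0]])   # word 0 top-left, word 1 bottom-right
xy_b = np.array([[90.0, 90.0], [10.0, 10.0]])   # same words, swapped positions
p_a = spatial_pyramid(xy_a, words, 100, 100, n_words=2)
p_b = spatial_pyramid(xy_b, words, 100, 100, n_words=2)
```

At level 0 both images have the same bag of words. The finer levels tell them apart.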

58
Q

Why is deep learning so good?

A

Because it can automatically learn both the representation of the image and the classifier, at the same time.

59
Q

Which layers do we have in a CNN?

A
  • Fully connected
  • Convolutional layer
  • Pooling layer

Basically it is a series of convolutional layers with their activation functions, interleaved with pooling layers and usually followed by fully connected layers.

60
Q

What is the output size after a convolution step?

A
(W - F + 2*P)/S + 1
W is the input volume size
F is the filter size
P is the padding
S is the stride
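The formula can be written as a small helper. This version raises an error when the filter does not tile the padded input evenly, which is a design choice. Frameworks such as PyTorch instead round down and silently drop the leftover border.

```python
def conv_output_size(W, F, P=0, S=1):
    """Output width/height of a convolution: (W - F + 2P) / S + 1."""
    span = W - F + 2 * P
    if span < 0 or span % S != 0:
        raise ValueError("filter does not tile the padded input with this stride")
    return span // S + 1

same_size = conv_output_size(32, 5, P=2, S=1)      # 'same' padding keeps 32
alexnet_conv1 = conv_output_size(227, 11, P=0, S=4)  # 227x227 input, 11x11 filters, stride 4
```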
61
Q

What is the pooling layer?

A

The pooling layer has no learnable parameters: it simply pools several values into one (downsampling the image).

Max pooling
Mean pooling
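A sketch of 2×2 max and mean pooling with stride 2, using a NumPy reshape. If the height or width is odd, the last row or column is dropped. `pool2x2` is an illustrative name.

```python
import numpy as np

def pool2x2(x, mode="max"):
    """2x2 pooling with stride 2 on a 2D array; no parameters are learned."""
    h, w = x.shape
    blocks = x[: h - h % 2, : w - w % 2].reshape(h // 2, 2, w // 2, 2)
    return blocks.max(axis=(1, 3)) if mode == "max" else blocks.mean(axis=(1, 3))

x = np.arange(16).reshape(4, 4)
max_pooled = pool2x2(x)                # [[ 5  7] [13 15]]
mean_pooled = pool2x2(x, mode="mean")  # [[ 2.5  4.5] [10.5 12.5]]
```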

62
Q

Describe some common architectures

A

AlexNet = Pretty fast and memory efficient
VGGNet = Famous because it showed the benefit of a deeper network, but it is really expensive to evaluate.
GoogLeNet = It has the inception module, a smart way of extracting local features: convolutions of different sizes run in parallel and their outputs are stacked together. Even though it is deeper, it is very fast

63
Q

What is transfer learning?

A

A really common practice where you do not train the entire model from scratch: you start from a pretrained model and train on top of it, to improve generalization.

64
Q

What is data augmentation?

A

A technique to improve the generalization of the model by training on transformed copies of the images, so that the predictions become robust to certain variations (such as rotation).
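A minimal augmentation sketch with random horizontal flips and 90-degree rotations. These are two illustrative transformations; real pipelines typically also use crops, small rotations, colour jitter and similar. `augment` is an illustrative name.

```python
import numpy as np

def augment(img, rng):
    """Return a randomly flipped and rotated copy; the label of the image does not change."""
    out = img
    if rng.random() < 0.5:
        out = np.fliplr(out)                    # horizontal flip
    out = np.rot90(out, k=int(rng.integers(4))) # rotate by 0, 90, 180 or 270 degrees
    return out

img = np.arange(16).reshape(4, 4)
augmented = [augment(img, np.random.default_rng(seed)) for seed in range(4)]
```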

65
Q

Self-supervised learning

A

You divide the image into patches and use their arrangement as the training signal, learning a representation without labels.
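The sketch below generates one training example for a patch-arrangement pretext task: given the central patch and one of its 8 neighbours, predict where the neighbour sits. The label comes for free from the image itself. The 3×3 grid, the `OFFSETS` ordering and `context_pair` are all illustrative assumptions. The network that would consume these pairs is not shown.

```python
import numpy as np

OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]

def context_pair(img, patch=8, rng=None):
    """Cut a 3x3 grid of patches; return (centre, neighbour, label), label = index into OFFSETS."""
    rng = np.random.default_rng() if rng is None else rng
    grid = img[: 3 * patch, : 3 * patch]          # assumes the image is at least 3*patch wide/high
    label = int(rng.integers(len(OFFSETS)))
    dy, dx = OFFSETS[label]
    centre = grid[patch:2 * patch, patch:2 * patch]
    r, c = (1 + dy) * patch, (1 + dx) * patch
    neighbour = grid[r:r + patch, c:c + patch]
    return centre, neighbour, label               # no human annotation needed

img = np.arange(24 * 24).reshape(24, 24)
centre, neighbour, label = context_pair(img, patch=8, rng=np.random.default_rng(0))
```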

66
Q

How do we process videos?

A

A plain CNN processes each frame independently, so to model time you can use an RNN, where the network has an additional quantity: the state variable, carried from one frame to the next.

Examples are the:

  • LSTM
  • GRU
  • Vanilla RNN
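A sketch of the vanilla RNN recurrence over a sequence of per-frame feature vectors, h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h). Random weights and random "frame features" are used purely to show the mechanics. In practice the features would come from a CNN and the weights would be trained.

```python
import numpy as np

def rnn_forward(frames, W_xh, W_hh, b_h):
    """Vanilla RNN: h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h), starting from h_0 = 0."""
    h = np.zeros(W_hh.shape[0])
    for x in frames:                   # one feature vector per frame, in temporal order
        h = np.tanh(W_xh @ x + W_hh @ h + b_h)
    return h                           # the final state summarises the whole sequence

rng = np.random.default_rng(0)
feat_dim, hidden = 4, 3
W_xh = rng.normal(size=(hidden, feat_dim))
W_hh = rng.normal(size=(hidden, hidden))
b_h = np.zeros(hidden)
frames = rng.normal(size=(5, feat_dim))   # stand-in for per-frame CNN features
h_forward = rnn_forward(frames, W_xh, W_hh, b_h)
h_reversed = rnn_forward(frames[::-1], W_xh, W_hh, b_h)
```

Unlike averaging per-frame features, the final state depends on the order of the frames. LSTM and GRU replace the single tanh update with gated updates.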
67
Q

Predictive vision

A

Given a picture of an agent, we would like to predict the agent's actions.

This ability depends on two factors:

  • Knowledge of the dynamics
  • Understanding of the semantics of the scene

You need to use knowledge transfer to augment the data

68
Q

What is image captioning?

A

Given an image, you generate a textual description of it.

We do it by feeding the output of a fully connected layer of a CNN to an RNN as its initial state. At each step the RNN outputs a probability for each word of the vocabulary; you pick a word and continue with the RNN.