6.19 - Pattern Recognition and Categorisation in FFNNs Flashcards
Understand hierarchical processing in feed-forward neural networks.
What is a perceptron and what does this network architecture look like?
The perceptron is an algorithm for supervised learning of binary classifiers in a feed-forward neural network. The perceptron is a linear classifier: after training, it can decide whether an input vector belongs to a specific class. Architecturally, it consists of input nodes connected by weighted connections directly to a single output unit, which applies a threshold function to the weighted sum of its inputs.
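In symbols (a standard formulation rather than something given on the card), with weight vector w, bias b, and labels y ∈ {-1, +1}, the perceptron predicts:

```latex
\hat{y} = \operatorname{sign}(w \cdot x + b)
```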
What is the perceptron XOR problem?
The fact that a single-layer perceptron cannot learn the “exclusive or” (XOR) logical function.
A single-layer perceptron can only produce linear separation boundaries, and the XOR classes are not linearly separable.
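A short derivation makes the impossibility concrete (a standard argument; the labelling y ∈ {-1, +1} is assumed). Suppose some weights w1, w2 and bias b computed XOR via sign(w1·x1 + w2·x2 + b):

```latex
\begin{align}
(0,0) \mapsto -1 &:\quad b < 0 \\
(1,0) \mapsto +1 &:\quad w_1 + b > 0 \\
(0,1) \mapsto +1 &:\quad w_2 + b > 0 \\
(1,1) \mapsto -1 &:\quad w_1 + w_2 + b < 0
\end{align}
```

Adding the two middle inequalities gives w1 + w2 + 2b > 0, so w1 + w2 + b > -b > 0 (since b < 0), contradicting the last line. Hence no single linear boundary separates the XOR classes.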
What is the role of convolutional layers?
Convolutional layers extract features.
These features can be used in hierarchies to recognize patterns.
The convolution is performed on the input data using a filter or kernel (the terms are used interchangeably) to produce a feature map. We execute a convolution by sliding the filter over the input. At every location, the filter is multiplied element-wise with the overlapping input values, and the sum of these products is written into the feature map.
In this operation, the filter slides over the input, and the summed result at each position fills one entry of the feature map. The input region the filter covers is called the receptive field; filters are typically small, e.g. 3x3. Because one filter is reused across the entire image, convolutional neural networks are largely location-invariant, and the shared weights help prevent overfitting. This means they are able to recognize features independent of their location in the input image.
Numerous convolutions are performed on our input, where each operation uses a different filter. This results in different feature maps. In the end, we take all of these feature maps and put them together as the final output of the convolution layer.
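A minimal sketch of this sliding-window operation in NumPy (illustrative only; the function and variable names are my own, and real frameworks use optimised implementations):

```python
import numpy as np

def conv2d(image, kernel):
    """Valid-mode sliding-window filter (cross-correlation, as in CNN layers):
    multiply element-wise at each position and sum into the feature map."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    feature_map = np.zeros((ih - kh + 1, iw - kw + 1))
    for i in range(feature_map.shape[0]):
        for j in range(feature_map.shape[1]):
            window = image[i:i + kh, j:j + kw]   # the receptive field
            feature_map[i, j] = np.sum(window * kernel)
    return feature_map

image = np.arange(25, dtype=float).reshape(5, 5)
kernel = np.ones((3, 3)) / 9.0                    # 3x3 averaging filter
print(conv2d(image, kernel))                      # 3x3 feature map
```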
What is the role of Regularization?
Regularization simplifies network connections.
We want neural networks that generalize well, i.e. whose weights make them perform well on unseen data rather than only on one specific training dataset. In supervised learning, we measure the performance of a network by comparing its predictions to the ground-truth labels; the discrepancy between the two is the loss or cost.
The total loss is calculated by adding a regularization term to the error. By doing this, regularization penalizes the complexity of the model: driving weights towards values close to zero decreases the regularization term and simplifies the model. This helps to prevent overfitting.
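In symbols (a standard formulation; the regularization strength λ is not defined on the card and is assumed here):

```latex
L_{\text{total}}(w) = L_{\text{error}}(w) + \lambda \, R(w)
```

where R(w) is a penalty on the weights, such as the L1 or L2 penalties described next.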
Explain the difference between L1 and L2 regularisation.
L1: a cost proportional to the sum of the absolute values of the weights (the L1 norm); it tends to drive individual weights exactly to zero, producing sparse networks
L2: a cost proportional to the squared Euclidean length of the weight vector (the L2 norm); it shrinks all weights smoothly towards zero without making them exactly zero
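A minimal sketch of both penalties in NumPy (the weight values, error value, and λ are illustrative assumptions):

```python
import numpy as np

def l1_penalty(weights):
    # Sum of absolute values: encourages sparse (exactly-zero) weights.
    return np.sum(np.abs(weights))

def l2_penalty(weights):
    # Squared Euclidean length: shrinks all weights smoothly toward zero.
    return np.sum(weights ** 2)

weights = np.array([0.5, -1.2, 0.0, 3.0])
error = 0.8          # placeholder data-fit error
lam = 0.01           # regularization strength (hyperparameter)
print(error + lam * l1_penalty(weights))  # L1-regularized loss
print(error + lam * l2_penalty(weights))  # L2-regularized loss
```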
What is the difference between regression and classification?
Classification: predicting a label (a discrete class/category) given inputs.
Regression: predicting a quantity (a numerical value) given inputs.
What is overfitting?
Overfitting is a phenomenon that occurs when a machine-learning or statistical model is tailored to a particular training dataset and is unable to generalize to unseen data. This is a particular problem for complex models, like deep neural networks.
In other words, an overfitted model performs well on training data but fails to generalize.
Usually, the more parameters the model has, the more functions it can represent, and the more likely it is to overfit.
How does the perceptron algorithm work?
(1) Initialize the parameters w, b at 0
(2) Keep cycling through the training data (x, y), with labels y ∈ {-1, +1}
(3) If y(w·x + b) <= 0 (the point is misclassified):
I. increase the value of w by y·x
w = w + y·x
II. increase the value of b by y
b = b + y
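A minimal runnable version of these steps (NumPy; the toy AND dataset and the epoch count are illustrative choices of this sketch, not part of the card):

```python
import numpy as np

def train_perceptron(X, y, epochs=20):
    """Perceptron learning rule; labels must be in {-1, +1}."""
    w = np.zeros(X.shape[1])  # (1) initialize w, b at 0
    b = 0.0
    for _ in range(epochs):   # (2) keep cycling through the data
        for xi, yi in zip(X, y):
            if yi * (np.dot(w, xi) + b) <= 0:  # (3) misclassified?
                w += yi * xi                   # I. w = w + y*x
                b += yi                        # II. b = b + y
    return w, b

# Linearly separable toy data: the AND function with {-1, +1} labels.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([-1, -1, -1, 1])
w, b = train_perceptron(X, y)
print(np.sign(X @ w + b))  # matches y
```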
Describe the behavior and significance of the rectified linear unit (ReLU) function.
ReLU is a nonlinear activation function.
It returns zero for any negative input and returns the input value x unchanged for any positive input x: ReLU(x) = max(0, x).
Therefore, it allows the network to model nonlinearity.
Any continuous relationship or function can be roughly approximated by aggregating many ReLU functions together, as the sketch below illustrates.
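A tiny illustration of that idea (the target function, knot points, and weights are my own illustrative choices): a weighted sum of shifted ReLUs forms a piecewise-linear approximation of a smooth curve.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

# Approximate f(x) = x^2 on [0, 1] by a weighted sum of shifted ReLUs,
# i.e. a piecewise-linear function whose slope changes at the knots.
knots = np.linspace(0.0, 1.0, 5)
x = np.linspace(0.0, 1.0, 101)
target = x ** 2

slopes = np.diff(knots ** 2) / np.diff(knots)   # secant slope on each segment
weights = np.diff(slopes, prepend=0.0)          # slope increment at each knot
approx = sum(w_k * relu(x - k) for w_k, k in zip(weights, knots[:-1]))

print(np.max(np.abs(approx - target)))          # max error ~ 0.016
```

Adding more knots (more ReLU units) makes the piecewise-linear fit arbitrarily close.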
Why can we not approximate nonlinear relationships with linear activation functions?
A composition of linear layers is equivalent to a single linear layer, because linear layers are affine transformations (they preserve lines and parallelism) and the composition of affine transformations is itself affine. Stacking them therefore adds no representational power.
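A quick numerical check of this collapse (random matrices; the shapes are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)   # first linear layer
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)   # second linear layer
x = rng.normal(size=3)

stacked = W2 @ (W1 @ x + b1) + b2          # two layers, no nonlinearity
W, b = W2 @ W1, W2 @ b1 + b2               # the single equivalent layer
collapsed = W @ x + b

print(np.allclose(stacked, collapsed))     # True
```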
What is a moving average in a CNN?
A moving average is a sliding-window average: for every time point t in a series, one computes the average of the N points around it.
The local average is a form of convolution used to smooth out noise in data by replacing a data point with the average of neighboring values in a moving window.
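A moving average is exactly a convolution with a uniform kernel, as this short sketch shows (the signal values and window size are illustrative):

```python
import numpy as np

signal = np.array([1.0, 2.0, 8.0, 3.0, 4.0, 10.0, 5.0])
N = 3
kernel = np.ones(N) / N                      # uniform averaging window

smoothed = np.convolve(signal, kernel, mode="valid")
print(smoothed)  # each value is the mean of 3 neighbouring points
```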
How are convolutional neural networks shift-invariant?
Shifts in the input leave the output relatively unaffected. Strictly, a convolutional layer is shift-equivariant (shifting the input shifts the feature map by the same amount); pooling and later aggregation then make the network's output approximately shift-invariant.
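A small check of the equivariance property using NumPy's 1D convolution (the signal and kernel values are illustrative):

```python
import numpy as np

signal = np.array([0.0, 1.0, 3.0, 1.0, 0.0, 0.0, 0.0])
shifted = np.roll(signal, 2)             # shift the input by two samples
kernel = np.array([1.0, 2.0, 1.0])

out = np.convolve(signal, kernel, mode="same")
out_shifted = np.convolve(shifted, kernel, mode="same")

# Shifting the input shifts the feature map by the same amount.
print(np.allclose(np.roll(out, 2), out_shifted))  # True
```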
What does it mean that images are discrete (digital)?
They sample the 2D space on a regular grid
In the context of image processing, what is filtering?
Forming a new image whose pixels are some function of the original pixel values.
When can we say that a convolutional filter is a linear system?
When summing two pictures and then applying the filter leads to the same result as applying the filter to each picture individually and summing the filtered pictures together (and scaling the input likewise scales the output):
F(Image1 + Image2) = F(Image1) + F(Image2)
F(a·Image) = a·F(Image)
(the superposition principle)
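A direct numerical check of superposition for a convolutional filter (random images; the sizes and scale factor are illustrative):

```python
import numpy as np
from scipy.signal import convolve2d

rng = np.random.default_rng(1)
img1 = rng.normal(size=(8, 8))
img2 = rng.normal(size=(8, 8))
kernel = rng.normal(size=(3, 3))

F = lambda img: convolve2d(img, kernel, mode="valid")

# Additivity: filtering the sum equals summing the filtered images.
print(np.allclose(F(img1 + img2), F(img1) + F(img2)))  # True
# Homogeneity: scaling commutes with filtering.
print(np.allclose(F(2.5 * img1), 2.5 * F(img1)))       # True
```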