6.19 - Pattern recognition and Categorisation in FFNs Flashcards
What is a perceptron and what does this network architecture look like?
The perceptron is an algorithm for supervised learning of binary classifiers in a feed-forward neural network. It is a linear classifier: it can decide whether an input vector belongs to a specific class.
The perceptron algorithm works in the following way:
- Multiply all inputs (x) by their respective weights (w)
- Add up all of the results (= weighted sum)
- Optionally, add a bias (b)
- Pass the result through a transfer function (e.g. the Heaviside step function).
Note: Steps 1 and 2 together are equivalent to taking the dot product of the inputs and the weights.
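The steps above can be sketched as follows (a minimal sketch; the function names and the weight, bias, and input values are arbitrary illustration choices):

```python
import numpy as np

def heaviside(z):
    """Step transfer function: 1 if z >= 0, else 0."""
    return 1 if z >= 0 else 0

def perceptron(x, w, b=0.0):
    """Weighted sum of inputs plus bias, passed through the step function."""
    z = np.dot(x, w) + b  # steps 1-3: multiply, add up, add bias
    return heaviside(z)

# Illustration: two inputs with arbitrary weights and bias
x = np.array([1.0, 0.0])
w = np.array([0.5, 0.5])
print(perceptron(x, w, b=-0.7))  # 0.5 - 0.7 = -0.2 < 0, so outputs 0
```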
What is the perceptron XOR problem?
The fact that a single-layer perceptron cannot learn the exclusive-or (XOR) function, because XOR is not linearly separable: no single straight line can separate the inputs that map to 1 from those that map to 0.
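This can be demonstrated with the classic perceptron learning rule (a sketch; `train_perceptron`, the learning rate, and the epoch limit are illustrative choices): on the linearly separable AND function the rule converges to zero errors, while on XOR errors remain in every epoch no matter how long we train.

```python
import numpy as np

def train_perceptron(X, y, epochs=100, lr=0.1):
    """Perceptron learning rule; returns weights, bias, and
    the number of misclassifications in the last epoch."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for xi, target in zip(X, y):
            pred = 1 if np.dot(xi, w) + b >= 0 else 0
            update = lr * (target - pred)
            w += update * xi
            b += update
            errors += int(update != 0)
        if errors == 0:  # converged: every input classified correctly
            break
    return w, b, errors

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
and_labels = np.array([0, 0, 0, 1])
xor_labels = np.array([0, 1, 1, 0])

_, _, and_errors = train_perceptron(X, and_labels)
_, _, xor_errors = train_perceptron(X, xor_labels)
print(and_errors)  # 0: AND is linearly separable, training converges
print(xor_errors)  # still > 0: XOR never converges
```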
What is the role of convolutional layers?
Convolutional layers extract features.
The convolution is performed on the input data with the use of a filter or kernel (these terms are used interchangeably) to produce a feature map. We execute a convolution by sliding the filter over the input. At every location, an element-wise multiplication is performed and the sum of the products is written to the feature map.
In the (animated) illustration of this operation, the filter slides over the input, and the sum at each position goes into the feature map. The area of the input covered by the filter is called the receptive field; here the filter size is 3x3. Because a single filter is shared across the entire image, convolutional neural networks are largely location-invariant: they can recognise features independent of their location in the input image. This weight sharing also greatly reduces the number of parameters, which helps prevent overfitting.
Numerous convolutions are performed on our input, where each operation uses a different filter. This results in different feature maps. In the end, we take all of these feature maps and put them together as the final output of the convolution layer.
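The sliding operation described above can be sketched in a few lines (a minimal sketch; `conv2d`, the input, and the edge-detecting kernel are illustrative, and like most deep-learning frameworks it computes cross-correlation, i.e. the kernel is not flipped):

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2D convolution (no padding, stride 1): slide the kernel over
    the input; at each position, multiply element-wise and sum the products."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    feature_map = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + kh, j:j + kw]  # the receptive field
            feature_map[i, j] = np.sum(patch * kernel)
    return feature_map

# A 5x5 input with a 3x3 filter yields a 3x3 feature map
image = np.arange(25, dtype=float).reshape(5, 5)
kernel = np.array([[1, 0, -1],
                   [1, 0, -1],
                   [1, 0, -1]], dtype=float)  # vertical-edge filter
print(conv2d(image, kernel).shape)  # (3, 3)
```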
What is the role of Regularisation?
Regularisation simplifies network connections.
We want neural networks that are able to generalise well, i.e. to have weights such that they perform well on unseen data (instead of only on one specific dataset). In supervised learning we can measure the performance of a network by comparing its predictions with the ground-truth labels; the difference between the two is the loss or cost.
The total loss is calculated by adding the regularisation term to the error. By doing this, regularisation penalises the complexity of the model. Driving weights towards values close to zero decreases the regularisation term and simplifies the model. This helps to prevent overfitting.
Explain the difference between L1 and L2 regularisation.
L1: a cost based on the sum of the absolute values of the weights in the network (the L1 norm). It tends to drive weights to exactly zero, producing sparse networks.
L2: a cost based on the sum of the squared weights of the network (the squared Euclidean, or L2, length). It shrinks all weights towards zero without making them exactly zero.
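A sketch of how the two penalty terms enter the total loss (the function names, the regularisation strength `lam`, and the example weights are illustrative assumptions):

```python
import numpy as np

def l1_penalty(weights, lam=0.01):
    """L1 regularisation: lambda times the sum of absolute weight values."""
    return lam * np.sum(np.abs(weights))

def l2_penalty(weights, lam=0.01):
    """L2 regularisation: lambda times the sum of squared weight values."""
    return lam * np.sum(weights ** 2)

def regularised_loss(error, weights, penalty=l2_penalty):
    """Total loss = data error + regularisation term."""
    return error + penalty(weights)

w = np.array([0.5, -2.0, 0.0, 1.5])
print(round(l1_penalty(w), 3))  # 0.01 * (0.5 + 2.0 + 0.0 + 1.5) = 0.04
print(round(l2_penalty(w), 3))  # 0.01 * (0.25 + 4.0 + 0.0 + 2.25) = 0.065
```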
What is the difference between regression and classification?
Classification: predicting a label (a discrete class / category) given inputs.
Regression: predicting a quantity (a continuous numerical value) given inputs.
What is overfitting?
Overfitting is a phenomenon that occurs when a machine-learning or statistical model is tailored to a particular training dataset and is unable to generalise to unseen data. This is a problem in complex models, like deep neural networks. In other words, an overfitted model performs well on training data but fails to generalise.
Usually, the more parameters the model has, the more functions it can represent and the more likely it is to overfit.
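This can be sketched with polynomial regression as a stand-in for model complexity (an illustrative setup: the linear "true" function, the noise level, and the polynomial degrees are all assumed). A 10-parameter polynomial interpolates 10 noisy training points almost perfectly, yet its error on fresh test points is far larger than its training error:

```python
import numpy as np

rng = np.random.default_rng(0)

# The true relationship is linear; the targets carry observation noise
x_train = np.linspace(0, 1, 10)
y_train = 2 * x_train + rng.normal(scale=0.2, size=10)
x_test = np.linspace(0.05, 0.95, 10)
y_test = 2 * x_test + rng.normal(scale=0.2, size=10)

def mse(coeffs, x, y):
    """Mean squared error of a fitted polynomial on data (x, y)."""
    return np.mean((np.polyval(coeffs, x) - y) ** 2)

simple = np.polyfit(x_train, y_train, deg=1)    # 2 parameters
complex_ = np.polyfit(x_train, y_train, deg=9)  # 10 parameters: interpolates

# The complex model fits the training noise almost perfectly...
print(mse(simple, x_train, y_train), mse(complex_, x_train, y_train))
# ...but its test error is far larger than its near-zero training error
print(mse(complex_, x_test, y_test))
```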