Lec 5 | Neural Networks Flashcards
It is inspired by neuroscience
AI Neural Networks
- These are connected to other ____________ and can receive electrical signals from them and send electrical signals to them.
- They process input signals and can be activated, at which point they send their own electrical signal forward.
- A unit connected to other units
Neuron
- It is a mathematical model for learning inspired by biological neural networks
- It models mathematical functions that map inputs to outputs based on the structure and parameters of the network.
- Allows for learning the network’s parameters based on data
Artificial Neural Network
w₀ is a constant, also called [???], modifying the value of the whole expression
Bias
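A minimal sketch of how the bias enters a unit's weighted sum, assuming the usual form w₀ + w₁x₁ + … + wₙxₙ. The function name `weighted_sum` and the sample numbers are illustrative, not part of the lecture.

```python
def weighted_sum(weights, bias, inputs):
    """Return w0 + w1*x1 + ... + wn*xn, where w0 is the bias."""
    return bias + sum(w * x for w, x in zip(weights, inputs))


# Same weights and inputs, different bias: the bias shifts the whole expression.
print(weighted_sum([1, 1], bias=0, inputs=[1, 0]))   # 1
print(weighted_sum([1, 1], bias=-1, inputs=[1, 0]))  # 0
```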
Activation Functions
gives 0 before a certain threshold is reached and 1 after the threshold is reached.
Step Function
Activation Functions
Give the formula for the step function
g(x) = 1 if x ≥ 0, else 0
Activation Functions
gives as output any real number from 0 to 1, thus expressing graded confidence in its judgment
Logistic Function/Sigmoid
Activation Functions
What is the formula for the logistic function/sigmoid?
g(x) = e^x / ((e^x)+1)
Activation functions
Allows the output to be any positive value. If the value is negative, it sets it to 0.
Rectified Linear Unit (ReLU)
Activation Functions
Formula for ReLU?
g(x) = max(0, x)
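The three activation functions above can be sketched in a few lines of Python. The threshold of the step function is assumed to be 0, as in the formula on the card; the function names are illustrative.

```python
import math


def step(x):
    """Step function: 1 once the threshold (0 here) is reached, else 0."""
    return 1 if x >= 0 else 0


def sigmoid(x):
    """Logistic function e^x / (e^x + 1): a value strictly between 0 and 1."""
    return math.exp(x) / (math.exp(x) + 1)


def relu(x):
    """Rectified linear unit: negative values become 0."""
    return max(0, x)


for x in (-2, 0, 3):
    print(x, step(x), round(sigmoid(x), 3), relu(x))
```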
It is an algorithm for minimizing loss when training neural networks
Gradient Descent
PSEUDOCODE
Gradient Descent
* Start with a random choice of weights
* Repeat:
  * Calculate the gradient based on all data points: the direction that will lead to decreasing loss.
  * Update weights according to the gradient.
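A minimal sketch of the gradient descent pseudocode, under assumptions the cards do not specify: the model is a single linear unit y = w0 + w1·x, the loss is mean squared error, and the learning rate, step count, and toy data are chosen only for illustration.

```python
import random


def gradient(weights, points):
    """Gradient of mean squared error for the model y = w0 + w1 * x."""
    w0, w1 = weights
    g0 = g1 = 0.0
    for x, y in points:
        error = (w0 + w1 * x) - y
        g0 += 2 * error
        g1 += 2 * error * x
    n = len(points)
    return [g0 / n, g1 / n]


def gradient_descent(points, learning_rate=0.05, steps=2000, seed=0):
    rng = random.Random(seed)
    # Start with a random choice of weights.
    weights = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
    for _ in range(steps):
        # Calculate the gradient based on all data points.
        grad = gradient(weights, points)
        # Update weights: step against the gradient to decrease the loss.
        weights = [w - learning_rate * g for w, g in zip(weights, grad)]
    return weights


data = [(0, 1), (1, 3), (2, 5), (3, 7)]  # generated from y = 1 + 2x
w0, w1 = gradient_descent(data)
print(round(w0, 3), round(w1, 3))
```

On this toy data the weights should approach w0 = 1 and w1 = 2.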
What is a drawback or a problem of the gradient descent and how do you solve or minimize the problem?
- It requires calculating the gradient based on all data points, which is computationally costly.
- Use Stochastic Gradient Descent or Mini-Batch Gradient Descent to minimize the problem.
The gradient is calculated based on one point chosen at random.
Stochastic Gradient Descent
PSEUDOCODE
Stochastic Gradient Descent
* Start with a random choice of weights
* Repeat:
  * Calculate the gradient based on one data point: the direction that will lead to decreasing loss.
  * Update weights according to the gradient.
What is a drawback or problem with Stochastic Gradient Descent and how can it be solved?
It can be inaccurate. A way to solve this is by using Mini-Batch Gradient Descent.
- This computes the gradient based on a few points selected at random
- Finds a compromise between computation cost and accuracy
Mini-batch Gradient Descent
Pseudocode
Mini-batch Gradient Descent
* Start with a random choice of weights
* Repeat:
  * Calculate the gradient based on one small batch: the direction that will lead to decreasing loss.
  * Update weights according to the gradient.
Main takeaway for the gradient descent variants?
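A sketch of mini-batch gradient descent, again assuming a linear unit y = w0 + w1·x with mean squared error (neither is specified on the cards). Setting `batch_size=1` gives stochastic gradient descent; the hyperparameters and data are illustrative.

```python
import random


def minibatch_gradient_descent(points, batch_size, learning_rate=0.02,
                               steps=5000, seed=0):
    rng = random.Random(seed)
    # Start with a random choice of weights.
    w0, w1 = rng.uniform(-1, 1), rng.uniform(-1, 1)
    for _ in range(steps):
        # Calculate the gradient based on one small random batch only.
        batch = rng.sample(points, batch_size)
        g0 = sum(2 * ((w0 + w1 * x) - y) for x, y in batch) / batch_size
        g1 = sum(2 * ((w0 + w1 * x) - y) * x for x, y in batch) / batch_size
        # Update weights according to the gradient.
        w0 -= learning_rate * g0
        w1 -= learning_rate * g1
    return w0, w1


data = [(0, 1), (1, 3), (2, 5), (3, 7)]  # generated from y = 1 + 2x
sgd_weights = minibatch_gradient_descent(data, batch_size=1)   # stochastic
mini_weights = minibatch_gradient_descent(data, batch_size=2)  # mini-batch
print(sgd_weights, mini_weights)
```

Each step touches only `batch_size` points, which is where the saving in computation comes from. The price is a noisier estimate of the gradient.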
None of these solutions is perfect, and different solutions might be employed in different situations.
- Only capable of learning a linearly separable decision boundary.
- Uses a straight line to separate data
- It could classify an input to be one type or another
Perceptron
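A small sketch of a perceptron learning the AND function, which is linearly separable. The update rule used here, w ← w + α·(target − prediction)·x, is the standard perceptron learning rule. It is not stated on the cards, so treat it as an assumption; the function names are illustrative.

```python
def predict(weights, inputs):
    """Step activation over w0 + w1*x1 + w2*x2 (weights[0] is the bias)."""
    total = weights[0] + sum(w * x for w, x in zip(weights[1:], inputs))
    return 1 if total >= 0 else 0


def train_perceptron(examples, learning_rate=1, max_epochs=100):
    weights = [0, 0, 0]
    for _ in range(max_epochs):
        mistakes = 0
        for inputs, target in examples:
            error = target - predict(weights, inputs)
            if error != 0:
                mistakes += 1
                # Nudge the bias and each weight toward the correct answer.
                weights[0] += learning_rate * error
                for i, x in enumerate(inputs):
                    weights[i + 1] += learning_rate * error * x
        if mistakes == 0:  # every example is classified correctly
            break
    return weights


and_examples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
and_weights = train_perceptron(and_examples)
print([predict(and_weights, x) for x, _ in and_examples])  # [0, 0, 0, 1]
```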
Some data are not linearly separable. What do we use for data that are not linearly separable?
multilayer neural networks
an artificial neural network with an input layer, an output layer, and at least one hidden layer.
Multilayer Neural Networks
It processes weighted inputs. It receives weighted inputs from the previous layer, performs an action on them, and passes its outputs to the next layer, until the output (final) layer is reached. It enables modeling of non-linear data.
Hidden Layer
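To illustrate why a hidden layer helps with data that is not linearly separable, here is a sketch of XOR computed by a network with one hidden layer of two step units. The weights are hand-picked for illustration, not learned, and are not taken from the lecture.

```python
def step(x):
    return 1 if x >= 0 else 0


def xor_network(x1, x2):
    # Hidden layer: each unit applies a step function to a weighted sum.
    h_or = step(-1 + x1 + x2)    # fires when at least one input is 1
    h_and = step(-2 + x1 + x2)   # fires only when both inputs are 1
    # Output layer: "OR but not AND", computed from the hidden outputs.
    return step(-1 + h_or - h_and)


for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_network(a, b))
```

No single straight line separates the XOR outputs, but each hidden unit draws one line and the output unit combines them.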
It is the main algorithm used for training neural networks with hidden layers.
Backpropagation
How does backpropagation work?
It works by starting with the errors in the output units, calculating the gradient for the weights of the previous layer, and repeating the process until the input layer is reached.
PSEUDOCODE
Backpropagation
* Calculate error for output layer
* For each layer, starting with output layer and moving inwards towards earliest hidden layer:
  * Propagate error back one layer. In other words, the current layer that’s being considered sends the errors to the preceding layer.
  * Update weights.
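A compact sketch of backpropagation for a network with one hidden layer, trained on XOR. Several details are assumptions not given on the cards: sigmoid activations everywhere, the loss ½(output − target)², one weight update per example, three hidden units, and the learning rate and epoch count.

```python
import math
import random


def sigmoid(x):
    return 1 / (1 + math.exp(-x))


def forward(net, inputs):
    """Return (hidden activations, output) for a one-hidden-layer network."""
    hidden = [sigmoid(b + sum(w * x for w, x in zip(ws, inputs)))
              for ws, b in zip(net["w_hidden"], net["b_hidden"])]
    output = sigmoid(net["b_out"] +
                     sum(w * h for w, h in zip(net["w_out"], hidden)))
    return hidden, output


def train_step(net, inputs, target, lr):
    hidden, output = forward(net, inputs)
    # Calculate error for the output layer (includes the sigmoid derivative).
    delta_out = (output - target) * output * (1 - output)
    # Propagate error back one layer, using the weights before they change.
    delta_hidden = [delta_out * w * h * (1 - h)
                    for w, h in zip(net["w_out"], hidden)]
    # Update weights of the output layer.
    net["w_out"] = [w - lr * delta_out * h
                    for w, h in zip(net["w_out"], hidden)]
    net["b_out"] -= lr * delta_out
    # Update weights of the hidden layer.
    for j, delta in enumerate(delta_hidden):
        net["w_hidden"][j] = [w - lr * delta * x
                              for w, x in zip(net["w_hidden"][j], inputs)]
        net["b_hidden"][j] -= lr * delta


def mean_loss(net, examples):
    return sum((forward(net, x)[1] - t) ** 2 for x, t in examples) / len(examples)


def make_network(n_inputs, n_hidden, seed=0):
    rng = random.Random(seed)
    return {
        "w_hidden": [[rng.uniform(-1, 1) for _ in range(n_inputs)]
                     for _ in range(n_hidden)],
        "b_hidden": [rng.uniform(-1, 1) for _ in range(n_hidden)],
        "w_out": [rng.uniform(-1, 1) for _ in range(n_hidden)],
        "b_out": rng.uniform(-1, 1),
    }


xor = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
net = make_network(n_inputs=2, n_hidden=3)
loss_before = mean_loss(net, xor)
for _ in range(5000):
    for inputs, target in xor:
        train_step(net, inputs, target, lr=0.5)
loss_after = mean_loss(net, xor)
print(loss_before, loss_after)
```

The loss after training should be lower than before. How close it gets to zero depends on the random starting weights.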
Can you extend the backpropagation algorithm?
This can be extended to any number of hidden layers, creating deep neural networks, which are neural networks that have more than one hidden layer.
neural network with multiple hidden layers
Deep Neural Network
It is the danger of modeling the training data too closely, thus failing to generalize to new data.
Overfitting
How to combat overfitting?
Use Dropout
The temporary removal of units, selected at random, from a neural network to prevent over-reliance on certain units. Throughout training, the neural network will assume different forms, each time dropping some other units and then using them again.
Dropout
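A minimal sketch of the idea behind dropout: during one training step, each unit's output is zeroed with some probability. The function name and the `rate` parameter are illustrative. Library implementations typically also rescale the surviving units, which this sketch omits.

```python
import random


def apply_dropout(activations, rate, rng):
    """Zero out each unit with probability `rate` for one training step."""
    return [0.0 if rng.random() < rate else a for a in activations]


rng = random.Random(0)
layer_output = [0.3, 0.9, 0.5, 0.7, 0.1, 0.8]
# Each call drops a different random subset, so the network "changes form".
print(apply_dropout(layer_output, rate=0.5, rng=rng))
print(apply_dropout(layer_output, rate=0.5, rng=rng))
```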
A Python library that has an implementation of neural networks using the backpropagation algorithm
TensorFlow
It encompasses the different computational methods for analyzing and understanding digital images, and it is often achieved using neural networks.
Computer Vision
What do images consist of?
pixels with RGB (3 values from 0-255)
What are the drawbacks of feeding an image's raw pixel values directly into a neural network?
- Once the image is broken down into pixels and their color values, we can’t use the structure of the image as an aid.
- The sheer number of inputs is very big, which means that we will have to calculate a lot of weights.
Applies a filter that adds each pixel value of an image to its neighbors, weighted according to a kernel matrix. Doing so alters the image and can help the neural network process it.
Image Convolution
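A sketch of image convolution on a grayscale image stored as a list of rows. It assumes a square kernel and no padding, so the output is smaller than the input. The kernel shown is a commonly used edge-detection kernel, chosen here for illustration.

```python
def convolve(image, kernel):
    """Slide `kernel` over `image` (no padding) and sum the weighted neighbors."""
    k = len(kernel)
    rows = len(image) - k + 1
    cols = len(image[0]) - k + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(k) for j in range(k))
            for c in range(cols)
        ]
        for r in range(rows)
    ]


# The kernel entries sum to 0, so a flat region of the image produces 0.
edge_kernel = [[-1, -1, -1],
               [-1,  8, -1],
               [-1, -1, -1]]

flat = [[20] * 3 for _ in range(3)]
spot = [[20, 20, 20],
        [20, 50, 20],
        [20, 20, 20]]
print(convolve(flat, edge_kernel))  # [[0]]
print(convolve(spot, edge_kernel))  # [[240]]
```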
A drawback of Image Convolution?
It is computationally expensive due to the number of pixels that serve as input to the neural network.
It reduces the size of an input by sampling from regions in the input
Pooling
Pooling by choosing the maximum value in each region
Max-pooling
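A sketch of max-pooling over non-overlapping square regions of a grayscale image stored as a list of rows. The function name and the sample image are illustrative.

```python
def max_pool(image, size):
    """Keep the maximum of each non-overlapping size x size region."""
    return [
        [
            max(image[r + i][c + j] for i in range(size) for j in range(size))
            for c in range(0, len(image[0]) - size + 1, size)
        ]
        for r in range(0, len(image) - size + 1, size)
    ]


image = [[1, 3, 2, 0],
         [5, 4, 1, 1],
         [0, 2, 9, 6],
         [7, 1, 3, 8]]
print(max_pool(image, 2))  # [[5, 2], [7, 9]]
```

A 4x4 input becomes a 2x2 output, so the layers that follow receive a quarter as many values.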
Neural networks that use convolution, usually for analyzing images
Convolutional Neural Network
Explain how Convolutional Neural Networks work.
A convolutional neural network starts by applying filters that can help distill some features of the image using different kernels. These filters can be improved in the same way as other weights in the neural network, by adjusting their kernels based on the error of the output. Then, the resulting images are pooled, after which the pixels are fed to a traditional neural network as inputs.
Give a benefit of Convolutional Networks
One of the benefits of these processes is that, by convolving and pooling, the neural network becomes less sensitive to variation.
That is, if the same picture is taken from slightly different angles, the input for convolutional neural network will be similar, whereas, without convolution and pooling, the input from each image would be vastly different.
- Neural network that has connections only in one direction
- Input data is provided to the network, which eventually produces some output.
Feed-forward Neural Network
What is a limitation of the Feed-forward Neural Network?
The input needs to have a fixed shape (a fixed number of input neurons), and the network has a fixed number of outputs.
It consists of a non-linear structure, where the network uses its own output as input.
Recurrent Neural Network
Explain the difference between Recurrent Neural Networks and Feed-Forward Neural Networks
Feed-Forward Neural Network
* Uses input to get output
* Incapable of varying the number of outputs
Recurrent Neural Network
* Uses output as input
* Capable of varying the number of outputs
* Helpful in cases where the network deals with sequences and not a single individual object
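A toy sketch of the recurrent idea: the same unit is applied at every step of a sequence, and its previous output is fed back in as an extra input. The tanh activation and the fixed weights are assumptions made for illustration; nothing here is trained.

```python
import math


def rnn_many_to_one(sequence, w_input=0.5, w_state=0.8, bias=0.0):
    """Fold a sequence of numbers of any length into one final state."""
    state = 0.0
    for x in sequence:
        # The previous state (the network's own earlier output) is an input.
        state = math.tanh(w_input * x + w_state * state + bias)
    return state


# Sequences of different lengths are handled by the same network.
print(rnn_many_to_one([1, 0]))
print(rnn_many_to_one([0, 1, 1, 0, 1]))
```

Because the state is carried forward, the order of the inputs matters: `[1, 0]` and `[0, 1]` give different results.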
CS50 QUIZ
Consider the below neural network, where we set:
- w0 = -5
- w1 = 2
- w2 = -1 and
- w3 = 3.
- x1, x2, and x3 represent input neurons, and y represents the output neuron.
What value will this network compute for y given inputs x1 = 3, x2 = 2, and x3 = 4 if we use a step activation function? What if we use a ReLU activation function?
- 0 for step activation function, 0 for ReLU activation function
- 0 for step activation function, 1 for ReLU activation function
- 1 for step activation function, 0 for ReLU activation function
- 1 for step activation function, 1 for ReLU activation function
- 1 for step activation function, 11 for ReLU activation function
- 1 for step activation function, 16 for ReLU activation function
- 11 for step activation function, 11 for ReLU activation function
- 16 for step activation function, 16 for ReLU activation function
1 for step activation function, 11 for ReLU activation function
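The arithmetic behind this answer can be checked with a short script. The network diagram is not reproduced here, so this assumes a single output unit y connected directly to x1, x2, and x3, with w0 as the bias.

```python
weights = [2, -1, 3]   # w1, w2, w3
bias = -5              # w0
inputs = [3, 2, 4]     # x1, x2, x3

# Weighted sum: -5 + 2*3 + (-1)*2 + 3*4 = 11
total = bias + sum(w * x for w, x in zip(weights, inputs))
step_output = 1 if total >= 0 else 0   # step activation
relu_output = max(0, total)            # ReLU activation
print(total, step_output, relu_output)  # 11 1 11
```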
CS50 QUIZ
How many total weights (including biases) will there be for a fully connected neural network with a single input layer with 3 units, a single hidden layer with 5 units, and a single output layer with 4 units?
44
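One way to arrive at 44 is to count one weight per connection between adjacent layers and one bias per unit in each receiving layer. A short sketch:

```python
layer_sizes = [3, 5, 4]  # input, hidden, output units

total = 0
for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
    connections = n_in * n_out  # one weight per connection
    biases = n_out              # one bias per unit in the receiving layer
    total += connections + biases
print(total)  # (3*5 + 5) + (5*4 + 4) = 44
```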
CS50 QUIZ
Consider a recurrent neural network that listens to an audio speech sample and classifies it according to whose voice it is. What network architecture is the best fit for this problem?
- One-to-one (single input, single output)
- Many-to-one (multiple inputs, single output)
- One-to-many (single input, multiple outputs)
- Many-to-many (multiple inputs, multiple outputs)
Many-to-one (multiple inputs, single output)
CS50 QUIZ
Consider a 4x4 grayscale image with the following pixel values.
2 4 6 8
16 14 12 10
18 20 22 24
32 30 28 26
What would be the result of applying a 2x2 max-pool to the original image?
- [[16, 12], [32, 28]]
- [[16, 14], [32, 30]]
- [[22, 24], [32, 30]]
- [[14, 12], [30, 28]]
- [[16, 14], [22, 24]]
- [[16, 12], [32, 30]]
(Answers are formatted as a matrix [[a, b], [c, d]] where [a, b] is the first row and [c, d] is the second row.)
[[16, 12], [32, 28]]