Lecture 7 and 8 Flashcards
Neural Networks, Word Vectors
Introduction to Neural Nets
In 2018, Google introduced new text-processing techniques that depend heavily on deep learning neural networks. To understand deep learning, it is valuable to first understand how “regular” artificial neural networks (ANNs) work in a simpler form.
Deep Learning
Deep learning relies on representation learning: automatically learning good features or representations from the data.
Representation learning:
Learning representations of the data that make it easier to extract useful information when building classifiers or other predictors.
Overview of Neural Networks
- Weights: these are learned during the training process
- Bias: like an intercept value in a regression
- Inputs: the observed variables
Overview of Neural Networks
The activation function can be virtually any formula that will produce an output from the summed input, but for learning to work properly, the function must generally be differentiable. Here’s the original perceptron activation function (not differentiable):
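The formula itself did not survive the slide export; as a minimal sketch (the exact threshold and output convention on the original slide may differ), the classic perceptron combines the weighted inputs and bias from the previous slide and applies a hard threshold:

```python
import numpy as np

def perceptron(inputs, weights, bias):
    """Classic perceptron: weighted sum of the inputs plus a bias,
    passed through a hard threshold (step) activation.
    The jump at zero is what makes this function non-differentiable."""
    summed = np.dot(inputs, weights) + bias
    return 1 if summed >= 0 else 0

print(perceptron(np.array([1.0, 0.5]), np.array([0.8, -0.3]), bias=-0.2))  # -> 1
```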
Overview of Neural Networks
- Activation function: f(x)
- Output: y
Common Activation Functions
The activation function of a node defines the output of that node given an input or set of inputs. Programmers choose different activation functions based on the system performance they observe for various applications.
Common Activation Functions
- Hyperbolic Tangent (tanh) Function
- ReLU Function
- Sigmoid Function
- Identity Function
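As a minimal sketch (not taken from the slides), the four functions listed above can be written in a few lines of Python/NumPy:

```python
import numpy as np

def tanh(x):
    return np.tanh(x)                # hyperbolic tangent: squashes input to (-1, 1)

def relu(x):
    return np.maximum(0.0, x)        # ReLU: passes positive values, zeroes out negatives

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))  # sigmoid: squashes input to (0, 1)

def identity(x):
    return x                         # identity: output equals the summed input
```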
ReLU Function
Perceptron neuron model (left) and activation function (right).
Neural Network Models with Hidden Layers
A typical neural network consists of a few layers: an input layer, an optional hidden layer, and an output layer. Using an identity activation function and no hidden layers, the analysis is equivalent to OLS regression.
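A minimal sketch of that equivalence, using assumed toy data: with no hidden layer and an identity activation, the network’s prediction is just Xw + b, which is exactly the model OLS fits:

```python
import numpy as np

# Toy data (assumed for illustration): 100 samples, 3 input features
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 3.0 + rng.normal(scale=0.1, size=100)

# "Network" with no hidden layer and identity activation: y_hat = X w + b
X1 = np.column_stack([X, np.ones(len(X))])     # append a column of 1s for the bias
coef, *_ = np.linalg.lstsq(X1, y, rcond=None)  # OLS solution for the weights and bias
w, b = coef[:-1], coef[-1]
y_hat = X @ w + b                              # identical to an OLS prediction
```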
Deep Learning:
Deep learning is simply a more complex neural network. There are often many hidden layers (sometimes dozens) and multiple output nodes to estimate multidimensional outputs. It is also possible to use different activation functions on different nodes.
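A minimal forward-pass sketch of such a network; the layer sizes, activation choices, and random weights below are all assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

def relu(x):
    return np.maximum(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# A small "deep" network: 4 inputs -> two hidden layers -> 2 output nodes,
# with a different activation function on the output layer.
layer_sizes = [4, 8, 8, 2]
activations = [relu, relu, sigmoid]

weights = [rng.normal(scale=0.1, size=(m, n))
           for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n) for n in layer_sizes[1:]]

def forward(x):
    """Pass an input vector through every layer in turn."""
    for W, b, act in zip(weights, biases, activations):
        x = act(x @ W + b)
    return x

print(forward(rng.normal(size=4)))   # two output values
```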
Training – Forward pass
The forward pass
Initially, the weights (filter values) are randomly assigned, so performance is expected to be (very) bad.
The loss function
E_total = Σ ½(target - output)²
Cost/error function (mean squared error)
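A small sketch tying these two slides together (toy example with assumed shapes): a randomly initialized network produces poor outputs, and the loss above measures how poor:

```python
import numpy as np

rng = np.random.default_rng(2)

# Assumed toy example: 5 inputs, 2 outputs, randomly initialized weights
W = rng.normal(scale=0.5, size=(5, 2))
b = np.zeros(2)

x = rng.normal(size=5)            # one training example
target = np.array([1.0, 0.0])     # its desired output

output = x @ W + b                # forward pass (identity activation for simplicity)
loss = np.sum(0.5 * (target - output) ** 2)   # E_total = sum of 1/2 (target - output)^2
print(loss)                       # large at first, since the weights are random
```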
The backward pass
One way of visualizing this idea of minimizing the loss is to consider a 3-D graph where the weights of the neural net (there are obviously more than 2 weights, but let’s go for simplicity) are the independent variables and the dependent variable is the loss. The task of minimizing the loss involves adjusting the weights so that the loss decreases. In visual terms, we want to get to the lowest point in our bowl-shaped object. To do this, we have to take the derivative of the loss with respect to the weights (in visual terms: calculate the slope in every direction).
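A minimal gradient-descent sketch of that picture, using an assumed bowl-shaped loss over just two weights: compute the slope in every direction and step downhill:

```python
import numpy as np

# Assumed bowl-shaped loss over two weights, with its minimum at (3, -2)
def loss(w):
    return (w[0] - 3.0) ** 2 + (w[1] + 2.0) ** 2

def grad(w):
    # Derivative of the loss with respect to each weight (the slope in every direction)
    return np.array([2.0 * (w[0] - 3.0), 2.0 * (w[1] + 2.0)])

w = np.array([0.0, 0.0])           # start somewhere on the bowl
learning_rate = 0.1
for _ in range(100):
    w -= learning_rate * grad(w)   # step downhill

print(w, loss(w))                  # w ends up close to (3, -2), the bottom of the bowl
```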
Learning Through Backpropagation
Backpropagation takes the difference between the predicted value and the actual value and uses that error term to adjust each node’s weights.
Learning Through Backpropagation
The process works backwards from the final layers to earlier layers, one layer at a time, and computes the contribution that each weight in the given layer made to the loss value.
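A minimal backpropagation sketch under assumed shapes (one hidden layer, sigmoid activations, and the squared-error loss from above): gradients are computed at the output layer first, propagated back one layer at a time, and each weight is adjusted by its contribution to the loss:

```python
import numpy as np

rng = np.random.default_rng(3)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Assumed tiny network: 3 inputs -> 4 hidden units -> 2 outputs
W1, b1 = rng.normal(scale=0.5, size=(3, 4)), np.zeros(4)
W2, b2 = rng.normal(scale=0.5, size=(4, 2)), np.zeros(2)

x = rng.normal(size=3)
target = np.array([1.0, 0.0])
lr = 0.5

for _ in range(200):
    # Forward pass
    h = sigmoid(x @ W1 + b1)                      # hidden layer
    out = sigmoid(h @ W2 + b2)                    # output layer
    # Backward pass: start from the error at the output...
    d_out = (out - target) * out * (1 - out)      # dE/d(pre-activation of output)
    # ...then propagate it back one layer at a time
    d_h = (d_out @ W2.T) * h * (1 - h)            # dE/d(pre-activation of hidden)
    # Adjust each weight by its contribution to the loss
    W2 -= lr * np.outer(h, d_out)
    b2 -= lr * d_out
    W1 -= lr * np.outer(x, d_h)
    b1 -= lr * d_h

print(out)   # after training, close to the target [1, 0]
```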