Neural Networks Flashcards
sequential environment
agent’s current action affects future actions
episodic environment
agent’s current action does not affect future actions
output due to percept at time step n is independent of percept and output at time step n+1
feedforward neural network
FNN is a directed acyclic graph with the internal state of the network depending only on its current input (the model processes each input independently)
information moves in one direction and there are no cycles or loops
FNN example
MLP, fully connected network
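A minimal sketch of a feedforward pass (NumPy, with made-up layer sizes): each input flows once through the layers, so the network keeps no memory of earlier inputs.

```python
import numpy as np

def mlp_forward(x, W1, b1, W2, b2):
    # information flows one way: input -> hidden -> output, no cycles
    h = np.tanh(x @ W1 + b1)   # hidden layer
    return h @ W2 + b2         # output layer

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 8)), np.zeros(8)
W2, b2 = rng.normal(size=(8, 2)), np.zeros(2)
y = mlp_forward(rng.normal(size=4), W1, b1, W2, b2)  # each call is independent of previous inputs
```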
RNN
type of neural network specifically designed for processing sequential data
directed graph with cycles
internal state of network depends on its previous state
model exhibits dynamic temporal behaviour
uses output of hidden layer from previous timestep as additional input
RNN example
single hidden layer network with weights connecting output of hidden layer to itself with a time unit delay
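A minimal sketch of that recurrence (NumPy, Elman-style cell with made-up sizes): the hidden state from the previous timestep is fed back into the hidden layer alongside the current input.

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    # the previous hidden state re-enters the hidden layer with a one-timestep delay
    return np.tanh(x_t @ W_xh + h_prev @ W_hh + b_h)

rng = np.random.default_rng(0)
W_xh, W_hh, b_h = rng.normal(size=(3, 5)), rng.normal(size=(5, 5)), np.zeros(5)
h = np.zeros(5)                          # initial hidden state (the "memory")
for x_t in rng.normal(size=(4, 3)):      # a sequence of 4 inputs, one per timestep
    h = rnn_step(x_t, h, W_xh, W_hh, b_h)
```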
temporal patterns
data that exhibit sequences/patterns over time
for sequential data problems temporal patterns are critical to learning dependencies between data points across time
RNN and temporal dependencies
RNN captures temporal dependencies by maintaining a hidden state allowing model to store important past info
RNN architecture and how its trained
processes sequential data by maintaining information about previous inputs through a recurrent hidden state. This gives RNNs a “memory” that captures dependencies over time
input layer: Each element in the sequence is fed to the network one timestep at a time
hidden layer: Has recurrent connections that allow it to use the previous hidden state along with the current input to capture dependencies over time
output layer: output depends on the task
calculates the difference between the predicted output and the actual output and backpropagates that using BPTT
BPTT and how it works
same as BP but RNN is “unrolled” across all timesteps in the sequence
backpropagation through time
the average error is propagated back through all of the time steps of the network
the predicted output for the next word is compared to the actual next word using a loss function
gradients of the loss function with respect to the weights are computed for each time step.
Using the accumulated gradients from BPTT, the optimizer updates the weights
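A hedged sketch of one BPTT update using PyTorch (the model sizes and data below are placeholders): autograd unrolls the recurrence over the whole sequence, so loss.backward() accumulates gradients through every timestep before the optimizer updates the weights.

```python
import torch
import torch.nn as nn

vocab, hidden, seq_len = 10, 16, 6                    # illustrative sizes
embed = nn.Embedding(vocab, 8)
rnn = nn.RNN(8, hidden, batch_first=True)
head = nn.Linear(hidden, vocab)
params = list(embed.parameters()) + list(rnn.parameters()) + list(head.parameters())
opt = torch.optim.SGD(params, lr=0.1)

tokens = torch.randint(0, vocab, (1, seq_len + 1))    # fake word sequence
inputs, targets = tokens[:, :-1], tokens[:, 1:]       # predict the next word at each step

out, _ = rnn(embed(inputs))                           # forward pass unrolled over all timesteps
loss = nn.functional.cross_entropy(head(out).reshape(-1, vocab), targets.reshape(-1))
loss.backward()                                       # BPTT: gradients flow back through every timestep
opt.step()                                            # weights updated with the accumulated gradients
opt.zero_grad()
```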
long distance dependencies and BPTT
BPTT can face the challenges of vanishing gradients (the gradient becomes very small and learning stalls) and exploding gradients (the gradient becomes excessively large and destabilises training) when the sequence contains a lot of words
hard to train earlier layers with error from later layers
for example: Mary grew up in China………… She speaks Mandarin
eigen values and BPTT
eigenvalues of the weight matrix indicate whether the network’s output will explode (grow very large) or vanish (shrink towards zero) over time
eigenvalue > 1: gradient will explode (mitigated by gradient clipping)
eigenvalue = 1: gradient will propagate nicely
eigenvalue < 1: vanishing gradient
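A small numerical illustration (NumPy, arbitrary matrix): repeatedly multiplying a gradient by a recurrent weight matrix whose largest eigenvalue magnitude is above, equal to, or below 1 makes its norm explode, stay stable, or vanish.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(5, 5))
W /= np.max(np.abs(np.linalg.eigvals(W)))     # rescale so the largest |eigenvalue| is exactly 1

for scale, label in [(1.5, "explodes"), (1.0, "stable"), (0.5, "vanishes")]:
    g = np.ones(5)                            # stand-in for a gradient vector
    for _ in range(50):                       # 50 timesteps of backpropagation
        g = (scale * W).T @ g
    print(f"largest |eigenvalue| = {scale}: gradient norm = {np.linalg.norm(g):.2e} ({label})")
```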
RNN and sequential environments
RNN is ideal for tasks where order of data is important
allows the network to consider context and dependencies between inputs that occur over time
RNN enables learning in environments with temporal sequences by capturing relationships between inputs over time
allowing model to adapt its predictions based on past experiences
perplexity
evaluation for a language model
intuitively: how likely is the model to pick a good selection of words?
want models that compute high probability for the corpus (the entire set of language data to be analyzed) and low perplexity (as they have an inverse relationship)
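A minimal sketch of the calculation (Python, with made-up per-word probabilities): perplexity is 2 raised to the negative average log2 probability per word, so a model that assigns higher probability to the corpus gets lower perplexity.

```python
import math

# hypothetical probabilities the model assigned to each word of a 5-word test sentence
word_probs = [0.2, 0.1, 0.4, 0.25, 0.05]

avg_log2 = sum(math.log2(p) for p in word_probs) / len(word_probs)
perplexity = 2 ** (-avg_log2)
print(perplexity)   # lower perplexity <=> the model gave the corpus higher probability
```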
RNN Usage
1:1 image classification
1:m text generation
m:1 emotion classification
m:m translation