Neural Networks Flashcards
An AI Neural Network is an ___ paradigm
It is inspired by the way biological nervous systems, such as the brain, ___ information
information processing
process
AI NN learn by ___, like people
Learning in biological systems involves adjustments to the synaptic ___ that exist between the ___
example
connections
neurons
NN derive ___ from ___ and ___ data
meaning
complicated
imprecise
NN characteristics 1- ___ 2- ___ 3- ___ 4- ___
A. Adaptive Learning
S. Self-Organization
R. Real Time Operation
F. Fault Tolerance
A Sopa RreFeceu
A Perceptron is a simple model that consists of a single trainable ___
It receives several ___ and their ___, and has a ___ T (a real value)
neuron
inputs
weights
Threshold
To train a Perceptron we give it ___ and the ___; then we present ___ and tell it whether it got them right or wrong
inputs
desired outputs
examples
What if the Perceptron produces the wrong output?
If the Desired Output is 0, we should decrease the weights
If the Desired Output is 1, we should increase the weights
The decrease in the weight of an edge should be ___ to the input through that edge
Meaning that if an input is very high, it should be accountable for ___ of the output error
directly proportional
most
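A minimal Python sketch of this update rule (the AND-gate data, learning rate and threshold value are illustrative assumptions, not from the cards):

```python
# Perceptron sketch: threshold output plus the proportional update rule.
# The AND-gate data, learning rate and threshold T are illustrative assumptions.

def perceptron_output(inputs, weights, T):
    s = sum(x * w for x, w in zip(inputs, weights))
    return 1 if s > T else 0          # fire only above the threshold

def train_step(inputs, weights, T, desired, lr=0.1):
    out = perceptron_output(inputs, weights, T)
    if out != desired:
        error = desired - out         # +1: should increase; -1: should decrease
        # change each weight in direct proportion to the input on that edge:
        # high inputs are accountable for most of the error, so they move most
        weights = [w + lr * error * x for x, w in zip(inputs, weights)]
    return weights

weights = [0.0, 0.0]
for _ in range(20):                   # learns the AND function
    for inputs, desired in [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]:
        weights = train_step(inputs, weights, T=0.5, desired=desired)
```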
Can a Perceptron solve problems that are not linearly separable?
No
What algorithm do we use to train a MLP?
Backpropagation Algorithm
In the Backpropagation Algorithm we follow these steps:
1- ___
2- For each training example do a ___
3- Obtain ___ by comparing the result with the ___
4- Do a ___
5- If loss ___ ℇ, or if loss is still ___ at a reasonable rate, go to 2
1- Initialization 2- forward propagation 3- loss (error) / desired output 4- backward propagation 5- >= / decreasing
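A compact sketch of the five steps on a tiny one-hidden-layer MLP (numpy; the XOR data, layer sizes, learning rate and epsilon are all assumptions):

```python
import numpy as np

# Sketch of the five steps on a one-hidden-layer MLP learning XOR.
# The data, sizes, learning rate and epsilon are illustrative assumptions.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
Y = np.array([[0], [1], [1], [0]], dtype=float)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(2, 4)), rng.normal(size=(4, 1))  # 1- initialization
lr, eps = 0.5, 1e-3

for epoch in range(20000):
    H = sigmoid(X @ W1)                  # 2- forward propagation
    out = sigmoid(H @ W2)
    loss = np.mean((Y - out) ** 2)       # 3- loss vs. the desired output
    if loss < eps:                       # 5- stop when loss drops below epsilon
        break
    d_out = (out - Y) * out * (1 - out)  # 4- backward propagation: error signals
    d_H = (d_out @ W2.T) * H * (1 - H)   #    (constant factors folded into lr)
    W2 -= lr * H.T @ d_out               #    gradient descent weight updates
    W1 -= lr * X.T @ d_H                 # ...then back to step 2
```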
True or false
The gradient descent method involves calculating the derivative of the loss error function with respect to the weights of the network
True
Can we solve any problem with a single hidden layer?
Yes
In Hidden Layers:
1- Too few neurons can lead to ___ as there are not enough to capture the problem's ___
2- Too many neurons can lead to ___ as the information in the training set is not enough to ___ all the neurons in the hidden layers
Also, there is an ___ increase in training time
1- Underfitting / intricacies
2- Overfitting / train / exponential
The purpose of the Activation Function is to introduce ___ to artificial NN
nonlinear real-world properties
What are the most common Activation Functions on MLPs?
S. Sigmoid
T. Tanh
R. ReLU
SToR
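For reference, the three activations written out in Python (standard definitions, shown as a sketch):

```python
import numpy as np

# The three common MLP activations (standard definitions, shown as a sketch).
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))   # squashes values into (0, 1)

def tanh(z):
    return np.tanh(z)                 # squashes into (-1, 1), zero-centered

def relu(z):
    return np.maximum(0.0, z)         # zero for negatives, identity otherwise
```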
1- Large learning rates result in ___ training and ___ results
2- Tiny learning rates ___ the training process and might result in a ___ to train
1- unstable / non-optimal
2- lengthen / failure
What Hyperparameters exist in MLPs?
Inputs, Outputs, Hidden Layers, Activation Function, Learning Rate
What are the typical values for the Learning Rate?
0.01 to 0.1
What are Epochs?
One Epoch is one run over the whole training set
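A minimal sketch of the loop this describes (the toy data and epoch count are illustrative):

```python
# One epoch = one full pass over the whole training set.
# The toy data and epoch count are illustrative assumptions.
training_set = [([0.0, 1.0], 1), ([1.0, 0.0], 0)]

for epoch in range(10):                  # 10 epochs = 10 passes over the data
    for inputs, target in training_set:  # every example is seen once per epoch
        pass                             # forward pass, loss and update go here
```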
NN extract ___ and detect ___ that are too ___ for humans or other techniques
patterns
trends
complex
NN can perform tasks that are ___ to humans but ___ for other techniques (like ___)
trivial
difficult
handwriting recognition
In a Perceptron, if the ___ of the ___ multiplied by the respective ___ is greater than ___, then the Output is ___, and ___ otherwise
sum / inputs / weights / T / 1 / 0
The Backpropagation algorithm works by doing the following:
Given a set of input/output training data, find a set of weights that ___ the error, using the ___ method
minimize
gradient descent
We can solve any problem with a single hidden layer, but for ___ problems it might be tricky and highly dependent on the quality of the ___. But if we have too many ___ we may need better ___ and the processing time is much ___
complex / training set / layers / learning algorithms / slower
The gradient descent method involves calculating the ___ with respect to the ___ of the network
derivative of the loss error function
weights
Some rules of thumb for hidden layers are:
1- Size of Input layer ___ Size of Hidden layer ___ Size of Output layer
2- Size of Hidden layer = ___ × Size of Input layer + Size of Output layer
3- Size of Hidden layer < ___ Size of Input layer
1- > / >
2- 2/3
3- 2x
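Plugging an assumed 30-input, 3-output network into these rules (a sketch):

```python
# Applying the three rules of thumb to an assumed network: 30 inputs, 3 outputs.
n_in, n_out = 30, 3

rule1 = f"{n_out} < hidden < {n_in}"   # 1- input size > hidden size > output size
rule2 = (2 / 3) * n_in + n_out         # 2- hidden = 2/3 * input + output -> 23
rule3 = 2 * n_in                       # 3- hidden < 2x input size        -> 60

print(rule1, f"~{rule2:.0f} neurons", f"< {rule3} neurons")
```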
A Hopfield Net is a ___ NN where neurons are ___ units
Recurrent
binary threshold
Elman is a ___ NN with ___ inputs where the output from the previous step is ___ as the input to the current step
Recurrent
non-stationary
fed
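A one-step sketch of that feedback loop in numpy (the sizes and the tanh nonlinearity are assumptions):

```python
import numpy as np

# One Elman-style recurrence: the previous hidden state is fed back as input
# to the current step. Sizes and the tanh nonlinearity are assumptions.
rng = np.random.default_rng(0)
W_x = rng.normal(size=(3, 5))       # input -> hidden weights
W_h = rng.normal(size=(5, 5))       # previous hidden -> hidden (the feedback)

h = np.zeros(5)
for x in rng.normal(size=(4, 3)):   # a short sequence of 4 input vectors
    h = np.tanh(x @ W_x + h @ W_h)  # current state depends on the previous one
```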
Some Elman problems are:
1- Training an RNN is a very ___ task
2- It has ___ and ___ gradient problems
1- difficult
2- vanishing and exploding
Hopfield Nets applications are:
1- recalling or reconstructing ___ patterns
2- Image ___
1- corrupted
2- detection
Elman applications are: 1- ___ the next word in a sentence 2- ___ in time-series 3- ___ in computer networks 4- Human Action ___
1- Predicting
2- Anomaly detection
3- Intrusion Detection
4- recognition
LSTMs were developed to deal with the RNN ___ problem, allowing them to learn ___ dependencies by ___ longer sequences
vanishing gradient
long-term
remembering
The Hopfield Network has a capacity of ___ patterns for every ___ nodes
138
1000
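A sketch of storage and recall with binary threshold updates (one Hebbian pattern; all details beyond the cards are assumptions):

```python
import numpy as np

# Store one +/-1 pattern with Hebbian weights, then recall it from a corrupted
# copy via binary threshold updates. Capacity is ~138 patterns per 1000 nodes
# (about 0.138 N). Pattern and update details here are illustrative assumptions.
pattern = np.array([1, -1, 1, 1, -1, 1, -1, -1])
W = np.outer(pattern, pattern).astype(float)
np.fill_diagonal(W, 0)                        # no self-connections

state = pattern.copy()
state[:2] *= -1                               # corrupt two bits
for _ in range(5):
    state = np.where(W @ state >= 0, 1, -1)   # each neuron thresholds its input

print(np.array_equal(state, pattern))         # True: the pattern is recalled
```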
LSTM Cell state allows easy flow of ___ through the subsequent ___, thereby helping preserve ___
unchanged information
LSTM cell
context
LSTM Forget Gate tells us what information can ___
be thrown away
LSTM Input Gate tells us what new information ___
should be stored
LSTM Update Current State ___ the things decided to forget earlier and adds the ___ values
forgets
new candidate
LSTM Output Gate ___ and ___ values to be updated in the ___
decides and computes
hidden state
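The four gate cards above map onto the standard LSTM cell equations; a numpy sketch of one step (sizes and weight names are assumptions, biases omitted):

```python
import numpy as np

# One LSTM cell step following the gate cards above. Sizes, weight shapes and
# the concatenated [h, x] input are assumptions; biases are omitted.
rng = np.random.default_rng(0)
n = 4                                   # hidden/cell size
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
Wf, Wi, Wc, Wo = (rng.normal(size=(2 * n, n)) for _ in range(4))

h, c = np.zeros(n), np.zeros(n)         # hidden state and cell state
x = rng.normal(size=n)                  # current input
z = np.concatenate([h, x])              # gates see [previous hidden, input]

f = sigmoid(z @ Wf)                     # forget gate: what can be thrown away
i = sigmoid(z @ Wi)                     # input gate: what new info to store
c_tilde = np.tanh(z @ Wc)               # new candidate values
c = f * c + i * c_tilde                 # update: forget, then add candidates
o = sigmoid(z @ Wo)                     # output gate: decides what to expose
h = o * np.tanh(c)                      # new hidden state (preserves context)
```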
LSTM Regularization represents the techniques that ___ the learning algorithm to ___ better
modify
generalize
Some regularization techniques are:
1- ___
2- ___
3- ___
1- Weight Regularization
2- Dropout
3- Early Stopping
Weight Regularization technique:
Slows down ___ and ___ of the network by ___ the weights of the nodes
learning and overfitting
penalizing
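A sketch of how the penalty enters a weight update, assuming an L2 penalty (the cards do not fix the penalty form or coefficient):

```python
import numpy as np

# L2 weight regularization: add lam * sum(W**2) to the loss, so every update
# also shrinks the weights toward zero. lam, lr and the gradient stand-in are
# illustrative assumptions.
lam, lr = 0.01, 0.1
W = np.random.default_rng(0).normal(size=(4, 2))
grad = np.ones_like(W)              # stand-in for the gradient of the data loss

W -= lr * (grad + 2 * lam * W)      # the 2*lam*W term penalizes large weights
```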
Dropout regularization technique:
___ of some input and recurrent connections, in order to prevent some ___ and ___ during the learning process (slowing it down)
Probabilistic removal
activations and weight updates
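A sketch of the probabilistic removal on a vector of activations (the keep probability and inverted scaling are assumptions):

```python
import numpy as np

# Dropout: probabilistically remove activations during training so the
# corresponding weight updates are skipped. The keep probability p and the
# inverted scaling by 1/p are assumptions.
rng = np.random.default_rng(0)
p = 0.5                                  # probability of keeping each unit
activations = rng.normal(size=8)

mask = rng.random(8) < p                 # randomly drop roughly half the units
dropped = activations * mask / p         # scale survivors to keep expectations
```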
Early Stopping regularization technique:
___ strategy which stops training when performance on the validation set is not ___ after a certain iteration
Cross-validation
improving
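A sketch of the stopping check with a patience counter (the counter and toy losses are illustrative assumptions):

```python
# Early stopping: halt training when the validation loss stops improving.
# The patience counter and the toy loss curve are illustrative assumptions.
val_losses = [0.9, 0.7, 0.6, 0.55, 0.56, 0.57, 0.58]
best, patience, waited = float("inf"), 2, 0

for it, loss in enumerate(val_losses):
    if loss < best:
        best, waited = loss, 0            # validation still improving
    else:
        waited += 1
        if waited >= patience:            # no improvement for `patience` steps
            print(f"stop at iteration {it}, best validation loss {best}")
            break
```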
LSTM is mostly used for ___ recognition and ___ recognition
handwriting and speech
GRU is a simplified ___ with fewer ___ and no ___
LSTM
parameters
output gate
CNN have become the standard for computer ___ tasks
vision
NN are ___ as it is impossible to understand why they produce their results
black boxes
NN should be used:
1- When a large amount of ___ is available and one cannot formulate an ___ solution
2- When continuous learning from ___ is important
3- When there is no need to extract ___ from the results
1- examples / algorithmic
2- previous results
3- knowledge
CNN is a way of incorporating invariance to ___, ___ and ___ into the NN model
scaling
translation
rotation
CNN exploits the strong correlation between ___ pixels
neighboring
CNN has local ___ fields, ___ sharing and ___
receptive
weight
subsampling
CNN is composed of ___ and ___ layers, followed by an ___ layer that is task-___
convolutional and subsampling
output
dependent
The CNN is organized into planes, each known as a ___
feature map
Each feature map in CNN is composed of ___
units
Units in a CNN receive inputs from a small subregion of the input image, known as its ___
receptive field
In a CNN all units of a feature map share the same weight matrix (___)
kernel
Convolution is the operation where the ___ slides along the input image (in steps given by the stride), computing the value of a ___ in the feature map for each ___
kernel
unit
movement
In CNN, subsampling is the process of reducing a LxL patch of the feature map into a ___
single number
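A sketch of both operations on a toy image (the sizes, kernel values and average pooling choice are assumptions):

```python
import numpy as np

# A 2x2 kernel slides over a 6x6 image (stride 1), producing a 5x5 feature map;
# each 2x2 patch of that map is then averaged down to a single number.
# Sizes, kernel values and the use of average pooling are assumptions.
image = np.arange(36, dtype=float).reshape(6, 6)
kernel = np.array([[1.0, 0.0], [0.0, -1.0]])  # shared weights of one feature map

fmap = np.zeros((5, 5))
for i in range(5):
    for j in range(5):
        # each unit sees only a small patch of the image: its receptive field
        fmap[i, j] = np.sum(image[i:i + 2, j:j + 2] * kernel)

L = 2                                         # subsampling: LxL patch -> 1 number
sub = fmap[:4, :4].reshape(2, L, 2, L).mean(axis=(1, 3))
```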