ANN Flashcards
latent
existing but not yet developed; hidden, concealed
simple perceptron
activation function = step function
1. perform (forward) inference
2. compute loss
3. update weights (remember learning rate)
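A minimal sketch of these three steps for a single perceptron in numpy; the AND dataset, learning rate, and epoch count are illustrative assumptions, not part of the card:

```python
import numpy as np

# Toy data: logical AND (illustrative assumption, not from the card).
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])

w = np.zeros(2)   # weights
b = 0.0           # bias
lr = 0.1          # learning rate (illustrative)

for epoch in range(10):
    for xi, yi in zip(X, y):
        # 1. forward inference: step activation on the weighted sum
        y_hat = 1 if xi @ w + b > 0 else 0
        # 2. compute loss (here just the signed error)
        error = yi - y_hat
        # 3. update weights, scaled by the learning rate
        w += lr * error * xi
        b += lr * error

print(w, b)  # learned parameters
```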
multilayer perceptron
use backpropagation and gradient descent to update weights
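A bare-bones numpy sketch of backpropagation + gradient descent for a one-hidden-layer MLP (sigmoid hidden layer, sigmoid output, binary cross-entropy; the random data and layer sizes are illustrative assumptions):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))                    # 8 samples, 3 features (illustrative)
y = rng.integers(0, 2, size=(8, 1)).astype(float)

W1 = rng.normal(scale=0.1, size=(3, 4))        # input -> hidden
b1 = np.zeros(4)
W2 = rng.normal(scale=0.1, size=(4, 1))        # hidden -> output
b2 = np.zeros(1)
lr = 0.5

for step in range(1000):
    # forward pass
    h = sigmoid(X @ W1 + b1)          # hidden activations
    y_hat = sigmoid(h @ W2 + b2)      # output probability

    # backward pass: with sigmoid + binary cross-entropy,
    # the output-layer error simplifies to (y_hat - y)
    d_out = (y_hat - y) / len(X)
    dW2 = h.T @ d_out
    db2 = d_out.sum(axis=0)
    d_h = d_out @ W2.T * h * (1 - h)  # chain rule through hidden sigmoid
    dW1 = X.T @ d_h
    db1 = d_h.sum(axis=0)

    # gradient descent update
    W2 -= lr * dW2; b2 -= lr * db2
    W1 -= lr * dW1; b1 -= lr * db1
```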
output layer activation/loss
binary classification = sigmoid + binary cross-entropy
multi-label classification (non-mutually exclusive classes) = sigmoid + binary cross-entropy on each output
multiclass classification - mutually exclusive = softmax (normalizes outputs to a probability distribution) + categorical cross-entropy
regression = no activation + MSE
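A sketch of these pairings in numpy; the logits and targets are made-up values for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    # subtract the max for numerical stability; outputs sum to 1
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

z = np.array([2.0, -1.0, 0.5])          # raw outputs (logits), illustrative

# binary / multi-label: sigmoid per output + binary cross-entropy
p = sigmoid(z)
y = np.array([1.0, 0.0, 1.0])           # one target per output
bce = -(y * np.log(p) + (1 - y) * np.log(1 - p)).mean()

# mutually exclusive classes: softmax + categorical cross-entropy
q = softmax(z)
target = 0                              # index of the true class
cce = -np.log(q[target])

# regression: no activation on z, mean squared error against real targets
```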
hidden layer activation
sigmoid and tanh - saturate for very negative and very positive z, so the gradient goes to ~0 (vanishing gradients)
ReLU = max(0, z) - gradient = 0 for z < 0 ("dying ReLU")
LReLU - nonlinear but piecewise linear, gradient != 0 everywhere
LReLU(z) = z, z > 0
         = az, z <= 0, a = configurable slope
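A numpy sketch comparing the gradients of these activations; the sample z values are illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    return np.maximum(0.0, z)

def lrelu(z, a=0.01):             # a = configurable slope (0.01 is a common default)
    return np.where(z > 0, z, a * z)

z = np.array([-100.0, -1.0, 0.5, 100.0])

# sigmoid saturates: its gradient s*(1-s) is ~0 at both extremes
s = sigmoid(z)
print(s * (1 - s))                # ~0 for z = -100 and z = 100

# ReLU gradient is exactly 0 for negative z; LReLU keeps slope a there
print(np.where(z > 0, 1.0, 0.0))  # ReLU gradient
print(np.where(z > 0, 1.0, 0.01)) # LReLU gradient
```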
detecting over/underfitting
underfitting - training and validation errors are both high => increase model complexity
overfitting - training loss decreasing while validation loss increases => remedies: more (i.i.d.) data, decrease model complexity, smaller-magnitude weights (regularization)
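The "smaller-magnitude weights" remedy is commonly implemented as L2 regularization (weight decay): adding lam * sum(W**2) to the loss adds 2 * lam * W to the gradient. A minimal sketch, with lam and lr as illustrative values:

```python
import numpy as np

lr, lam = 0.1, 1e-3                 # learning rate / L2 strength (illustrative)
W = np.random.default_rng(0).normal(size=(4, 2))
dW = np.zeros_like(W)               # stand-in for the gradient from the data loss

# total loss = data_loss + lam * sum(W**2), so the update also
# shrinks each weight toward zero ("weight decay")
W -= lr * (dW + 2 * lam * W)
```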
Universal Approximation Theorem
An MLP with a linear output layer and at least one hidden layer with a squashing activation function (e.g. sigmoid/tanh) can approximate any continuous function on a compact domain to arbitrary accuracy, provided the network is given enough hidden neurons
weight initialization
needs to be random to break the symmetry of the ANN
sample from normal distribution
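A sketch of symmetry-breaking initialization; the He-style fan-in scaling shown at the end is a common refinement that goes beyond what the card states:

```python
import numpy as np

rng = np.random.default_rng(0)
fan_in, fan_out = 128, 64           # layer sizes (illustrative)

# all-zero init would make every hidden neuron compute the same thing;
# random values break that symmetry
W = rng.normal(scale=0.01, size=(fan_in, fan_out))

# common refinement: scale the variance by layer width (He init, for ReLU)
W_he = rng.normal(scale=np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))
```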
learning rate
high learning rate - faster, but can overshoot and miss the minimum
low learning rate - slower, but more reliably converges to a (local) minimum
mini batch gradient descent
compromise between stochastic (fast, high variance) and batch (slow, low variance) gradient descent
not one, not all training examples at a time, but some (4, 8, 16)
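A sketch of mini-batch gradient descent on a toy linear-regression problem; the data, batch size, and learning rate are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=100)

w, lr, batch_size = np.zeros(3), 0.1, 8    # batch of 8: "some", not 1, not all

for epoch in range(20):
    order = rng.permutation(len(X))        # reshuffle each epoch
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]
        Xb, yb = X[idx], y[idx]
        grad = 2 * Xb.T @ (Xb @ w - yb) / len(idx)  # MSE gradient on the batch
        w -= lr * grad                     # one update per mini-batch

print(w)  # close to true_w
```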
training procedure
- split into training, validation and test set
- split the training set into mini-batches; update the weights after each mini-batch
- after all mini-batches are processed, one epoch is done; reshuffle and repeat for e.g. 3-5 epochs
- checkpoint every epoch; compare training vs validation loss to detect over/underfitting
- tune hyperparameters on validation set
- report final performance on the test set (e.g. with k-fold cross-validation)
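The whole procedure as a training-loop skeleton; `model.update` and `model.loss` are hypothetical helpers standing in for one gradient step and a loss evaluation:

```python
import numpy as np

def train(model, X, y, epochs=5, batch_size=16, seed=0):
    rng = np.random.default_rng(seed)
    # split into training and validation (test set held out elsewhere)
    idx = rng.permutation(len(X))
    split = int(0.8 * len(X))                    # 80/20 split (illustrative)
    train_idx, val_idx = idx[:split], idx[split:]

    for epoch in range(epochs):                  # e.g. 3-5 epochs
        order = rng.permutation(train_idx)       # reshuffle every epoch
        for start in range(0, len(order), batch_size):
            batch = order[start:start + batch_size]
            model.update(X[batch], y[batch])     # hypothetical: one gradient step

        # checkpoint: compare losses to catch over/underfitting early
        print(epoch,
              model.loss(X[train_idx], y[train_idx]),  # hypothetical helper
              model.loss(X[val_idx], y[val_idx]))
```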
CNN uses
image classification, object detection, object segmentation
CNN technology
apply convolutional (and pooling) layers to reduce the height/width of the feature map while increasing its depth => flatten into a 1-dimensional array => feed into a fully connected ANN
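The height/width shrinkage follows the standard conv output-size formula out = (in + 2*padding - kernel) // stride + 1; a sketch with illustrative layer settings:

```python
def conv_out(size, kernel, stride=1, padding=0):
    # standard formula for the output height/width of a conv layer
    return (size + 2 * padding - kernel) // stride + 1

h = w = 32          # e.g. a 32x32 RGB image
depth = 3
for kernel, stride, out_channels in [(3, 2, 16), (3, 2, 32), (3, 2, 64)]:
    h = conv_out(h, kernel, stride)
    w = conv_out(w, kernel, stride)
    depth = out_channels
    print(h, w, depth)   # height/width shrink, depth grows

flat = h * w * depth     # flatten to a 1-D array for the fully connected ANN
print(flat)
```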
encoder-decoder / autoencoder
CNN on levels
first layers - detect low-level features (vertical/horizontal lines)
deeper layers - detect higher-level concepts (parts of a face)
deepest layers - reconstruct the entire image, highlighting the features most important for classification (e.g. a moustache)
RNN
recurrent neural network - processes sequences by passing a hidden state from one time step to the next (text, speech, time series)
word2vec
learns dense vector embeddings of words so that words appearing in similar contexts get similar vectors (CBOW / skip-gram)