ANN Flashcards

1
Q

latent

A

existing but not yet developed; hidden, concealed

2
Q

simple perceptron

A

activation function = step function
1. perform (forward) inference
2. compute loss
3. update weights (remember the learning rate)
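
A minimal NumPy sketch of those three steps (the AND-gate data, 0/1 step convention and learning rate are illustrative assumptions, not from the card):

```python
import numpy as np

def step(z):
    """Step activation: 1 if z >= 0, else 0."""
    return np.where(z >= 0, 1, 0)

def train_perceptron(X, y, lr=0.1, epochs=10):
    """Perceptron training: forward inference, error, weight update."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        for xi, target in zip(X, y):
            y_hat = step(xi @ w + b)    # 1. (forward) inference
            error = target - y_hat      # 2. loss / error signal
            w += lr * error * xi        # 3. weight update, scaled by learning rate
            b += lr * error
    return w, b

# usage: learn the logical AND function
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])
w, b = train_perceptron(X, y)
print(step(X @ w + b))   # [0 0 0 1]
```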

3
Q

multilayer perceptron

A

use backpropagation and gradient descent to update weights
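
A minimal NumPy sketch of backpropagation + gradient descent for a tiny MLP (the XOR data, layer sizes, learning rate and epoch count are illustrative assumptions; a different random seed may need more epochs):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)          # XOR targets

W1, b1 = rng.normal(0, 1, (2, 4)), np.zeros((1, 4))      # input  -> hidden
W2, b2 = rng.normal(0, 1, (4, 1)), np.zeros((1, 1))      # hidden -> output
lr = 1.0

for _ in range(10000):
    # forward pass
    h = sigmoid(X @ W1 + b1)
    y_hat = sigmoid(h @ W2 + b2)
    # backward pass: propagate the MSE-loss gradient layer by layer
    d_out = (y_hat - y) * y_hat * (1 - y_hat)
    d_hid = (d_out @ W2.T) * h * (1 - h)
    # gradient descent updates
    W2 -= lr * h.T @ d_out
    b2 -= lr * d_out.sum(axis=0, keepdims=True)
    W1 -= lr * X.T @ d_hid
    b1 -= lr * d_hid.sum(axis=0, keepdims=True)

print(np.round(y_hat.ravel(), 2))   # should approach [0, 1, 1, 0]
```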

4
Q

output layer activation/loss

A

binary classification = sigmoid + binary cross-entropy
multiclass classification (non-mutually exclusive labels) = sigmoid + binary cross-entropy on each output
multiclass classification (mutually exclusive classes) = softmax (normalizes outputs into a probability distribution) + categorical cross-entropy
regression = no activation (linear output) + MSE
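
A NumPy sketch of two of these pairings (the example logits and labels are made up for illustration):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def binary_cross_entropy(y_true, y_prob, eps=1e-12):
    y_prob = np.clip(y_prob, eps, 1 - eps)
    return -np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob))

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)     # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)  # normalizes outputs to a distribution

def categorical_cross_entropy(y_onehot, y_prob, eps=1e-12):
    return -np.mean(np.sum(y_onehot * np.log(y_prob + eps), axis=-1))

# binary classification: sigmoid + binary cross-entropy
print(binary_cross_entropy(np.array([1.0]), sigmoid(np.array([0.8]))))

# mutually exclusive multiclass: softmax + categorical cross-entropy
logits = np.array([[2.0, 0.5, -1.0]])
onehot = np.array([[1.0, 0.0, 0.0]])
print(categorical_cross_entropy(onehot, softmax(logits)))
```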

5
Q

hidden layer activation

A

sigmoid and tanh - very small and very large values of z saturate, so the gradient is ~0
ReLU = max(0, z) - gradient = 0 for negative z
LReLU - still nonlinear overall, but piecewise linear; gradient never 0
LReLU(z) = z,  z > 0
         = az, z <= 0, a = configurable slope
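
A NumPy sketch of these activations and their gradients (the 0.01 default slope a is an illustrative choice):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1 - s)                  # ~0 for large |z| (saturation)

def relu(z):
    return np.maximum(0, z)

def relu_grad(z):
    return (z > 0).astype(float)        # exactly 0 for negative z

def leaky_relu(z, a=0.01):
    return np.where(z > 0, z, a * z)

def leaky_relu_grad(z, a=0.01):
    return np.where(z > 0, 1.0, a)      # never 0, so gradients keep flowing

z = np.array([-10.0, -1.0, 0.5, 10.0])
print(sigmoid_grad(z))      # tiny at the extremes
print(relu_grad(z))         # zero for negative inputs
print(leaky_relu_grad(z))   # small but nonzero for negative inputs
```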

6
Q

detecting over/underfitting

A

underfitting - training and validation errors are both high => increase ANN complexity
overfitting - training loss keeps decreasing while validation loss increases => check data is IID, gather more data, decrease complexity, keep weight magnitudes small (regularization)
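
A rough sketch of the check, assuming per-epoch loss histories are available (the threshold, patience window and example curves are made-up illustrations):

```python
def diagnose(train_losses, val_losses, high_loss=1.0, patience=3):
    """Rough diagnosis from per-epoch training/validation loss curves."""
    # underfitting: both errors stay high
    if train_losses[-1] > high_loss and val_losses[-1] > high_loss:
        return "underfitting: increase model capacity"
    # overfitting: validation loss rose for `patience` consecutive epochs
    # while training loss kept falling
    val_rising = all(val_losses[i] < val_losses[i + 1]
                     for i in range(-patience - 1, -1))
    train_falling = train_losses[-1] < train_losses[-patience - 1]
    if val_rising and train_falling:
        return "overfitting: more data / regularize / reduce capacity / stop early"
    return "no clear sign yet"

print(diagnose([0.9, 0.6, 0.4, 0.3, 0.25, 0.2],
               [0.95, 0.7, 0.65, 0.68, 0.72, 0.8]))
```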

7
Q

Universal Approximation Theorem

A

An MLP with a linear output layer and at least one hidden layer with a squashing activation function (e.g. sigmoid/tanh) can approximate any continuous function to arbitrary accuracy, provided the network is given enough hidden neurons

8
Q

weight initialization

A

needs to be random to break the symmetry of the ANN (otherwise neurons in the same layer receive identical updates)
sample from a normal distribution
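
A NumPy sketch (the 0.01 scale and the fan-in scaling variant are illustrative choices, not prescribed by the card):

```python
import numpy as np

rng = np.random.default_rng(42)

def init_layer(fan_in, fan_out, std=0.01):
    """Small random normal weights break symmetry; biases can start at zero."""
    return rng.normal(0.0, std, size=(fan_in, fan_out)), np.zeros(fan_out)

def init_layer_scaled(fan_in, fan_out):
    """Common refinement (assumption): scale the spread by 1/sqrt(fan_in) so
    activation variance stays roughly constant across layers."""
    return rng.normal(0.0, 1.0 / np.sqrt(fan_in), size=(fan_in, fan_out)), np.zeros(fan_out)

W, b = init_layer(784, 128)
print(W.shape, round(float(W.std()), 4))   # (784, 128), std close to 0.01
```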

9
Q

learning rate

A

high learning rate - faster, but can overshoot and miss the minimum
small learning rate - slower, but more likely to converge steadily to a minimum
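
A tiny illustration using gradient descent on f(x) = x^2, whose gradient is 2x (the function, starting point and rates are assumptions for demonstration):

```python
def gradient_descent(lr, x0=5.0, steps=20):
    """Repeatedly step against the gradient of f(x) = x^2."""
    x = x0
    for _ in range(steps):
        x -= lr * 2 * x
    return x

print(gradient_descent(lr=0.05))   # small: slow but steady progress toward 0
print(gradient_descent(lr=0.9))    # large: reaches near 0 quickly
print(gradient_descent(lr=1.1))    # too large: overshoots and diverges
```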

10
Q

mini batch gradient descent

A

a compromise between stochastic (one example at a time: fast, high variance) and batch (all examples: slow, low variance) gradient descent
not one, not all training examples at a time, but a small batch (e.g. 4, 8, 16)
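
A sketch of iterating over one epoch in mini-batches (the batch size of 16 and the random data are illustrative):

```python
import numpy as np

def minibatches(X, y, batch_size, rng):
    """Yield shuffled mini-batches covering the whole dataset once."""
    idx = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        batch = idx[start:start + batch_size]
        yield X[batch], y[batch]

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = rng.integers(0, 2, size=100)

for X_batch, y_batch in minibatches(X, y, batch_size=16, rng=rng):
    # compute the loss gradient on these few examples, then update the weights
    print(X_batch.shape, y_batch.shape)
```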

11
Q

training procedure

A
  1. split the data into training, validation and test sets
  2. split the training set into mini-batches, update all weights after each mini-batch
  3. after all mini-batches are processed, one epoch is done; reshuffle and run several (e.g. 3-5) epochs
  4. checkpoint every epoch: compare training vs validation loss to detect over/underfitting
  5. tune hyperparameters on the validation set
  6. report performance results on the test set (with k-fold cross-validation)
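
A runnable sketch of steps 1-4 with a linear model and MSE loss as a stand-in (the model, split ratios and hyperparameters are illustrative assumptions; steps 5-6 are noted as comments):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=200)

# 1. split into training, validation and test sets (here 70/15/15)
idx = rng.permutation(len(X))
tr, va, te = idx[:140], idx[140:170], idx[170:]

w = np.zeros(3)
lr, batch_size = 0.05, 16
mse = lambda i: np.mean((X[i] @ w - y[i]) ** 2)

for epoch in range(5):                              # 3. a few epochs
    order = rng.permutation(tr)                     # reshuffle every epoch
    for s in range(0, len(order), batch_size):      # 2. mini-batch updates
        b = order[s:s + batch_size]
        grad = 2 * X[b].T @ (X[b] @ w - y[b]) / len(b)
        w -= lr * grad
    # 4. checkpoint: compare training vs validation loss to catch over/underfitting
    print(f"epoch {epoch}: train {mse(tr):.4f}  val {mse(va):.4f}")

# 5. tune hyperparameters (lr, batch size, epochs) against the validation loss
# 6. report final performance on the held-out test set (optionally via k-fold CV)
print("test MSE:", mse(te))
```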
12
Q

CNN uses

A

image classification, object detection, object segmentation

13
Q

CNN technology

A

apply convolutional filters (plus pooling) to reduce the height/width of the feature maps while increasing their depth => flatten into a 1-dimensional array => plug into a fully connected ANN
encoder-decoder / autoencoder architectures
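
A minimal PyTorch sketch of that pipeline (the layer sizes, 28x28 grayscale input and 10 output classes are illustrative assumptions):

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1),    # 1x28x28  -> 16x28x28
    nn.ReLU(),
    nn.MaxPool2d(2),                               # 16x28x28 -> 16x14x14 (smaller H/W)
    nn.Conv2d(16, 32, kernel_size=3, padding=1),   # 16x14x14 -> 32x14x14 (more depth)
    nn.ReLU(),
    nn.MaxPool2d(2),                               # 32x14x14 -> 32x7x7
    nn.Flatten(),                                  # -> 1-dimensional array of 32*7*7 = 1568
    nn.Linear(32 * 7 * 7, 10),                     # plug into a fully connected ANN head
)

x = torch.randn(1, 1, 28, 28)    # one fake grayscale image
print(model(x).shape)            # torch.Size([1, 10])
```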

14
Q

CNN on levels

A

first layers - detect low-level features (vertical/horizontal lines)
deeper layers - detect higher-level concepts (parts of a face)
deepest layers - reconstruct the entire image / highlight the features most important for classification (e.g. a moustache)

15
Q

RNN

A

word2vec
