Week 2 Flashcards
Defining characteristic of DNN
More than 1 hidden layer
Advantages of SGD
Efficient for large sample sizes
Implementable numerically
Can be ‘controlled’
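A minimal Python sketch of the idea (names here are illustrative, not from the lecture): the gradient is estimated on a random mini-batch instead of the full sample, which is what makes SGD cheap for large samples, and the learning rate is the knob that 'controls' it.

import numpy as np

def sgd(theta, X, y, grad_loss, lr=0.01, n_steps=1000, batch_size=32, seed=0):
    # grad_loss(theta, Xb, yb) should return the gradient of the loss on a mini-batch
    rng = np.random.default_rng(seed)
    for _ in range(n_steps):
        idx = rng.choice(len(X), size=batch_size, replace=False)
        g = grad_loss(theta, X[idx], y[idx])  # mini-batch gradient estimate
        theta = theta - lr * g                # step against the gradient; lr 'controls' the update
    return theta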
RNN
Recurrent Neural Networks sequentially feed their output back into the network
Connection from FNN to RNN
RNNs can be reduced to FNNs by UNFOLDING them
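A minimal Python sketch of both views, assuming a simple tanh cell (W, U, b and the sequence xs are illustrative):

import numpy as np

def rnn(xs, W, U, b, h0):
    # Recurrent view: the hidden state h is fed back into the network at every step
    h = h0
    for x in xs:                        # xs = sequence of input vectors
        h = np.tanh(W @ h + U @ x + b)
    return h

# Unfolded view: for a fixed sequence length T, the loop is just T copies of the
# same layer composed one after another, i.e. a feedforward network whose layers
# all share the weights (W, U, b).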
First NN
McCulloch and Pitts
Perceptron & problems of
Rosenblatt ‘58
What started AI winter
‘69 Minsky and Papert showed XOR couldn’t be represented by a single-layer perceptron
FULLY CONNECTED
If all entries of each layer matrix L_i in the NN are non-zero
Universal approximation property
Let g:R -> R be a measurable function such that:
a) g is not a polynomial function
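For reference, one common full statement (Leshno, Lin, Pinkus, Schocken 1993) adds the following conditions; the lecture's exact version may differ:
b) g is locally bounded
c) the closure of the set of discontinuity points of g has Lebesgue measure zero
Under a)-c), one-hidden-layer FNNs with activation g are dense in C(K) for every compact set K.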
Define FNN
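A common formulation (the lecture's notation may differ): a feedforward NN with L layers is an alternating composition of affine maps and componentwise activations,
N(x) = A_L(g_{L-1}(A_{L-1}(... g_1(A_1(x)) ...))),
where A_i(z) = W_i z + b_i with weight matrix W_i and bias vector b_i, and each g_i is applied componentwise.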
Differences between (hyper)params
Hyper:
Set by hand
Features
Params:
Chosen by machine (weights and biases)
Optimised by SGD
Architecture of network
Hyperparameters and Activation Functions (things chosen by you)
Dense layer
Entire layer is connected (all weight entries non-zero)
Number of parameters that characterise N
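A sketch of the standard count, assuming a fully connected FNN with layer widths n_0 (input), n_1, ..., n_L (output):
#params = sum_{i=1}^{L} n_i (n_{i-1} + 1)   (n_i n_{i-1} weights plus n_i biases per layer)
E.g. widths (3, 4, 2): 4*(3+1) + 2*(4+1) = 16 + 10 = 26 parameters.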
Adding units
Continuity and differentiability of NN
Bottom line:
If every activation function is continuous, then so too is the NN
The NN as a whole is only as many times continuously differentiable as its LEAST differentiable activation function
One dimensional activation functions (entire table)
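A sketch of the usual entries (assuming the table covers the activations mentioned on the other cards):
Heaviside: H(x) = 1 if x >= 0, else 0
Sigmoid: sigma(x) = 1 / (1 + e^{-x})
Tanh: tanh(x) = (e^x - e^{-x}) / (e^x + e^{-x})
ReLU: max{0, x}
Leaky / Parametric ReLU: max{alpha x, x}, 0 < alpha < 1
ELU: x if x >= 0, alpha (e^x - 1) otherwise
Softplus: ln(1 + e^x)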
Dead ReLU problem & solution
A layer of ReLU units receives only negative inputs -> producing constant (zero) output with zero gradient
This can freeze gradient-based algorithms
Therefore use leaky ReLU or Parametric ReLU (or ELU)
Usually 0 < α < 1
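A small NumPy sketch of the effect (illustrative): with all-negative pre-activations, ReLU's output and gradient are identically zero, while leaky ReLU keeps a non-zero gradient.

import numpy as np

z = np.array([-2.0, -0.5, -3.1])         # pre-activations, all negative ('dead' regime)
relu = np.maximum(0.0, z)                # [0, 0, 0]  constant output
relu_grad = (z > 0).astype(float)        # [0, 0, 0]  gradient frozen
alpha = 0.01                             # usually 0 < alpha < 1
leaky = np.maximum(alpha * z, z)         # small negative outputs, not constant
leaky_grad = np.where(z > 0, 1.0, alpha) # [0.01, 0.01, 0.01]  still learns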
Multi-dimensional activation
When is identity function useful
Output layer
Limitation of Heaviside
As it is not continuous, it can’t be used in gradient-based algorithms
Saturating activation functions
Output is bounded
Sigmoid, tanh
Continuously differentiable counterpart of ReLU
Softplus
Boltzmann dist
From stat physics
Analogous to Multinomial logistic regression
Which is the standard activation function in image recognition
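Assuming this card refers to the softmax activation: for x in R^K,
softmax(x)_i = e^{x_i} / sum_{j=1}^{K} e^{x_j},
i.e. the Boltzmann/Gibbs form from statistical physics, used as the output activation for multi-class classification (as in image recognition).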
Motivation for maxout
Several simple, convex, non-linear functions can be expressed as maxima of affine functions, e.g.
ReLU(x) = max{0, x}
|x| = max{-x, x}
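A maxout unit itself (one common definition; the lecture's notation may differ) takes the maximum over k learned affine functions:
maxout(x) = max{ w_1 . x + b_1, ..., w_k . x + b_k }
so ReLU and |x| above are special cases with fixed, non-learned affine pieces.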
Sup Norm
Lp norm
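For reference, the standard definitions for a function f on a domain D:
Sup norm: ||f||_inf = sup_{x in D} |f(x)|
Lp norm: ||f||_p = ( integral_D |f(x)|^p dx )^{1/p}, 1 <= p < infinity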
Def universal approximation property
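A common way to state it, using the two norms from the previous cards (the lecture's exact formulation may differ): for every continuous target function on a compact set K and every eps > 0 there is an FNN within sup-norm distance eps of it on K; similarly, every Lp target can be approximated to within eps in the Lp norm.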
Limitation of UAP
Result is non-constructive:
It does not tell us what the approximating NNs f and h look like, just that they exist
It is also non-quantitative:
It doesn’t tell us how many hidden units are required to build these networks