Week 2 Flashcards

1
Q

Defining characteristic of DNN

A

More than 1 hidden layer

2
Q

Advantages of SGD

A

Efficient for large sample sizes
Implementable numerically
Can be ‘controlled’
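
A minimal Python/NumPy sketch of a plain SGD loop (the linear least-squares loss, the helper name grad_loss, and the constant step size are illustrative assumptions, not the lecture's setup):

```python
import numpy as np

def grad_loss(theta, x, y):
    # Assumed per-sample loss: squared error of a linear model, (x @ theta - y)^2.
    return 2.0 * (x @ theta - y) * x

def sgd(theta, data, lr=0.01, epochs=10):
    # One cheap gradient step per sample: efficient for large samples,
    # easy to implement numerically, and 'controlled' via lr and epochs.
    for _ in range(epochs):
        np.random.shuffle(data)                      # visit the samples in random order
        for x, y in data:
            theta = theta - lr * grad_loss(theta, x, y)
    return theta

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=100)
theta_hat = sgd(np.zeros(3), list(zip(X, y)), lr=0.05, epochs=20)
```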

3
Q

RNN

A

Recurrent neural networks sequentially feed their output back into the network

4
Q

Connection from FNN to RNN

A

RNNs can be reduced to FNNs by UNFOLDING them
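
A minimal NumPy sketch of the unfolding idea, assuming a simple cell h_t = tanh(W h_{t-1} + U x_t) (not necessarily the lecture's formulation): running the recurrence over a length-T sequence is the same as applying a depth-T feedforward composition with shared weights.

```python
import numpy as np

def rnn_cell(h, x, W, U):
    # One recurrence step: h_t = tanh(W @ h_{t-1} + U @ x_t)   (assumed cell form)
    return np.tanh(W @ h + U @ x)

def unfolded_rnn(x_seq, W, U, h0):
    # Unfolding in time: applying the same cell once per time step turns the
    # recurrence into a deep feedforward composition with shared weights W, U.
    h = h0
    for x in x_seq:
        h = rnn_cell(h, x, W, U)
    return h

rng = np.random.default_rng(0)
W, U = rng.normal(size=(4, 4)), rng.normal(size=(4, 2))
x_seq = rng.normal(size=(5, 2))                      # a sequence of 5 inputs
h_T = unfolded_rnn(x_seq, W, U, np.zeros(4))
```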

5
Q

First NN

A

McCulloch and Pitts ('43)

6
Q

Perceptron & problems of

A

Rosenblatt '58. Problem: a single-layer perceptron can only represent linearly separable functions, so it cannot learn e.g. XOR

7
Q

What started AI winter

A

'69: Minsky and Papert showed that XOR cannot be represented by a (single-layer) perceptron

8
Q

FULLY CONNECTED

A

If all entries of each weight matrix L_i in the NN are non-zero

9
Q

Universal approximation property

A

Let g:R -> R be a measurable function such that:
a) g is not a polynomial function

10
Q

Define FNN

A

An FNN is an alternating composition of affine maps L_i and componentwise activation functions: N(x) = L_{K+1}(σ_K(L_K(⋯ σ_1(L_1(x)) ⋯)))

11
Q

Differences between (hyper)params

A

Hyperparameters:
Set by hand (e.g. architectural features such as the number of layers and units)

Parameters:
Chosen by the machine (weights and biases)
Optimised by SGD

12
Q

Architecture of network

A

Hyperparameters and activation functions (the things chosen by you)

13
Q

Dense layer

A

Every unit in the layer is connected to every unit in the previous layer (all weight entries are non-zero)

14
Q

Number of parameters that characterise N

A
15
Q

Adding units

A
16
Q

Continuity and differentiability of NN

A

Bottom line:
If every activation function is continuous, then so too is the NN (it is a composition of continuous affine maps and continuous activations).

The NN is overall only as continuously differentiable as its LEAST differentiable activation function

17
Q

One dimensional activation functions (entire table)

A
18
Q

Dead ReLU problem & solution

A

A layer of ReLU activations that receives only negative inputs produces constant (zero) output and hence zero gradient.

This can freeze gradient-based algorithms.

Therefore use leaky ReLU or parametric ReLU (or ELU).

Usually 0 < α < 1
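
A small illustrative NumPy sketch (with an assumed default α = 0.01) of how a leaky ReLU keeps a gradient flowing where a plain ReLU goes dead:

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def leaky_relu(z, alpha=0.01):            # assumed default; usually 0 < alpha < 1
    return np.where(z > 0, z, alpha * z)

z = np.array([-3.0, -1.0, -0.5])          # a unit that only ever sees negative inputs
print(relu(z))                            # [0. 0. 0.]  constant output, zero gradient ("dead")
print(leaky_relu(z))                      # [-0.03 -0.01 -0.005]  a gradient can still flow
```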

19
Q

Multi dimensional activation

A
20
Q

When is identity function useful

A

Output layer

21
Q

Limitation of Heaviside

A

As it is discontinuous (and has zero derivative everywhere else), it can't be used in gradient-based algorithms

22
Q

Saturating activation functions

A

Output is bounded

Sigmoid, tanh

23
Q

Continuously differentiable counterpart of ReLU

A

Softplus: softplus(x) = log(1 + exp(x)), whose derivative is the sigmoid

24
Q

Boltzmann dist

A

From statistical physics.

Analogous to multinomial logistic regression (softmax), which is the standard output activation in image recognition
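
A minimal sketch of the softmax (Boltzmann-type) activation as typically used in an output layer; subtracting the maximum is a standard numerical-stability trick, not part of the definition:

```python
import numpy as np

def softmax(z):
    # Boltzmann/Gibbs form: p_i is proportional to exp(z_i).
    e = np.exp(z - np.max(z))                  # subtract the max for numerical stability
    return e / e.sum()

print(softmax(np.array([2.0, 1.0, 0.1])))      # probabilities that sum to 1
```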

25
Q

Motivation for max out

A

Several simple, convex, non-linear functions can be expressed as maxima of affine functions, e.g.

ReLU(x) = max{0, x}
|x| = max{-x, x}
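
A small sketch of a maxout unit, assuming the usual form maxout(x) = max_i(w_i·x + b_i); the weights below are chosen to recover ReLU and the absolute value:

```python
import numpy as np

def maxout(x, W, b):
    # Maxout unit: the maximum over several affine functions of the input,
    # maxout(x) = max_i (W[i] @ x + b[i]).
    return np.max(W @ x + b)

x = np.array([-2.0])
b = np.zeros(2)
print(maxout(x, np.array([[0.0], [1.0]]), b))   # max{0, x}  = ReLU(-2) = 0
print(maxout(x, np.array([[-1.0], [1.0]]), b))  # max{-x, x} = |-2|     = 2
```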

26
Q

Sup Norm

A

Sup norm (uniform norm) of f over a set K: ‖f‖_∞ = sup_{x ∈ K} |f(x)|

27
Q

Lp norm

A

Lp norm of f: ‖f‖_p = ( ∫ |f(x)|^p dx )^{1/p}, for 1 ≤ p < ∞
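
A small illustrative NumPy sketch approximating both norms for f(x) = sin(x) on [0, 2π] by discretising on a grid; the grid size and the example function are arbitrary choices:

```python
import numpy as np

# Approximate the sup norm and the L2 norm of f(x) = sin(x) on [0, 2*pi]
# by discretising the interval on a fine grid.
x = np.linspace(0.0, 2.0 * np.pi, 10001)
f = np.sin(x)

sup_norm = np.max(np.abs(f))                          # ~ 1.0
p = 2
lp_norm = np.trapz(np.abs(f) ** p, x) ** (1.0 / p)    # ~ sqrt(pi) ~ 1.77
print(sup_norm, lp_norm)
```
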
28
Q

Def universal approximation property

A

A class of NNs has the universal approximation property if it is dense in the space of continuous functions on compact sets: for every continuous target function on a compact set K and every ε > 0, there is a network in the class within ε of the target in the sup norm on K

29
Q

Limitation of UAP

A

The result is non-constructive:
it does not say what the approximating NNs f and h look like, just that they exist.

It is also non-quantitative:
it does not say how many hidden units are required to build these networks.