NN FINAL Flashcards

1
Q

Hidden layers are needed if

A

the data must be separated using a non-linear boundary
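XOR is a concrete case. No single straight line separates its classes, so a lone perceptron cannot classify it, but one hidden layer with two units can. The minimal sketch below uses hand-picked weights (hypothetical, chosen for clarity rather than learned) and a step activation:

```python
def step(z):
    # Threshold activation: fire (1) when the weighted sum is positive
    return 1 if z > 0 else 0

def xor_net(x1, x2):
    # Hidden unit h1 acts as "x1 OR x2", h2 acts as "x1 AND x2"
    h1 = step(x1 + x2 - 0.5)
    h2 = step(x1 + x2 - 1.5)
    # Output fires for "OR but not AND", which is XOR
    return step(h1 - 2 * h2 - 0.5)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_net(a, b))  # last column: 0, 1, 1, 0
```

Each hidden unit draws one line. The output unit combines the two half-planes into a region that no single line could produce.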

2
Q

Major difference between ANN and Perceptron

A

the inclusion of hidden layers

3
Q

Universal Approximation Theorem for Neural Networks

A

A FFNN with a single hidden layer containing a finite but arbitrarily large number of neurons can approximate any continuous function (on a bounded, closed input domain) to any desired accuracy

4
Q

Universal Approximation theorem also proved for

A

an arbitrary number of hidden layers, each containing a limited (bounded) number of neurons

5
Q

Hidden layers can represent

A

arbitrarily complex decision boundaries

6
Q

Meaning of "deep" in deep neural network

A

Refers to the depth of the network, i.e., the number of hidden layers. Typically more than 3-4 hidden layers

7
Q

FFNN (feedforward neural network)

A

Information flows forward only, from the input layer through the hidden layers to the output layer
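Here is a minimal sketch of a feedforward pass in plain Python. It assumes fully connected layers stored as (weights, biases, activation) tuples. The function names, layer sizes, and weight values are hypothetical:

```python
def relu(z):
    return max(0.0, z)

def identity(z):
    return z

def dense(x, W, b, activation):
    # One fully connected layer: each row of W holds one neuron's weights
    return [activation(sum(w_i * x_i for w_i, x_i in zip(row, x)) + b_j)
            for row, b_j in zip(W, b)]

def forward(x, layers):
    # Information moves strictly forward: each layer's output feeds the next layer
    for W, b, activation in layers:
        x = dense(x, W, b, activation)
    return x

# Hypothetical 2-2-1 network: ReLU hidden layer, linear output
layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0], relu),
    ([[1.0, 1.0]], [0.0], identity),
]
print(forward([2.0, 1.0], layers))  # [2.5]
```

No value flows back to an earlier layer during this pass. Backpropagation is a separate backward step, used only during training.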

8
Q

Backpropagation

A

errors are propagated backwards to correct the weights

9
Q

Downstream

A

towards the right, i.e., towards the output layer

10
Q

upstream

A

towards the left, i.e., towards the input layer

11
Q

FFNN used for

A

General-purpose tasks such as classification and regression

12
Q

Convolutional Neural Networks

A

Excel at image recognition

13
Q

Recurrent Neural Networks

A

Excel at language tasks, such as predicting the next word

14
Q

Long short-term memory networks

A

Like RNNs, but for tasks that require longer-range context

15
Q

Generative Adversarial Networks

A

A generative network (the generator) is trained to generate something, such as images
An adversarial network (the discriminator) is trained to classify each sample as real (from the training data) or generated; the two networks are trained against each other

16
Q

Hidden nodes learn

A

latent representation (features useful for class boundaries)

17
Q

First hidden layer captures

A

simpler features (since it receives the predictors as input)

18
Q

Subsequent hidden layers hone in on

A

specific patterns of the data to extract features

19
Q

What does a neuron do?

A

Exactly the same thing a perceptron does: the input part computes the weighted sum wᵀx (w transpose x), and the output part applies the activation function
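Here is a minimal sketch of one neuron in Python. It assumes a bias term b and a sigmoid activation. Both are illustrative choices, because the card does not fix either one:

```python
import math

def neuron(x, w, b):
    # Input part: weighted sum z = w^T x + b
    z = sum(w_i * x_i for w_i, x_i in zip(w, x)) + b
    # Output part: activation function (sigmoid chosen here as an example)
    return 1.0 / (1.0 + math.exp(-z))

print(neuron([1.0, 2.0], [0.5, -0.25], 0.0))  # z = 0, so the output is 0.5
```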

20
Q

Activation function is important, as it provides

A

non-linearity to an ANN and allows it to create non-linear class boundaries

21
Q

How to choose an activation function at output layer?

A

Match the output-layer activation function to the type of prediction problem

22
Q

Output activation function for regression

A

Linear activation function

23
Q

Output activation function for binary classification

A

Sigmoid (logistic) activation function

24
Q

Output activation function for multiclass classification

A

Softmax activation function
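Here is a minimal sketch of the three output activations in plain Python. The function names are illustrative:

```python
import math

def linear(z):
    # Regression: pass the raw value through unchanged
    return z

def sigmoid(z):
    # Binary classification: squash to (0, 1), read as P(class = 1)
    return 1.0 / (1.0 + math.exp(-z))

def softmax(zs):
    # Multiclass classification: one probability per class, summing to 1
    m = max(zs)  # subtract the max before exponentiating to avoid overflow
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]

print(softmax([1.0, 2.0, 3.0]))  # largest probability for the last class
```

Subtracting the largest input inside softmax leaves the result unchanged, but it keeps math.exp from overflowing on large scores.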

25
Q

How to choose an activation function at hidden layers?

A

Start with the ReLU activation function and move to others if results are sub-optimal

26
Q

What does it mean that a NN is learning?

A

Updating its weights

27
Q

What should we initialize the weight vector with?

A

Random initialization: w ~ N(0, σ²), a normal distribution with mean 0 and variance σ² (standard deviation σ)

28
Q

How to choose sigma squared?

A

Xavier initialization
He initialization

29
Q

Xavier initialization

A

σ² = 2 / (n_in + n_out), where n_in = # of neurons in the previous layer and n_out = # of neurons in the next layer

30
Q

He initialization

A

σ² = 2 / n_in, where n_in = # of neurons in the previous layer
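Here is a sketch of both schemes. It assumes a fully connected layer with n_in inputs (neurons in the previous layer) and n_out outputs (neurons in the next layer). init_variance and init_weights are hypothetical helper names:

```python
import math
import random

def init_variance(n_in, n_out, scheme):
    # Variance sigma^2 of the zero-mean normal used for the weights
    if scheme == "xavier":
        return 2.0 / (n_in + n_out)  # previous-layer + next-layer neurons
    if scheme == "he":
        return 2.0 / n_in            # previous-layer neurons only
    raise ValueError(f"unknown scheme: {scheme}")

def init_weights(n_in, n_out, scheme="he", seed=0):
    rng = random.Random(seed)
    # random.gauss expects the standard deviation sigma, not the variance sigma^2
    sigma = math.sqrt(init_variance(n_in, n_out, scheme))
    return [[rng.gauss(0.0, sigma) for _ in range(n_in)] for _ in range(n_out)]

W = init_weights(300, 100, scheme="he")
print(len(W), len(W[0]))  # 100 300
```

Note the square root. The cards give the variance σ², but random.gauss takes the standard deviation σ.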

31
Q

Result of backpropagation

A

A gradient vector: the derivative of the loss with respect to each weight, used to update the weights toward a minimum of the loss
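Backpropagation applies the chain rule from the loss back to every weight. This minimal sketch covers the smallest possible case, a single sigmoid neuron with squared-error loss. Both choices are assumptions made for brevity. A real network repeats the same step layer by layer:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss(w, b, x, y):
    # Squared error of one sigmoid neuron on one observation
    a = sigmoid(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)
    return 0.5 * (a - y) ** 2

def backprop(w, b, x, y):
    # Forward pass, keeping the intermediate activation
    z = sum(w_i * x_i for w_i, x_i in zip(w, x)) + b
    a = sigmoid(z)
    # Backward pass via the chain rule: dL/dw_i = dL/da * da/dz * dz/dw_i
    delta = (a - y) * a * (1.0 - a)  # dL/dz
    grad_w = [delta * x_i for x_i in x]
    grad_b = delta
    return grad_w, grad_b

w, b, x, y = [0.5, -0.25], 0.0, [1.0, 2.0], 1.0
grad_w, grad_b = backprop(w, b, x, y)
print(grad_w, grad_b)  # [-0.125, -0.25] -0.125
```

Each entry of grad_w says how the loss changes as that weight changes. A negative entry means that increasing the weight lowers the loss.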

32
Q

A gradient descent algorithm does

A

iteratively goes through the training dataset, modifying weights during each pass (epoch) to minimize the cost

33
Q

Batch gradient descent

A

Calculate the error for each observation; at the end of the training data (one epoch), calculate the average error and update w once

34
Q

Stochastic gradient descent

A

Calculate error after each observation and update w

35
Q

Mini-batch gradient descent

A

Split data into small batches, calculate error for each observation in a batch and at the end of the batch calculate average error and update w
Preferred method
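The three variants differ only in how many observations are averaged before each update of w. This minimal sketch makes several assumptions: a one-feature linear model with squared error, a hypothetical train function, and made-up noise-free data. batch_size alone selects the variant:

```python
import random

def train(xs, ys, batch_size, lr=0.1, epochs=2000, seed=0):
    """Fit y ~ w*x + b by gradient descent on mean squared error.

    batch_size == len(xs) -> batch GD, batch_size == 1 -> stochastic GD,
    anything in between -> mini-batch GD.
    """
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    order = list(range(len(xs)))
    for _ in range(epochs):  # one epoch = one full pass over the data
        rng.shuffle(order)
        for start in range(0, len(order), batch_size):
            batch = order[start:start + batch_size]
            # Average the per-observation gradients over the batch
            grad_w = sum(2 * (w * xs[i] + b - ys[i]) * xs[i] for i in batch) / len(batch)
            grad_b = sum(2 * (w * xs[i] + b - ys[i]) for i in batch) / len(batch)
            # One weight update per batch
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b

# Hypothetical noise-free data from y = 3x + 1
xs = [i / 10 for i in range(10)]
ys = [3 * x + 1 for x in xs]
for size in (len(xs), 1, 4):  # batch, stochastic, mini-batch
    print(size, train(xs, ys, batch_size=size))  # each should approach (3.0, 1.0)
```

A batch_size equal to the dataset size gives one update per epoch. A batch_size of 1 gives one update per observation. A small value like 4 gives one update per mini-batch, and the last batch may be smaller.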

36
Q

Neural networks are almost always

A

over-parameterized, yet they perform well

37
Q

over-parameterization leads to

A

better learning

38
Q

How much training data do I need for my neural network?

A

No definitive answer; a common rule of thumb is 10x more training observations than there are parameters in your neural network
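Applying the rule of thumb requires counting the network's parameters. This sketch assumes a fully connected network with one bias per neuron. The layer sizes are hypothetical:

```python
def count_parameters(layer_sizes):
    # Between consecutive layers: n_in * n_out weights plus n_out biases
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

n_params = count_parameters([10, 32, 16, 1])
print(n_params)       # 897
print(10 * n_params)  # 8970 training observations under the 10x rule of thumb
```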

39
Q

Gradient Descent Algorithm definition

A

Using gradients to search for a minimum of the error function

40
Q

Gradient descent relates to

A

Error function minimization
Minimizes the error function with respect to its weights

41
Q

Weight update formula

A

New weight = old weight minus the learning rate multiplied by the gradient of the loss function with respect to the weight: w_new = w_old - η · ∂L/∂w
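This minimal sketch applies the update rule to a one-weight toy loss, L(w) = (w - 3)². That loss was chosen only because its gradient, 2(w - 3), is easy to write. The function name and learning rate are illustrative:

```python
def gradient_descent(grad, w0, lr=0.1, steps=100):
    w = w0
    for _ in range(steps):
        # New weight = old weight - learning rate * dL/dw
        w = w - lr * grad(w)
    return w

# Toy loss L(w) = (w - 3)^2 has gradient 2 * (w - 3) and its minimum at w = 3
w_star = gradient_descent(lambda w: 2 * (w - 3), w0=0.0)
print(w_star)  # very close to 3.0
```

Here each step shrinks the distance to the minimum by a factor of (1 - 2 * lr) = 0.8. For this particular loss, a learning rate above 1.0 would make the updates diverge.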