NN FINAL Flashcards
Hidden layers are needed if
the data must be separated using a non-linear boundary
Major difference between ANN and Perceptron
the inclusion of hidden layers
Universal Approximation Theorem for Neural Networks
An FFNN with a single hidden layer containing an arbitrary number of neurons can approximate any continuous function
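A Cybenko-style statement of the theorem (a standard formulation, assuming σ is a suitable non-linear activation; not quoted from the course):

```latex
% For any continuous f on a compact set and any eps > 0, there exist
% N, v_i, w_i, b_i such that the one-hidden-layer network is eps-close to f:
\left| f(x) - \sum_{i=1}^{N} v_i \,\sigma\!\left(w_i^{\top} x + b_i\right) \right| < \varepsilon
\quad \text{for all } x
```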
Universal Approximation Theorem also proved for
an arbitrary number of hidden layers, each containing a limited number of neurons
Hidden layers can represent
arbitrarily complex decision boundaries
Deep neural network meaning
Refers to the number of hidden layers (depth); typically more than 3-4 hidden layers
FFNN
information flows forward, from the input layer to the output layer
Back prop
errors are propagated backwards to correct the weights
Downstream
towards the output layer (drawn to the right in standard network diagrams)
Upstream
towards the input layer (drawn to the left)
FFNN used for
General neural networks, classification, regression
Convolutional Neural Networks
Excel at image recognition
Recurrent Neural Networks
Excel at sequence tasks such as language modeling and predicting the next word
Long short-term memory networks
Like RNNs, but for tasks that require longer context
Generative Adversarial Networks
The generative neural network is trained to generate something
The adversarial (discriminator) network is then trained to classify whether what was generated is real or fake
Hidden nodes learn
latent representation (features useful for class boundaries)
First hidden layer captures
simpler features (since it receives the predictors as input)
Subsequent hidden layers home in on
specific patterns in the data to extract higher-level features
What does a neuron do?
Exactly the same thing we saw the perceptron do: the input part computes w^T x (a weighted sum of the inputs), and the output part applies the activation function
Activation function is important, as it provides
non-linearity to an ANN and allows it to create non-linear class boundaries
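A minimal NumPy sketch of one neuron, assuming a ReLU activation (all names and values are illustrative):

```python
import numpy as np

def neuron(x, w, b):
    """One neuron: the input part computes w^T x + b; the output part applies an activation."""
    z = np.dot(w, x) + b        # weighted sum of inputs plus bias
    return max(0.0, z)          # ReLU activation (assumed here)

x = np.array([1.0, 2.0])        # illustrative inputs
w = np.array([0.5, -0.3])       # illustrative weights
print(neuron(x, w, b=0.1))      # 0.5*1 - 0.3*2 + 0.1 = 0.0, so ReLU outputs 0.0
```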
How to choose an activation function at output layer?
Match the activation function at the output layer based on the type of prediction problem
Output activation function for regression
Linear activation function
Output activation function for binary classification
Sigmoid/logistic activation function
Output activation function for multiclass classification
Softmax activation function
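A minimal NumPy sketch of the three output activations (values are illustrative):

```python
import numpy as np

def linear(z):                       # regression: pass the raw score through
    return z

def sigmoid(z):                      # binary classification: probability in (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):                      # multiclass: probabilities that sum to 1
    e = np.exp(z - np.max(z))        # subtract the max for numerical stability
    return e / e.sum()

z = np.array([2.0, 1.0, 0.1])
print(linear(z))     # [2.  1.  0.1]
print(sigmoid(2.0))  # ~0.88
print(softmax(z))    # ~[0.66 0.24 0.10]
```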
How to choose an activation function at hidden layers
Start with the ReLU activation function and move to others if results are sub-optimal
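A minimal NumPy sketch of ReLU and two common fallbacks (leaky ReLU and tanh are assumptions about which "others" to try, not from the course):

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)              # default starting point for hidden layers

def leaky_relu(z, alpha=0.01):
    return np.where(z > 0, z, alpha * z)   # small slope for z < 0 avoids "dead" neurons

z = np.array([-2.0, 0.0, 3.0])
print(relu(z))        # [0. 0. 3.]
print(leaky_relu(z))  # [-0.02  0.    3.  ]
print(np.tanh(z))     # zero-centered alternative
```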
What does it mean that NN is learning
Updating its weights
What should we initialize the weight vector with?
Random initialization: w ~ N(0, σ²), a normal distribution with mean 0 and variance σ²
How to choose sigma squared?
Xavier initialization
He initialization
Xavier initialization
σ² = 2 / ((# of neurons in previous layer) + (# of neurons in next layer))
He initialization
σ² = 2 / (# of neurons in previous layer)
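A minimal sketch of both schemes in NumPy (layer widths are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n_prev, n_next = 128, 64          # illustrative layer widths

# Xavier (Glorot): sigma^2 = 2 / (n_prev + n_next)
w_xavier = rng.normal(0.0, np.sqrt(2.0 / (n_prev + n_next)), size=(n_next, n_prev))

# He: sigma^2 = 2 / n_prev, commonly paired with ReLU
w_he = rng.normal(0.0, np.sqrt(2.0 / n_prev), size=(n_next, n_prev))

print(w_xavier.std(), w_he.std())  # empirical std close to the chosen sigma
```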
Result of backpropagation
A gradient vector (the partial derivatives of the loss with respect to each weight), used to update the weights toward a minimum of the error function
A gradient descent algorithm does
iteratively goes through the training dataset, modifying weights during each pass (epoch) to minimize the cost
Batch gradient descent
Calculate the error for each observation and, at the end of the training data, calculate the average error and update w
Stochastic gradient descent
Calculate the error after each observation and update w immediately
Mini-batch gradient descent
Split the data into small batches; calculate the error for each observation in a batch and, at the end of the batch, calculate the average error and update w
Preferred gradient descent method
Mini-batch gradient descent (it balances the stable updates of batch gradient descent with the speed of stochastic gradient descent)
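A minimal sketch of mini-batch gradient descent in NumPy, using linear regression with squared error as an illustrative model (names and hyperparameters are assumptions, not from the course):

```python
import numpy as np

def minibatch_gd(X, y, lr=0.1, batch_size=32, epochs=100, seed=0):
    """Mini-batch gradient descent: update w after each batch's average gradient."""
    rng = np.random.default_rng(seed)
    w = rng.normal(0.0, 0.01, size=X.shape[1])   # random small initialization
    for _ in range(epochs):                      # one pass over the data = one epoch
        idx = rng.permutation(len(X))            # shuffle each epoch
        for start in range(0, len(X), batch_size):
            batch = idx[start:start + batch_size]
            Xb, yb = X[batch], y[batch]
            grad = 2 * Xb.T @ (Xb @ w - yb) / len(Xb)  # average gradient over the batch
            w -= lr * grad                       # weight update after each batch
    return w

X = np.random.default_rng(1).normal(size=(200, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w
print(minibatch_gd(X, y))                        # approaches [1.0, -2.0, 0.5]
```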
Neural networks are almost always
over-parameterized, yet they perform well
Over-parameterization leads to
better learning
How much training data do I need for my neural network?
No definitive answer; a common rule of thumb is 10x more training observations than there are parameters in the network
Gradient Descent Algorithm definition
The use of gradients to search for the minimum of the error function
What gradient descent relates to
Error-function minimization: it minimizes the error function with respect to the network's weights
Weight update formula
new weight = old weight − (learning rate × gradient of the loss function with respect to that weight)
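In symbols, with η the learning rate (a standard formulation with illustrative numbers, not quoted from the course):

```latex
w_{t+1} = w_t - \eta \, \frac{\partial L}{\partial w}
% illustrative numbers: w_t = 0.8, eta = 0.1, dL/dw = 0.5
% w_{t+1} = 0.8 - 0.1 \times 0.5 = 0.75
```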