ANNs and Backprop Flashcards

1
Q

What are ANNs?

A

A method of programming computers that learn automatically from training examples

2
Q

What tasks are ANNs particularly good at?

A

Pattern recognition and other tasks that are conventionally difficult to program

3
Q

What is the architecture of ANNs based on?

A

Loosely based on a biological brain

4
Q

How do ANNs process information?

A

Using interconnected neurons

5
Q

What type of reasoning do ANNs use?

A

Inductive reasoning (data to rules)

6
Q

What is the memory type of ANNs?

A

Distributed and short-term

7
Q

What is a key advantage of ANNs?

A

Fault tolerant due to redundancy

8
Q

Name three applications of ANNs in classification.

A
  • Consumer behavior
  • Medical diagnosis
  • Fruit grading
9
Q

What are two areas where ANNs are used for recognition/identification?

A
  • Speech
  • Vision
10
Q

How are ANNs used in forecasting/prediction?

A
  • Weather
  • Stocks
  • Crop yield
  • Trends
11
Q

What are the capabilities of ANNs?

A

Turing powerful, capable of approximating any function or mapping between vector spaces

12
Q

What tasks do ANNs struggle with?

A

Symbolic manipulation and memory-intensive tasks

13
Q

Why are ANNs beneficial?

A

Avoids explicit system modelling by learning complex behaviors directly from data

14
Q

How many neurons does a human brain have?

A

86 billion neurons

15
Q

Fill in the blank: ANNs are best suited for _______.

A

classification and function approximation

16
Q

True or False: ANNs can learn and adapt to changing conditions.

A

True

17
Q

What are some applications of NLP?

A

Text categorization, part-of-speech tagging

NLP stands for Natural Language Processing.

18
Q

What are examples of predictive analysis applications?

A

Stock market trends, weather prediction

Predictive analysis involves using data to forecast future outcomes.

19
Q

What security applications are mentioned?

A

Motion detection, fingerprints

These applications enhance security systems.

20
Q

In what business areas are predictive analytics widely used?

A

Data warehousing, uncovering patterns and trends

Major consulting firms utilize these techniques.

21
Q

What is crucial for the success of Artificial Neural Networks (ANNs)?

A

Training data

The quality and quantity of training data directly affect ANN performance.

22
Q

What is an artificial neuron?

A

A simplified model of a biological neuron

It serves as the foundational model for computational models in AI and neural networks.

23
Q

What are the inputs of an artificial neuron denoted as?

A

I1, I2… In

These inputs are real numbers that the neuron processes.

24
Q

What determines the significance of each input in an artificial neuron?

A

Weights

Each input has an associated weight that influences the neuron’s output.

25
Q

What does the summation unit of an artificial neuron compute?

A

The weighted sum (logit) of the inputs

The formula is Σ wi·Ii + b, where b is an optional bias term.
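As a minimal worked example in Python (all values hypothetical), the summation unit just computes a dot product of weights and inputs, plus the bias:

```python
# Hypothetical inputs, weights, and bias for a three-input neuron.
inputs  = [0.5, 0.3, 0.2]   # I1, I2, I3
weights = [0.4, -0.6, 0.9]  # w1, w2, w3
bias    = 0.1               # b, the optional bias term

# The summation unit: logit = sum(wi * Ii) + b
logit = sum(w * i for w, i in zip(weights, inputs)) + bias
print(logit)  # 0.2 - 0.18 + 0.18 + 0.1 = 0.3
```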

26
Q

What is the role of the activation function in an artificial neuron?

A

Transforms the logit into the neuron’s output

The function f defines the behavior of the neuron.

27
Q

What is a popular activation function mentioned?

A

Sigmoid

It is smooth and bounded between 0 and 1.

28
Q

What are linear layers in deep learning?

A

Layers with no activation function

The output of each neuron in these layers is the logit.

29
Q

Why must the activation function in neural networks be differentiable?

A

Required by algorithms that optimize the weights

Differentiability is necessary for gradient-based optimization methods.

30
Q

What are two activation functions that are replacing sigmoid?

A

tanh, ReLU

These functions offer advantages in performance and convergence.

31
Q

What is the linear step function also known as?

A

Heaviside function

It maps input values to 0 or 1 based on a threshold t.
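A sketch of the activation functions mentioned in the cards above, in Python with NumPy (the default threshold in heaviside is illustrative):

```python
import numpy as np

def heaviside(x, t=0.0):
    """Linear step function: maps input to 0 or 1 based on a threshold t."""
    return np.where(x >= t, 1.0, 0.0)

def sigmoid(x):
    """Smooth, differentiable, bounded between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    """Like sigmoid, but bounded between -1 and 1."""
    return np.tanh(x)

def relu(x):
    """Rectified linear unit: max(0, x)."""
    return np.maximum(0.0, x)
```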

32
Q

Describe the typical structure of an ANN.

A

Input layer, 1+ hidden layers, output layer

Data flows from inputs X to outputs Y, with neurons connected by weights.

33
Q

How is pattern recognition implemented in neural networks?

A

Using a feed forward neural network

The network associates target outputs with input patterns during training.

34
Q

What must be available for effective pattern recognition in neural networks?

A

Good labelled training data

Quality training data is essential for accurate pattern association.

35
Q

What are weights in the context of neural networks?

A

Model parameters or just parameters

They are adjusted during training to optimize performance.

36
Q

What is a Perceptron?

A

A simple two-layer feed forward neural network with an input layer and an output layer.

Uses a linear step function with t=0.5.

37
Q

What types of functions can a Perceptron compute?

A

Boolean AND and OR functions.

Requires finding a set of weights for binary input.
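A minimal sketch of a Perceptron computing Boolean AND with the step function at t = 0.5; the helper names are my own, and these weights are one workable choice, not the only one:

```python
def step(x, t=0.5):
    # Linear step function with threshold t.
    return 1 if x >= t else 0

def perceptron_and(x1, x2):
    # With w1 = w2 = 0.3, only x1 = x2 = 1 pushes the sum past t = 0.5.
    return step(0.3 * x1 + 0.3 * x2)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, perceptron_and(a, b))  # outputs 1 only for (1, 1)
```

Raising both weights to 0.6 turns the same structure into Boolean OR.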

38
Q

How do input neurons function in a Perceptron?

A

Input neurons act as identity functions with a weight of 1.

Always output the input value, whether 0 or 1.

39
Q

What is a limitation of the Perceptron?

A

It can only compute functions that are geometrically linearly separable.

This means inputs can be separated by a straight line in input space.

40
Q

Is XOR linearly separable?

A

No, XOR is not linearly separable.

True and false inputs cannot be separated by a straight line.

41
Q

What is required to compute XOR?

A

A nonlinear activation function.

XOR cannot be computed by a standard Perceptron.
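With a hidden layer the picture changes; below is a hand-wired sketch of XOR using nonlinear step units (the weights and thresholds are one hypothetical solution):

```python
def step(x, t=0.5):
    return 1 if x >= t else 0

def xor(x1, x2):
    h_or  = step(x1 + x2, t=0.5)      # hidden unit computing OR
    h_and = step(x1 + x2, t=1.5)      # hidden unit computing AND
    return step(h_or - h_and, t=0.5)  # OR but not AND

for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor(a, b))  # 1 only for (0, 1) and (1, 0)
```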

42
Q

What did Minsky and Papert’s book highlight about the Perceptron?

A

It led researchers to focus on symbolic AI instead of neural networks.

Their analysis suggested limitations of single-layer perceptrons.

43
Q

What is a Multi Layer Perceptron (MLP)?

A

A type of Perceptron with at least three layers: input, hidden, and output.

It is Turing powerful if it has a nonlinear activation function.

44
Q

What is a key feature of MLPs according to Minsky and Papert?

A

An MLP can theoretically compute any computable function.

Requires at least one hidden layer and a nonlinear activation function.

45
Q

What was one main issue in developing MLPs?

A

Finding a consistent set of weights for training examples.

Also involves determining the number of layers and neurons.

46
Q

What impact did Minsky and Papert’s analysis have on neural networks?

A

It led to a perception that all neural network architectures were flawed.

Resulted in reduced funding and interest in neural networks.

47
Q

What is the relationship between MLPs and ANNs?

A

MLP is one type of artificial neural network (ANN).

It is one of the simplest and most popular neural networks.

48
Q

Classification vs Regression

A

These are the two main categories for supervised learning algorithms. The biggest difference is that while regression tries to predict a continuous quantity, classification predicts discrete class labels.

49
Q

Ex. of Regression

A

Predicting tomorrow’s price of a certain stock from historical data.

50
Q

Ex. of Classification

A

Distinguishing dog images from cat images.

51
Q

MLP Training vs Testing

A

In supervised learning we need labelled datasets, usually divided randomly into training and testing examples. In Python ML packages, fit() fits the model to the training data. The testing data is then classified by the trained model to determine the classification or prediction performance, and various metrics are used to evaluate these algorithms.
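A sketch of that workflow using scikit-learn (assuming it is installed; the dataset and hyperparameters are illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = load_iris(return_X_y=True)

# Randomly divide the labelled dataset into training and testing examples.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# fit() fits the model to the training data.
clf = MLPClassifier(hidden_layer_sizes=(10,), max_iter=1000, random_state=0)
clf.fit(X_train, y_train)

# Classify the testing data; accuracy is one common evaluation metric.
print(clf.score(X_test, y_test))
```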

52
Q

Training a Multi-Layer Perceptron

A

The task of finding a set of weights that will allow the MLP to classify the training examples correctly. If we have 50 labelled images of cats and dogs, we use feature selection to represent them as vectors. The MLP is then trained to output (0,1) for cat and (1,0) for dog. Weights are initially randomized between -1 and 1; it is infeasible to find the weights by inspection.

53
Q

The goal of any learning algorithm

A

To find a function that best maps inputs to their correct outputs. MLP training is an optimization task of finding the right weights to compute an arbitrary mapping of input to output. This can be done using the error backpropagation (backprop) algorithm.
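A minimal sketch of the gradient-based weight update for a single sigmoid neuron (the learning rate and data are hypothetical); full backprop applies the same idea layer by layer:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
w = rng.uniform(-1, 1, size=2)   # weights initialized randomly in [-1, 1]
b = 0.0
eta = 0.2                        # learning rate
x, target = np.array([1.0, 0.0]), 1.0

for _ in range(200):
    out = sigmoid(w @ x + b)
    grad = (target - out) * out * (1 - out)  # error times sigmoid derivative
    w += eta * grad * x                      # nudge weights to reduce error
    b += eta * grad

print(sigmoid(w @ x + b))  # moves toward the target 1.0
```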

54
Q

MLP Topology

A

Number of layers and neurons in each layer
Connections between the neurons and their direction
Activation function used for the neurons

55
Q

error surface

A

The error surface of a neural network is rarely very smooth and well-behaved. Error surfaces tend to be very convoluted with numerous local minima.

56
Q

One-hot encoding

A

If you want an NN to classify unseen items into 3 classes, you assign each class a '1' on a specific output neuron, with all other output neurons assigned '0'. For n classes you need n output neurons; for example, '100', '010', and '001' can be assigned to the 3 classes. One-hot encoding is often used to label the training examples and is the most popular output representation (target labelling), sometimes called a 'distributed representation'.
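A small sketch of one-hot encoding in NumPy (the class indices are hypothetical):

```python
import numpy as np

labels = np.array([0, 2, 1, 0])      # class indices of 4 training examples
n_classes = 3

one_hot = np.eye(n_classes)[labels]  # rows: '100', '001', '010', '100'
print(one_hot)
```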

57
Q

Multi-Layer Perceptron - Pre-Training

A

Randomize all the training examples
Initialize all the weights with random values between -1 and 1.
Set the learning rate η hyperparameter (usually 0.2). Determines how much of the neuron error is used to modify the weights.
Set the error threshold μ hyperparameter (usually 0.2). Determines how much 'leeway' is given to the errors of the output-layer neurons.
Another hyperparameter is the maximum number of epochs.
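These pre-training steps might look as follows in Python; the network shape and example data are hypothetical stand-ins:

```python
import numpy as np

rng = np.random.default_rng(seed=0)

n_inputs, n_hidden, n_outputs = 2, 4, 2  # hypothetical 2-4-2 MLP

examples = list(range(10))               # stand-ins for 10 training examples
rng.shuffle(examples)                    # randomize the training examples

w_hidden = rng.uniform(-1, 1, (n_inputs, n_hidden))   # random weights
w_output = rng.uniform(-1, 1, (n_hidden, n_outputs))  # in [-1, 1]

eta = 0.2          # learning rate hyperparameter
mu = 0.2           # error threshold hyperparameter
max_epochs = 1000  # maximum number of epochs
```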

58
Q

Notes about MLP Training

A

The Error Backprop Algorithm modifies the weights when training encounters a bad fact. The required number of epochs varies: sometimes the network is trained for a fixed number of epochs, and sometimes it trains until it converges.
The network converges when it performs one epoch with only good facts, meaning that for each training example the error of every output neuron was less than the error threshold.
Backprop is not called to modify weights in that case.
You don't need to reshuffle the order of the training examples for each epoch.
When the network converges the weights must be stored; these weights correctly classify each training example.
The weights are used to classify unseen examples to determine whether the network can generalize. If you do not save the weights, you will have to retrain after switching off the computer.
It is common to store the weights after each epoch (checkpointing).
To generalize in ML means learning from a fixed number of training examples in order to be able to classify correctly any unseen examples.
The set of weights obtained by convergence is not unique; if training is performed again, the weights will probably be different after convergence.
Weights are called parameters; other settings are hyperparameters (learning rate, error threshold, activation function, number of epochs, …).
Optimal hyperparameters are determined through hyperparameter optimization.

59
Q

ReLU vs Sigmoid

A

Sigmoid has some disadvantages: it is computationally expensive, and input values below -4 or above 4 are mapped to nearly 0 or 1 respectively, losing magnitude information (e.g. 5 and 500 are both mapped to 1).

ReLU and its derivative are much faster to compute.
ReLU addresses the vanishing gradient problem to some extent.
Networks using ReLU tend in practice to show better convergence performance than sigmoid.
ReLU tends to blow up activations, since there is no mechanism to constrain the output of the neuron.
Dying ReLU problem: if too many inputs go below 0, most of the units in the network simply output zero, die, and prohibit learning. This can be solved with leaky ReLU.
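A sketch of leaky ReLU next to plain ReLU in NumPy, showing how it keeps a nonzero gradient for negative inputs (the slope alpha = 0.01 is a common but arbitrary choice):

```python
import numpy as np

def relu(x):
    # Plain ReLU: negative inputs output exactly 0, so units can "die".
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    # A small slope for negative inputs keeps the gradient nonzero.
    return np.where(x > 0, x, alpha * x)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(x))        # [ 0.     0.     0.    0.5   2.  ]
print(leaky_relu(x))  # [-0.02  -0.005  0.    0.5   2.  ]
```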

60
Q

Neuron Bias

A

Bias works like the intercept added in a linear equation. It is an additional parameter in an ANN, used to adjust the output along with the weighted sum of the inputs to the neuron. Bias is thus a constant that helps the model best fit the given data: it allows you to shift the activation function left or right by adding a constant (the bias) to the input.

61
Q

Output Formula

A

Output = f(sum(weight*input)+bias)
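The same formula written as Python, with sigmoid as an example f (the helper name and values are hypothetical):

```python
import math

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def neuron_output(inputs, weights, bias, f=sigmoid):
    # Output = f(sum(weight * input) + bias)
    return f(sum(w * i for w, i in zip(weights, inputs)) + bias)

print(neuron_output([1.0, 0.5], [0.4, -0.2], 0.1))  # sigmoid(0.4) ≈ 0.599
```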

62
Q

The bias term performs several important functions

A

Translation of Activation Function: The bias allows the activation function to be shifted to the left or right, which helps the model make better approximations of the target function. Without a bias term, the neuron is constrained to always pass through the origin, limiting its expressive capability.

Increased Flexibility: By adjusting bias and weights during the learning process, the model becomes more flexible. This added degree of freedom allows it to fit the training data better and generalize to new data.

Complexity and Non-Linearity: When used in conjunction with non-linear activation functions, the bias term helps introduce non-linearity to the model. This is important for tackling complex problems that cannot be solved adequately with linear methods.

Breaking symmetry: In the initialization phase, if neurons in the same layer have the same weights and biases, they’ll produce the same output, effectively making them identical. Biases help break this symmetry, allowing neurons to learn different features during training.

63
Q

Dropout

A

Dropout is a regularization technique commonly used in MLPs and other NNs. The idea is to prevent overfitting by randomly setting a subset of neuron outputs to zero at each training stage. Overfitting is a critical issue in ML where a model performs exceedingly well on its training dataset but poorly on unseen or validation data.
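A sketch of dropout applied to a layer's activations in NumPy, using the common "inverted dropout" scaling (this is illustrative, not a particular library's implementation):

```python
import numpy as np

def dropout(activations, rate=0.5, training=True):
    # At training time, zero a random fraction `rate` of neuron outputs.
    # Survivors are scaled by 1/(1 - rate) so the expected output is
    # unchanged, which lets inference skip dropout entirely.
    if not training:
        return activations
    mask = np.random.rand(*activations.shape) >= rate
    return activations * mask / (1.0 - rate)

h = np.random.rand(4)              # hypothetical hidden-layer activations
print(dropout(h))                  # roughly half the entries zeroed
print(dropout(h, training=False))  # unchanged at inference time
```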