ANNs and Backprop Flashcards

1
Q

What are ANNs?

A

A method of programming computers that learn automatically from training examples

2
Q

What tasks are ANNs particularly good at?

A

Pattern recognition and other tasks that are conventionally difficult to program

3
Q

What is the architecture of ANNs based on?

A

Loosely based on a biological brain

4
Q

How do ANNs process information?

A

Using interconnected neurons

5
Q

What type of reasoning do ANNs use?

A

Inductive reasoning (data to rules)

6
Q

What is the memory type of ANNs?

A

Distributed and short-term

7
Q

What is a key advantage of ANNs?

A

Fault tolerant due to redundancy

8
Q

Name three applications of ANNs in classification.

A
  • Consumer behavior
  • Medical diagnosis
  • Fruit grading
9
Q

What are two areas where ANNs are used for recognition/identification?

A
  • Speech
  • Vision
10
Q

How are ANNs used in forecasting/prediction?

A
  • Weather
  • Stocks
  • Crop yield
  • Trends
11
Q

What are the capabilities of ANNs?

A

Turing powerful, capable of approximating any function or mapping between vector spaces

12
Q

What tasks do ANNs struggle with?

A

Symbolic manipulation and memory-intensive tasks

13
Q

Why are ANNs beneficial?

A

Avoids explicit system modelling by learning complex behaviors directly from data

14
Q

How many neurons does a human brain have?

A

86 billion neurons

15
Q

Fill in the blank: ANNs are best suited for _______.

A

classification and function approximation

16
Q

True or False: ANNs can learn and adapt to changing conditions.

A

True

17
Q

What are some applications of NLP?

A

Text categorization, part-of-speech tagging

NLP stands for Natural Language Processing.

18
Q

What are examples of predictive analysis applications?

A

Stock market trends, weather prediction

Predictive analysis involves using data to forecast future outcomes.

19
Q

What security applications are mentioned?

A

Motion detection, fingerprints

These applications enhance security systems.

20
Q

In what business areas are predictive analytics widely used?

A

Data warehousing, uncovering patterns and trends

Major consulting firms utilize these techniques.

21
Q

What is crucial for the success of Artificial Neural Networks (ANNs)?

A

Training data

The quality and quantity of training data directly affect ANN performance.

22
Q

What is an artificial neuron?

A

A simplified model of a biological neuron

It serves as the foundational model for computational models in AI and neural networks.

23
Q

What are the inputs of an artificial neuron denoted as?

A

I1, I2… In

These inputs are real numbers that the neuron processes.

24
Q

What determines the significance of each input in an artificial neuron?

A

Weights

Each input has an associated weight that influences the neuron’s output.

25
Q

What does the summation unit of an artificial neuron compute?

A

The weighted sum (logit) of the inputs

The formula is Σ wi·Ii + b, where b is an optional bias term.
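As a minimal worked example in Python (all values hypothetical), the summation unit just computes a dot product of weights and inputs, plus the bias:

```python
# Hypothetical inputs, weights, and bias for a three-input neuron.
inputs  = [0.5, 0.3, 0.2]   # I1, I2, I3
weights = [0.4, -0.6, 0.9]  # w1, w2, w3
bias    = 0.1               # b, the optional bias term

# The summation unit: logit = sum(wi * Ii) + b
logit = sum(w * i for w, i in zip(weights, inputs)) + bias
print(logit)  # 0.2 - 0.18 + 0.18 + 0.1 = 0.3
```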

26
Q

What is the role of the activation function in an artificial neuron?

A

Transforms the logit into the neuron’s output

The function f defines the behavior of the neuron.

27
Q

What is a popular activation function mentioned?

A

Sigmoid

It is smooth and bounded between 0 and 1.

28
Q

What are linear layers in deep learning?

A

Layers with no activation function

The output of each neuron in these layers is the logit.

29
Q

Why must the activation function in neural networks be differentiable?

A

Required by algorithms that optimize the weights

Differentiability is necessary for gradient-based optimization methods.

30
Q

What are two activation functions that are replacing sigmoid?

A

tanh, ReLU

These functions offer advantages in performance and convergence.

31
Q

What is the linear step function also known as?

A

Heaviside function

It maps input values to 0 or 1 based on a threshold t.
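A sketch of the activation functions mentioned in the cards above, in Python with NumPy (the default threshold in heaviside is illustrative):

```python
import numpy as np

def heaviside(x, t=0.0):
    """Linear step function: maps input to 0 or 1 based on a threshold t."""
    return np.where(x >= t, 1.0, 0.0)

def sigmoid(x):
    """Smooth, differentiable, bounded between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    """Like sigmoid, but bounded between -1 and 1."""
    return np.tanh(x)

def relu(x):
    """Rectified linear unit: max(0, x)."""
    return np.maximum(0.0, x)
```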

32
Q

Describe the typical structure of an ANN.

A

Input layer, 1+ hidden layers, output layer

Data flows from inputs X to outputs Y, with neurons connected by weights.

33
Q

How is pattern recognition implemented in neural networks?

A

Using a feed forward neural network

The network associates target outputs with input patterns during training.

34
Q

What must be available for effective pattern recognition in neural networks?

A

Good labelled training data

Quality training data is essential for accurate pattern association.

35
Q

What are weights in the context of neural networks?

A

Model parameters or just parameters

They are adjusted during training to optimize performance.

36
Q

What is a Perceptron?

A

A simple two-layer feed forward neural network with an input layer and an output layer.

Uses a linear step function with t=0.5.

37
Q

What types of functions can a Perceptron compute?

A

Boolean AND and OR functions.

Requires finding a set of weights for binary input.
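A minimal sketch of a Perceptron computing Boolean AND with the step function at t = 0.5; the helper names are my own, and these weights are one workable choice, not the only one:

```python
def step(x, t=0.5):
    # Linear step function with threshold t.
    return 1 if x >= t else 0

def perceptron_and(x1, x2):
    # With w1 = w2 = 0.3, only x1 = x2 = 1 pushes the sum past t = 0.5.
    return step(0.3 * x1 + 0.3 * x2)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, perceptron_and(a, b))  # outputs 1 only for (1, 1)
```

Raising both weights to 0.6 turns the same structure into Boolean OR.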

38
Q

How do input neurons function in a Perceptron?

A

Input neurons act as identity functions with a weight of 1.

Always output the input value, whether 0 or 1.

39
Q

What is a limitation of the Perceptron?

A

It can only compute functions that are geometrically linearly separable.

This means inputs can be separated by a straight line in input space.

40
Q

Is XOR linearly separable?

A

No, XOR is not linearly separable.

True and false inputs cannot be separated by a straight line.

41
Q

What is required to compute XOR?

A

A nonlinear activation function.

XOR cannot be computed by a standard Perceptron.
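With a hidden layer the picture changes; below is a hand-wired sketch of XOR using nonlinear step units (the weights and thresholds are one hypothetical solution):

```python
def step(x, t=0.5):
    return 1 if x >= t else 0

def xor(x1, x2):
    h_or  = step(x1 + x2, t=0.5)      # hidden unit computing OR
    h_and = step(x1 + x2, t=1.5)      # hidden unit computing AND
    return step(h_or - h_and, t=0.5)  # OR but not AND

for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor(a, b))  # 1 only for (0, 1) and (1, 0)
```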

42
Q

What did Minsky and Papert’s book highlight about the Perceptron?

A

It led researchers to focus on symbolic AI instead of neural networks.

Their analysis suggested limitations of single-layer perceptrons.

43
Q

What is a Multi Layer Perceptron (MLP)?

A

A type of Perceptron with at least three layers: input, hidden, and output.

It is Turing powerful if it has a nonlinear activation function.

44
Q

What is a key feature of MLPs according to Minsky and Papert?

A

An MLP can theoretically compute any computable function.

Requires at least one hidden layer and a nonlinear activation function.

45
Q

What was one main issue in developing MLPs?

A

Finding a consistent set of weights for training examples.

Also involves determining the number of layers and neurons.

46
Q

What impact did Minsky and Papert’s analysis have on neural networks?

A

It led to a perception that all neural network architectures were flawed.

Resulted in reduced funding and interest in neural networks.

47
Q

What is the relationship between MLPs and ANNs?

A

MLP is one type of artificial neural network (ANN).

It is one of the simplest and most popular neural networks.

48
Q

Classification vs Regression

A

These are the two main categories for supervised learning algorithms. The biggest difference is that while regression tries to predict a continuous quantity, classification predicts discrete class labels.

49
Q

Ex. of Regression

A

Predicting tomorrow’s price of a certain stock from historical data.

50
Q

Ex. of Classification

A

Distinguishing dog images from cat images.

51
Q

MLP Training vs Testing

A

In supervised learning we need labelled datasets, usually divided randomly into training and testing examples. In Python ML packages, fit() fits the model to the training data. The testing data is then classified by the trained model to determine the classification or prediction performance, and various metrics are used to evaluate these algorithms.
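A sketch of that workflow using scikit-learn (assuming it is installed; the dataset and hyperparameters are illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = load_iris(return_X_y=True)

# Randomly divide the labelled dataset into training and testing examples.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# fit() fits the model to the training data.
clf = MLPClassifier(hidden_layer_sizes=(10,), max_iter=1000, random_state=0)
clf.fit(X_train, y_train)

# Classify the testing data; accuracy is one common evaluation metric.
print(clf.score(X_test, y_test))
```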

52
Q

Training a Multi-Layer Perceptron

A

The task of finding a set of weights that will allow the MLP to classify the training examples correctly. If we have 50 labelled images of cats and dogs, we use feature selection to represent them as vectors. The MLP is then trained to output (0,1) for cat and (1,0) for dog. Weights are initially randomized between -1 and 1; it is infeasible to find the weights by inspection.

53
Q

The goal of any learning algorithm

A

To find a function that best maps inputs to their correct outputs. MLP training is an optimization task of finding the right weights to compute an arbitrary mapping of input to output. This can be done using the error backpropagation (backprop) algorithm.
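A minimal sketch of the gradient-based weight update for a single sigmoid neuron (the learning rate and data are hypothetical); full backprop applies the same idea layer by layer:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
w = rng.uniform(-1, 1, size=2)   # weights initialized randomly in [-1, 1]
b = 0.0
eta = 0.2                        # learning rate
x, target = np.array([1.0, 0.0]), 1.0

for _ in range(200):
    out = sigmoid(w @ x + b)
    grad = (target - out) * out * (1 - out)  # error times sigmoid derivative
    w += eta * grad * x                      # nudge weights to reduce error
    b += eta * grad

print(sigmoid(w @ x + b))  # moves toward the target 1.0
```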

54
Q

MLP Topology

A

Number of layers and neurons in each layer
Connections between the neurons and their direction
Activation function used for the neurons

55
Q

error surface

A

The error surface of a neural network is rarely very smooth and well-behaved. Error surfaces tend to be very convoluted with numerous local minima.

56
Q

One-hot encoding

A

If you want an NN to classify unseen items into 3 classes, you assign each class a '1' on a specific output neuron, with all other output neurons assigned '0'. For n classes you need n output neurons; for example, '100', '010', and '001' can be assigned to the 3 classes. One-hot encoding is often used to label the training examples and is the most popular output representation (target labelling), sometimes called a 'distributed representation'.
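A small sketch of one-hot encoding in NumPy (the class indices are hypothetical):

```python
import numpy as np

labels = np.array([0, 2, 1, 0])      # class indices of 4 training examples
n_classes = 3

one_hot = np.eye(n_classes)[labels]  # rows: '100', '001', '010', '100'
print(one_hot)
```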

57
Q

Multi-Layer Perceptron - Pre-Training

A

Randomize all the training examples
Initialize all the weights with random values between -1 and 1.
Set the learning rate η hyperparameter (usually 0.2). Determines how much of the neuron error is used to modify the weights.
Set the error threshold μ hyperparameter (usually 0.2). Determines how much 'leeway' is given to the errors of the output-layer neurons.
Another hyperparameter is the maximum number of epochs.
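These pre-training steps might look as follows in Python; the network shape and example data are hypothetical stand-ins:

```python
import numpy as np

rng = np.random.default_rng(seed=0)

n_inputs, n_hidden, n_outputs = 2, 4, 2  # hypothetical 2-4-2 MLP

examples = list(range(10))               # stand-ins for 10 training examples
rng.shuffle(examples)                    # randomize the training examples

w_hidden = rng.uniform(-1, 1, (n_inputs, n_hidden))   # random weights
w_output = rng.uniform(-1, 1, (n_hidden, n_outputs))  # in [-1, 1]

eta = 0.2          # learning rate hyperparameter
mu = 0.2           # error threshold hyperparameter
max_epochs = 1000  # maximum number of epochs
```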

58
Q

Notes about MLP Training

A

The Error Backprop Algorithm modifies the weights when training encounters a bad fact. The required number of epochs varies: sometimes the network is trained for a fixed number of epochs, and sometimes it trains until it converges.
The network converges when it performs one epoch with only good facts, meaning that for each training example the error of every output neuron was less than the error threshold.
Backprop is not called to modify weights in that case.
You don't need to reshuffle the order of the training examples for each epoch.
When the network converges the weights must be stored; these weights correctly classify each training example.
The weights are used to classify unseen examples to determine whether the network can generalize. If you do not save the weights, you will have to retrain after switching off the computer.
It is common to store the weights after each epoch (checkpointing).
To generalize in ML means learning from a fixed number of training examples in order to be able to classify correctly any unseen examples.
The set of weights obtained by convergence is not unique; if training is performed again, the weights will probably be different after convergence.
Weights are called parameters; other settings are hyperparameters (learning rate, error threshold, activation function, number of epochs, …).
Optimal hyperparameters are determined through hyperparameter optimization.

59
Q

ReLU vs Sigmoid

A

Sigmoid has some disadvantages: it is computationally expensive, and input values below -4 or above 4 are mapped to nearly 0 or 1 respectively, losing magnitude information (e.g. 5 and 500 are both mapped to 1).

ReLU and its derivative are much faster to compute.
ReLU addresses the vanishing gradient problem to some extent.
Networks using ReLU tend in practice to show better convergence performance than sigmoid.
ReLU tends to blow up activations, since there is no mechanism to constrain the output of the neuron.
Dying ReLU problem: if too many inputs go below 0, most of the units in the network simply output zero, die, and prohibit learning. This can be solved with leaky ReLU.
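A sketch of leaky ReLU next to plain ReLU in NumPy, showing how it keeps a nonzero gradient for negative inputs (the slope alpha = 0.01 is a common but arbitrary choice):

```python
import numpy as np

def relu(x):
    # Plain ReLU: negative inputs output exactly 0, so units can "die".
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    # A small slope for negative inputs keeps the gradient nonzero.
    return np.where(x > 0, x, alpha * x)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(x))        # [ 0.     0.     0.    0.5   2.  ]
print(leaky_relu(x))  # [-0.02  -0.005  0.    0.5   2.  ]
```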

60
Q

Neuron Bias

A

Bias works like the intercept added in a linear equation. It is an additional parameter in an ANN, used to adjust the output along with the weighted sum of the inputs to the neuron. Bias is thus a constant that helps the model best fit the given data: it allows you to shift the activation function left or right by adding a constant (the bias) to the input.

61
Q

Output Formula

A

Output = f(sum(weight*input)+bias)
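The same formula written as Python, with sigmoid as an example f (the helper name and values are hypothetical):

```python
import math

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def neuron_output(inputs, weights, bias, f=sigmoid):
    # Output = f(sum(weight * input) + bias)
    return f(sum(w * i for w, i in zip(weights, inputs)) + bias)

print(neuron_output([1.0, 0.5], [0.4, -0.2], 0.1))  # sigmoid(0.4) ≈ 0.599
```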

62
Q

The bias term performs several important functions

A

Translation of Activation Function: The bias allows the activation function to be shifted to the left or right, which helps the model make better approximations of the target function. Without a bias term, the neuron is constrained to always pass through the origin, limiting its expressive capability.

Increased Flexibility: By adjusting bias and weights during the learning process, the model becomes more flexible. This added degree of freedom allows it to fit the training data better and generalize to new data.

Complexity and Non-Linearity: When used in conjunction with non-linear activation functions, the bias term helps introduce non-linearity to the model. This is important for tackling complex problems that cannot be solved adequately with linear methods.

Breaking symmetry: In the initialization phase, if neurons in the same layer have the same weights and biases, they’ll produce the same output, effectively making them identical. Biases help break this symmetry, allowing neurons to learn different features during training.

63
Q

Dropout

A

Dropout is a regularization technique commonly used in MLPs and other NNs. The idea is to prevent overfitting by randomly setting a subset of neuron outputs to zero at each training stage. Overfitting is a critical issue in ML where a model performs exceedingly well on its training dataset but poorly on unseen or validation data.
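A sketch of dropout applied to a layer's activations in NumPy, using the common "inverted dropout" scaling (this is illustrative, not a particular library's implementation):

```python
import numpy as np

def dropout(activations, rate=0.5, training=True):
    # At training time, zero a random fraction `rate` of neuron outputs.
    # Survivors are scaled by 1/(1 - rate) so the expected output is
    # unchanged, which lets inference skip dropout entirely.
    if not training:
        return activations
    mask = np.random.rand(*activations.shape) >= rate
    return activations * mask / (1.0 - rate)

h = np.random.rand(4)              # hypothetical hidden-layer activations
print(dropout(h))                  # roughly half the entries zeroed
print(dropout(h, training=False))  # unchanged at inference time
```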