Chapter 9: Artificial Neural Networks Flashcards

1
Q

What are the parts associated with a neuron in an Artificial Neural Network (ANN)?

A
  1. A set of input values
  2. Weights
  3. Bias, X0 = 1
  4. Activation function
  5. One output
2
Q

What is an activation function?

A

It is a threshold function that maps the neuron's weighted sum to class 0 if it is under a certain threshold, and to class 1 if it is at or above the threshold

3
Q

What is the purpose of activation functions?

A

Activation functions allow the network to model complex non-linear relationships

  • without an activation function, a neuron can only learn linear models –> the summation of WiXi (weight * input feature)
  • however, in reality, a lot of data is complex and non-linear

[An activation function allows the neural network to model complex non-linear relationships between the input and output variables. It takes the weighted sum of the inputs and biases and applies a non-linear function to the result, producing the output of the neuron. The output of the activation function becomes the input to the next layer of neurons in the neural network.]
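The point in the brackets above can be checked numerically. A minimal sketch in plain Python (the weights and inputs below are made-up illustrative numbers, not from the notes): without an activation function, two stacked layers collapse into a single linear model, while inserting a ReLU between them breaks that collapse.

```python
# Sketch: composition of linear maps is still linear (illustrative numbers).

def linear(w, xs):
    """Weighted sum w . x for one neuron, no activation, bias omitted."""
    return sum(wi * xi for wi, xi in zip(w, xs))

def relu(z):
    return max(0.0, z)

x = [1.0, -2.0]

# Two stacked "linear neurons": layer 1 has two neurons, layer 2 has one.
w1a, w1b = [2.0, 1.0], [-1.0, 3.0]      # layer-1 weights
w2 = [0.5, -0.5]                        # layer-2 weights

h = [linear(w1a, x), linear(w1b, x)]    # hidden layer, NO activation
out_no_act = linear(w2, h)

# The same output comes from a single linear neuron with combined weights,
# so the two-layer network learned nothing beyond a linear model:
w_combined = [w2[0] * w1a[0] + w2[1] * w1b[0],
              w2[0] * w1a[1] + w2[1] * w1b[1]]
print(out_no_act == linear(w_combined, x))   # True -> still linear

# With ReLU between the layers, the collapse no longer holds:
h_act = [relu(linear(w1a, x)), relu(linear(w1b, x))]
out_act = linear(w2, h_act)
print(out_act == out_no_act)                 # False -> genuinely non-linear
```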

4
Q

What are the 4 commonly used activation functions?
write out the equations (refer to notes)

A
  1. Step function (binary step function)
  2. Sigmoid function
  3. ReLU – rectified linear unit
  4. Leaky ReLU
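
The four functions can be sketched in plain Python (the equation forms are the standard ones; the Leaky ReLU slope alpha = 0.01 is a common default — check the lecture notes for the exact value used there):

```python
import math

def step(z):                      # binary step: 1 if z >= 0, else 0
    return 1 if z >= 0 else 0

def sigmoid(z):                   # sigmoid: 1 / (1 + e^(-z))
    return 1.0 / (1.0 + math.exp(-z))

def relu(z):                      # ReLU: max(0, z)
    return max(0.0, z)

def leaky_relu(z, alpha=0.01):    # Leaky ReLU: z if z > 0, else alpha * z
    return z if z > 0 else alpha * z

print(step(-0.5), relu(-0.5), leaky_relu(-0.5), sigmoid(0.0))
# 0 0.0 -0.005 0.5
```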
5
Q

What is a perceptron?

A

A perceptron is a type of artificial neural network that is used for binary classification tasks. It consists of a single layer of artificial neurons. (one neuron)

  • NOTE: In the lecture notes, the perceptron model given is just a single neuron with a single output, but multi-neuron perceptrons (i.e. neural networks) also exist

[In the case of a multi-neuron perceptron, the output of each neuron is still determined by a weighted sum of the input features, followed by an activation function. However, instead of having a single output value, the multi-neuron perceptron has multiple output values, each corresponding to a different class or category.]

6
Q

What does a perceptron do?

A

It takes a vector of real-valued inputs,
calculates a linear combination of these inputs,
and outputs a 1 if the result is greater than a threshold and 0 otherwise

** Can only model linearly separable data (linear decision boundary, e.g. the OR or AND logic gates but not XOR) –> thus for non-linear data, a NN is needed
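
A minimal sketch of this behaviour (the weights and threshold for AND are hand-picked illustrative values, not from the notes):

```python
# One perceptron: weighted sum, then threshold at 0 (bias plays the
# role of the threshold).
def perceptron(weights, bias, inputs):
    s = bias + sum(w * x for w, x in zip(weights, inputs))
    return 1 if s >= 0 else 0

# AND gate: fires only when both inputs are 1.
for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", perceptron([1, 1], -1.5, [a, b]))

# No single choice of weights/bias reproduces XOR this way: XOR's
# positive points (0,1) and (1,0) cannot be separated from (0,0)
# and (1,1) by one straight line.
```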

7
Q

What do outputs 0 and 1 mean for a neuron?

A

Output 0: negative; output 1: positive

Output 0 means a neuron will not get activated (or fired), and output 1 means that a neuron will be activated.

8
Q

What is the purpose of bias in neural network?

A
  • The purpose of a bias term is to allow the activation function to be shifted left or right along the x-axis, which can improve the performance of the neural network.

Without a bias term, the activation function would always be centered around a zero input (the decision boundary would have to pass through the origin), which could limit the representational power of the neural network.

** See notes on illustration of diagram
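
A numeric sketch of the shift (the weight and bias values here are illustrative): without a bias, a sigmoid neuron crosses 0.5 exactly at input 0; a bias of -2 moves that crossing point to input 2.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

w = 1.0
# Without a bias, the sigmoid output is 0.5 exactly at x = 0:
print(sigmoid(w * 0.0))          # 0.5
# A bias of -2 shifts the 0.5 crossing point to x = 2:
b = -2.0
print(sigmoid(w * 2.0 + b))      # 0.5
```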

9
Q

What is the purpose of a decision boundary?

A

To separate the data into different classes or categories.

  • A decision boundary is a boundary or surface in the input feature space that separates the different classes of data points.
10
Q

During training, the algorithm adjusts the weights and biases of the model to optimize the decision boundary so that it can accurately classify the input data. True or False?

A

True.

11
Q

What happens to the decision boundary when there is overfitting?
How does it affect the performance of the model in classifying test data?

A
  • Overfitting can lead to a decision boundary that is too complex and wiggly, which fits the training data very well but does not generalize well to new data.
  • The decision boundary might not separate the classes of new data points correctly, leading to poor performance on the test data.
12
Q

How is the perceptron trained?

A

The weights are adjusted based on the difference between the expected output and the actual output, using the gradient descent algorithm (don't need to know the details)
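
As a sketch of the update idea (this uses the classic perceptron learning rule, weight += lr * (expected - actual) * input; the notes say the details are not required, so treat this as illustrative):

```python
# Train a perceptron on the AND gate (lr = 0.5 keeps arithmetic exact).
def train_perceptron(data, lr=0.5, epochs=20):
    # data: list of (inputs, label) pairs with label in {0, 1}
    n = len(data[0][0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        for xs, y in data:
            y_hat = 1 if b + sum(wi * xi for wi, xi in zip(w, xs)) >= 0 else 0
            err = y - y_hat                          # expected - actual
            w = [wi + lr * err * xi for wi, xi in zip(w, xs)]
            b += lr * err
    return w, b

and_data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]
w, b = train_perceptron(and_data)
# The learned weights now classify all four AND inputs correctly.
```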

13
Q
  • What are 3 benefits of using a perceptron over logistic regression for binary classification?
A
  1. Perceptron is a simpler model
  • It uses a linear classifier that directly classifies data points based on their feature values
  • Logistic regression applies a non-linear transformation to the input data, making it more complex
  2. Perceptron has faster computations
  • It only requires a dot product and a comparison operation to classify a point; LR requires more complex calculations (logistic function, gradients)
  3. Perceptron is robust (resistant) to outliers
  • It updates weights based only on misclassified points, while LR can be sensitive to outliers since it tries to minimise the error over all points

14
Q
  • Give an example of a binary classification task that is not linearly separable. Draw out the decision boundary.
A

XOR logic gate

  • XOR, aka exclusive OR, outputs true when the number of positive inputs (class 1) is odd

See notes for decision boundary

15
Q

What are 4 steps in feed forward process in neural networks? (in general for the whole network, not for the neuron)

A
  1. Get labelled training data
  2. Plug data into input layer
  3. Compute values for hidden layer using input layer and weights
  4. Compute values for output layer using hidden layer as input and weights
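
The four steps above can be sketched for a tiny network (the layer sizes, weights, and the use of sigmoid as the activation are illustrative assumptions, not from the notes):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Steps 1-2: a labelled training point plugged into the input layer.
x = [0.5, -1.0]                        # input layer values

# Step 3: hidden value = activation(weights . input) for each hidden neuron.
W_hidden = [[0.2, -0.4], [0.7, 0.1]]   # one row of weights per hidden neuron
hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in W_hidden]

# Step 4: the output layer uses the hidden values as its input.
W_out = [[1.0, -1.0]]
output = [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in W_out]
print(output)
```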
16
Q

What are the 4 steps in backward propagation?

A
  1. Initialise random values for the weights in all layers

Repeat till convergence
- For each training data point:
  2. Propagate inputs forward to compute outputs
  3. Propagate the deltas backwards from the output layer to the input layer
  4. Update every weight in the network using the deltas
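
The four steps above can be sketched for a 1-hidden-layer network trained on a single point (sigmoid units, squared-error deltas, and all numbers here are illustrative assumptions; the lecture may use different details):

```python
import math, random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

random.seed(0)
n_in, n_hid = 2, 2
# Step 1: initialise random values for the weights in all layers.
W1 = [[random.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hid)]
W2 = [random.uniform(-1, 1) for _ in range(n_hid)]

x, y = [1.0, 0.0], 1.0      # one labelled training point
lr = 0.5

for _ in range(2000):        # "repeat till convergence" (fixed count here)
    # Step 2: propagate inputs forward to compute outputs.
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in W1]
    o = sigmoid(sum(w * hi for w, hi in zip(W2, h)))

    # Step 3: propagate deltas backwards from output to input layer.
    delta_o = (y - o) * o * (1 - o)                     # output-layer delta
    delta_h = [hi * (1 - hi) * W2[j] * delta_o          # hidden-layer deltas
               for j, hi in enumerate(h)]

    # Step 4: update every weight in the network using the deltas.
    W2 = [w + lr * delta_o * hi for w, hi in zip(W2, h)]
    W1 = [[w + lr * delta_h[j] * xi for w, xi in zip(row, x)]
          for j, row in enumerate(W1)]

# After training, the output for x should be close to the label y = 1.
```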

17
Q

Why is it not recommended to initialise weights to zeros in neural networks?

A
  • May cause problems during training
  • All neurons have the same output, and during backpropagation all neurons receive the same error signal
  • Weight updates are symmetric, so neurons in the same layer stay identical, and the network gets stuck (e.g. in a poor local minimum)
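
The symmetry problem can be demonstrated directly (sigmoid units and squared-error deltas are assumed for illustration): with all-zero weights, the two hidden neurons compute the same value and receive the same delta on every step, so they never differentiate.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

W1 = [[0.0, 0.0], [0.0, 0.0]]   # zero-initialised hidden weights
W2 = [0.0, 0.0]
x, y, lr = [1.0, -2.0], 1.0, 0.5

for _ in range(10):
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in W1]
    o = sigmoid(sum(w * hi for w, hi in zip(W2, h)))
    delta_o = (y - o) * o * (1 - o)
    delta_h = [hi * (1 - hi) * W2[j] * delta_o for j, hi in enumerate(h)]
    W2 = [w + lr * delta_o * hi for w, hi in zip(W2, h)]
    W1 = [[w + lr * delta_h[j] * xi for w, xi in zip(row, x)]
          for j, row in enumerate(W1)]

# Even after training, the two hidden neurons are still identical:
print(W1[0] == W1[1], W2[0] == W2[1])   # True True
```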
18
Q

What is the exploding gradient problem in neural networks?

A
  • Gradients get larger and larger as backpropagation proceeds from the output layer to the input layer, resulting in very large weight updates

CAUSES
- large weights at initialisation
- a high learning rate

19
Q

What is the vanishing gradient problem in neural networks? What are possible causes?

A
  • As backpropagation advances backward from the output layer towards the input layer, the gradients get smaller and smaller and approach zero
  • thus, there is little to no change in the weights nearer to the input layer

Possible causes
  • Activation functions: some activation functions, such as the sigmoid function, have derivatives that become very small as the inputs move away from zero. As the signal passes through many layers of the network, the gradients can become very small, making it difficult to update the weights.
  • Depth of the network: as the number of layers increases, the gradients become smaller and smaller as they propagate through the layers, because the gradients are multiplied together in each layer during backpropagation; if they are small to begin with, they become even smaller as they pass through more layers.
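
Both causes above can be put into numbers (the 20-layer depth is an illustrative choice): the sigmoid derivative is at most 0.25, so multiplying one such factor per layer shrinks the gradient exponentially with depth.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_deriv(z):
    s = sigmoid(z)
    return s * (1 - s)

print(sigmoid_deriv(0.0))   # 0.25 -- the maximum possible value
print(sigmoid_deriv(5.0))   # ~0.0066 -- tiny away from zero

# Even in the best case (derivative 0.25 in every layer), the combined
# gradient factor after 20 layers is vanishingly small:
print(0.25 ** 20)           # ~9.1e-13 -> weights near the input barely move
```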
20
Q

What are 2 consequences of having many layers in an ANN?

A
  1. Overfitting
  2. Higher computational cost –> expensive and takes more time to train