week 6 - intro to deep learning Flashcards
what are the differences between the brain and artificial neural networks?
The brain is far more computationally efficient and consumes much less energy
biological neurons have a greater degree of non-linearity than the neurons in MLPs, which makes them better at representing high-dimensional structure in the data
what does increasing the number of parameters in neural nets do?
new capabilities emerge as the number of parameters increases
As parameters increase, the model becomes able to solve new types of tasks
what is the perceptron?
the most basic unit of neural networks
a signal comes in, and if the weighted sum passes a certain threshold the output is 1; if the threshold is not passed, the output is 0
the threshold can be represented by a step function
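A minimal sketch of a perceptron, assuming hand-picked weights and bias that make it behave as an AND gate (illustrative values, not learned):

```python
def perceptron(x1, x2, w1, w2, b):
    # step activation: fire (1) only if the weighted sum passes the threshold
    return 1 if w1 * x1 + w2 * x2 + b > 0 else 0

# hand-picked weights that implement AND: only (1, 1) clears the threshold
for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print((x1, x2), perceptron(x1, x2, w1=1.0, w2=1.0, b=-1.5))  # 0, 0, 0, 1
```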
what is the XOR problem?
say you have two inputs that could either be 1 or 0
using a linear model, you cannot create a function that outputs TRUE only if the two inputs are different: no single straight decision boundary separates the four points, so they are not linearly separable
How does the MLP solve the XOR problem?
if you aggregate the output of two neurons and feed it as input into a neuron in another layer, you can solve the XOR problem
The MLP introduces a hidden layer whose neurons apply non-linear transformations. This layer maps the original input into a new space where the classes become linearly separable, so a straight line can pass between the points (see the sketch below)
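As a sketch, here is one classic hand-wired construction (the weights are chosen for illustration, not learned): two hidden neurons compute OR and AND, and the output neuron fires only when OR is true and AND is false.

```python
def step(z):
    return 1 if z > 0 else 0

def xor_mlp(x1, x2):
    h_or = step(x1 + x2 - 0.5)       # hidden neuron 1: OR
    h_and = step(x1 + x2 - 1.5)      # hidden neuron 2: AND
    return step(h_or - h_and - 0.5)  # output: OR and not AND -> XOR

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print((x1, x2), xor_mlp(x1, x2))  # 0, 1, 1, 0
```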
How is the MLP a universal function approximator?
Any continuous function can be approximated to arbitrary accuracy with a single, wide enough hidden layer (the universal approximation theorem)
how does a model adjust the weights by itself
It calculates the loss function
It minimises the loss using gradient descent: each weight is nudged in the direction that reduces the loss (see the sketch below)
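A minimal sketch of that loop, fitting a single weight to toy data with gradient descent on a mean-squared-error loss (the data and learning rate are made up for illustration):

```python
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]   # toy data generated by y = 2x

w = 0.0                # initial guess for the weight
lr = 0.05              # learning rate (illustrative value)
for _ in range(100):
    # gradient of mean((w*x - y)^2) with respect to w
    grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
    w -= lr * grad     # step against the gradient to reduce the loss

print(round(w, 3))     # converges towards 2.0
```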
what is the loss function for regression tasks and classification tasks?
regression tasks: L1 = mean absolute error, L2 = mean squared error
classification tasks: cross-entropy (a measure of the difference between the true and the predicted probability distributions)
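A sketch of the three losses on toy numbers (all values made up for illustration):

```python
import math

def l1(y_true, y_pred):  # mean absolute error
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def l2(y_true, y_pred):  # mean squared error
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def cross_entropy(y_true, y_pred):  # y_true one-hot, y_pred probabilities
    return -sum(t * math.log(p) for t, p in zip(y_true, y_pred))

print(l1([1.0, 2.0], [1.5, 1.0]))                 # 0.75
print(l2([1.0, 2.0], [1.5, 1.0]))                 # 0.625
print(cross_entropy([0, 1, 0], [0.2, 0.7, 0.1]))  # ~0.357
```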
what are pros and cons of L1 and L2 loss
L1 - penalizes all errors linearly; robust to outliers, because large errors are not exaggerated
L2 - penalizes large errors more because of the squaring; less robust to outliers, as large errors dominate the loss
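A quick numeric illustration of why L2 is more sensitive to outliers (toy error values only):

```python
errors = [1.0, 1.0, 10.0]                            # one outlier among small errors
l1_loss = sum(abs(e) for e in errors) / len(errors)  # 4.0
l2_loss = sum(e ** 2 for e in errors) / len(errors)  # 34.0
# under L1 the outlier contributes 10/12 ~ 83% of the total,
# under L2 it contributes 100/102 ~ 98%: squaring lets it dominate
print(l1_loss, l2_loss)
```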
why do we use the particular loss functions as described above
because they are differentiable, meaning they allow for gradient descent
what is the derivative
at every specific point of the function, the derivative gives the slope: how much the output changes for a small change in the input
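In symbols (the standard definition):

```latex
f'(x) = \lim_{h \to 0} \frac{f(x + h) - f(x)}{h}
```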
what is the problem with using a step-function as the activation function
if we want a model to learn by itself, the step function can't be used: its derivative is zero everywhere except at the jump, where it is undefined, so gradient descent gets no signal
Instead we use a sigmoid or a ReLU
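A sketch of both activations and their (standard) derivatives:

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1 - s)            # smooth, non-zero gradient everywhere

def relu(z):
    return max(0.0, z)

def relu_grad(z):
    return 1.0 if z > 0 else 0.0  # undefined exactly at z = 0 in theory; 0 used here

print(sigmoid(0.0), sigmoid_grad(0.0))  # 0.5 0.25
print(relu(2.0), relu_grad(2.0))        # 2.0 1.0
```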
why do we prefer ReLU over the sigmoid function?
instead of squashing everything between a lower value of 0 and an upper value of 1,
ReLU has no upper value: below the threshold of 0 the output is 0, and above it the output is the input itself, i.e. ReLU(x) = max(0, x)
this lets the output carry more information: instead of being squeezed into the range (0, 1), the output is 0 or any positive value up to infinity
what is the main essence of backpropagation?
a weight that is very far from the output is further removed from the loss function, so its effect on the loss is indirect
backpropagation is a way to measure and adjust the weights of specific neurons, no matter how far back in the network they sit
using the chain rule, it computes the impact of each neuron on the loss by looking only at the previous step and the gradient already computed for the next step
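A minimal sketch of that chain rule for a two-layer network with one neuron per layer (sigmoid activations; the input, target, and weights are made up for illustration):

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

# forward pass: x -> hidden h -> output o, squared-error loss
x, target = 1.0, 0.0
w1, w2 = 0.5, -0.3                 # illustrative weights

h = sigmoid(w1 * x)
o = sigmoid(w2 * h)
loss = (o - target) ** 2

# backward pass: each gradient reuses the one from the step after it
dloss_do = 2 * (o - target)
do_dz2 = o * (1 - o)               # sigmoid derivative at the output
grad_w2 = dloss_do * do_dz2 * h    # local input h closes the chain

dloss_dh = dloss_do * do_dz2 * w2  # gradient passed back to the hidden layer
dh_dz1 = h * (1 - h)               # sigmoid derivative at the hidden neuron
grad_w1 = dloss_dh * dh_dz1 * x

print(grad_w1, grad_w2)
```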