Week 6 - Intro to Deep Learning Flashcards

1
Q

What are the differences between a brain and neural networks?

A

The brain is much more computationally efficient and consumes much less energy

Brain neurons are more non-linear than the neurons in an MLP, so they are better able to represent complex, high-dimensional structure in the data

Brains are typically more modular (they contain specialist regions for distinct functions), and neural networks are a heavily simplified model of them

2
Q

What does increasing the number of parameters in a neural net do?

A

The model gains emergent capabilities as the number of parameters increases

As the parameter count grows, the model becomes able to solve new types of tasks

3
Q

What is the perceptron?

A

The most basic unit of a neural network

A signal comes in; if it passes a certain threshold the output is 1, and if it does not pass the threshold the output is 0

This thresholding can be represented by a step function
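
A minimal Python sketch of the unit described above. The names `step` and `perceptron`, the weighted-sum-plus-bias form, and the AND-gate weights are illustrative assumptions, not taken from the course material.

```python
def step(z, threshold=0.0):
    """Step activation: 1 if the signal reaches the threshold, else 0."""
    return 1 if z >= threshold else 0


def perceptron(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, passed through the step function."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return step(z)


# Hand-picked (hypothetical) weights that make the unit behave like an AND gate.
and_weights, and_bias = [1.0, 1.0], -1.5
and_outputs = [
    perceptron([a, b], and_weights, and_bias) for a in (0, 1) for b in (0, 1)
]
print(and_outputs)
```

Only the input pair (1, 1) pushes the weighted sum past the threshold, so it is the only case where the unit fires.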

4
Q

what is the XOR problem?

A

Say you have two inputs that can each be either 1 or 0

A linear model (a single straight decision boundary) cannot represent a function that outputs TRUE only if the two inputs are different
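
As an illustration rather than a proof, the sketch below tries every weight and bias on a coarse grid for a single thresholded linear unit and checks whether any combination reproduces XOR. The grid range and the names `linear_unit` and `solutions` are assumptions made for this example.

```python
from itertools import product

# XOR truth table: output is 1 only when the two inputs differ.
XOR = {(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0}


def linear_unit(x1, x2, w1, w2, b):
    """A single linear decision boundary: fire when w1*x1 + w2*x2 + b > 0."""
    return 1 if w1 * x1 + w2 * x2 + b > 0 else 0


# Candidate values -4.0, -3.5, ..., 4.0 for each of w1, w2 and b.
grid = [i / 2 for i in range(-8, 9)]

solutions = [
    (w1, w2, b)
    for w1, w2, b in product(grid, repeat=3)
    if all(linear_unit(x1, x2, w1, w2, b) == y for (x1, x2), y in XOR.items())
]
print(len(solutions))
```

No combination on the grid works, which matches the claim that XOR is not linearly separable.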

5
Q

How does the MLP solve the XOR problem?

A

If you feed the outputs of two neurons as inputs into neurons in a further layer, the network can solve the XOR problem

The neurons in this hidden layer apply non-linear transformations that remap the input space, so that a single straight line can separate the points

In other words, the hidden layer maps the original inputs into a new representation in which the problem becomes linearly separable
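
One possible hand-built network, sketched in Python. The weights are chosen by hand for illustration (nothing is trained), and the use of ReLU in the hidden layer and the name `xor_mlp` are assumptions for this example.

```python
def relu(z):
    """ReLU activation: 0 for negative inputs, the input itself otherwise."""
    return max(0.0, z)


def xor_mlp(x1, x2):
    # Hidden layer: two ReLU neurons that see the same sum with different biases.
    h1 = relu(x1 + x2)        # 0, 1, 1, 2 over the four input pairs
    h2 = relu(x1 + x2 - 1.0)  # 0, 0, 0, 1 over the four input pairs
    # Output neuron: a plain linear combination of the hidden activations.
    return h1 - 2.0 * h2


xor_outputs = {(a, b): xor_mlp(a, b) for a in (0, 1) for b in (0, 1)}
print(xor_outputs)
```

In the hidden representation (h1, h2) the four points become (0, 0), (1, 0), (1, 0) and (2, 1), which the linear output neuron can separate.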

6
Q

How is the MLP a universal function approximator?

A

Any continuous function (on a bounded input range) can be approximated as closely as you like by an MLP with a wide enough hidden layer and a non-linear activation

7
Q

How does a model adjust its weights by itself?

A

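It calculates the loss (how far its predictions are from the targets) using the loss function
It minimises the loss using gradient descent, repeatedly nudging each weight in the direction that reduces the loss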
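
The two steps can be sketched for a one-weight model `y = w * x`. The toy data, the learning rate, and the names `mse` and `mse_gradient` are assumptions chosen for illustration.

```python
# Toy data generated by y = 2x, so the best weight is 2.0.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]


def mse(w):
    """Step 1: the loss, here mean squared error of the predictions w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def mse_gradient(w):
    """Derivative of the loss with respect to the weight w."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


# Step 2: gradient descent, moving w against the gradient a little at a time.
w = 0.0
learning_rate = 0.01
for _ in range(500):
    w -= learning_rate * mse_gradient(w)

print(w, mse(w))
```

With a learning rate that is too large the updates overshoot and diverge, so the value used here is a deliberately cautious choice.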

8
Q

What are the loss functions for regression tasks and classification tasks?

A

Regression tasks: L1 loss = mean absolute error (MAE), L2 loss = mean squared error (MSE)

Classification tasks: cross entropy (a measure of the difference between the true and the predicted probability distributions)
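
A plain-Python sketch of the three losses. The function names and the example numbers are assumptions for illustration; the cross entropy here uses the natural logarithm and expects the predicted probabilities to be greater than 0 wherever the true probability is non-zero.

```python
import math


def mean_absolute_error(y_true, y_pred):
    """L1 loss: average of the absolute errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true, y_pred):
    """L2 loss: average of the squared errors."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def cross_entropy(true_dist, pred_dist):
    """Cross entropy between a true and a predicted probability distribution."""
    return -sum(t * math.log(p) for t, p in zip(true_dist, pred_dist) if t > 0)


mae = mean_absolute_error([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
mse = mean_squared_error([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
# One-hot true label (class 1) against a predicted distribution over 3 classes.
ce = cross_entropy([0.0, 1.0, 0.0], [0.1, 0.8, 0.1])
print(mae, mse, ce)
```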

9
Q

What are the pros and cons of L1 and L2 loss?

A

L1 - penalizes all errors linearly
- robust to outliers because large errors are not exaggerated

L2 - penalizes large errors more because of squaring
- Less robust to outliers as large errors dominate the loss
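
A small numeric illustration of the outlier effect, using made-up errors where one value is much larger than the rest. The variable names are assumptions for this example.

```python
# Three small errors and one outlier (hypothetical values).
errors = [1.0, 1.0, 1.0, 10.0]

l1_terms = [abs(e) for e in errors]  # each error counts linearly
l2_terms = [e ** 2 for e in errors]  # squaring inflates the large error

# Fraction of the total loss that comes from the single outlier.
outlier_share_l1 = l1_terms[-1] / sum(l1_terms)  # 10 / 13
outlier_share_l2 = l2_terms[-1] / sum(l2_terms)  # 100 / 103

print(round(outlier_share_l1, 3), round(outlier_share_l2, 3))
```

Under L2 the outlier accounts for almost the entire loss, so a model trained with it will bend towards that one point far more than under L1.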

10
Q

Why do we use the particular loss functions described above?

A

Because they are differentiable (L1 everywhere except at exactly zero error), which is what makes gradient descent possible

11
Q

What is the derivative?

A

At every specific point, the derivative is the slope of the function there: how much the output changes for a tiny change in the input
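
The slope at a point can be estimated numerically by nudging the input slightly in both directions. This is a sketch using a central difference; `numerical_derivative` and `square` are hypothetical helper names.

```python
def numerical_derivative(f, x, h=1e-5):
    """Estimate the slope of f at x from two nearby points."""
    return (f(x + h) - f(x - h)) / (2 * h)


def square(x):
    return x * x


# The exact derivative of x^2 is 2x, so the slope at x = 3 should be close to 6.
slope_at_3 = numerical_derivative(square, 3.0)
print(slope_at_3)
```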

12
Q

What is the problem with using a step function as the activation function?

A

If we want a model to learn by itself, the step function can't be used because it gives no useful derivative: its slope is 0 everywhere except at the jump, where it is undefined

Instead we use a sigmoid or a ReLU
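
A quick numeric check of the slopes, as a sketch. The helper `slope` uses a finite difference, and the sample points are arbitrary choices that avoid the jump at 0.

```python
import math


def step(z):
    return 1.0 if z >= 0 else 0.0


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def slope(f, z, h=1e-4):
    """Finite-difference estimate of the derivative of f at z."""
    return (f(z + h) - f(z - h)) / (2 * h)


points = (-2.0, -0.5, 0.5, 2.0)
step_slopes = [slope(step, z) for z in points]
sigmoid_slopes = [slope(sigmoid, z) for z in points]

print(step_slopes)     # flat on both sides of the jump
print(sigmoid_slopes)  # a usable, non-zero slope at every point
```

With a slope of zero, gradient descent gets no signal about which way to move a weight, which is why the step function blocks learning.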

13
Q

Why do we prefer ReLU over the sigmoid function?

A

Sigmoid squashes its output between a lower value of 0 and an upper value of 1;
ReLU doesn't have an upper value. Once the input is above 0, the output is a linear function of the input (the output equals the input)
This lets the output carry more information. Instead of being confined between 0 and 1, the output is 0 or any positive value, with no upper limit
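
A short comparison of the two activations on increasingly large inputs, as a sketch with arbitrarily chosen input values.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def relu(z):
    return max(0.0, z)


inputs = [1.0, 5.0, 50.0]
sigmoid_out = [sigmoid(z) for z in inputs]  # all squeezed in just below or at 1
relu_out = [relu(z) for z in inputs]        # keeps growing with the input

print(sigmoid_out)
print(relu_out)
```

Inputs of 5 and 50 are almost indistinguishable after the sigmoid, while ReLU preserves the difference between them.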

14
Q

What is the main essence of backpropagation?

A

If you're trying to adjust a weight that is very far from the output, it is further removed from the loss function, so its effect on the loss is harder to work out directly

Backpropagation is a way to measure and adjust the weights of specific neurons, regardless of how far back in the network they are

It shows the impact of each neuron on the loss function by chaining together local derivatives, looking only at the previous step and the next step each time
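
A minimal sketch of this idea for a tiny network with one hidden neuron: `x -> sigmoid(w1 * x) -> w2 * h -> squared error`. The architecture, the names, and the numbers are assumptions for illustration. Each gradient is built by multiplying local derivatives, starting at the loss and moving backwards.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w1, w2):
    h = sigmoid(w1 * x)  # hidden activation
    y_hat = w2 * h       # network output
    return h, y_hat


def loss(x, y, w1, w2):
    _, y_hat = forward(x, w1, w2)
    return (y_hat - y) ** 2


def backward(x, y, w1, w2):
    """Gradients of the loss with respect to w1 and w2 via the chain rule."""
    h, y_hat = forward(x, w1, w2)
    dloss_dyhat = 2 * (y_hat - y)      # loss -> output
    dloss_dw2 = dloss_dyhat * h        # output -> w2 (close to the loss)
    dloss_dh = dloss_dyhat * w2        # output -> hidden activation
    dh_dz = h * (1 - h)                # sigmoid's local derivative
    dloss_dw1 = dloss_dh * dh_dz * x   # hidden -> w1 (far from the loss)
    return dloss_dw1, dloss_dw2


x, y, w1, w2 = 1.5, 0.7, 0.4, -0.3
grad_w1, grad_w2 = backward(x, y, w1, w2)
print(grad_w1, grad_w2)
```

The gradient for `w1` reuses `dloss_dyhat`, which was already needed for `w2`; that reuse of results from the layer in front is what keeps the computation local.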

15
Q

Why are neural networks useful for data that is not linearly separable?

A

The multiple layers of a neural network allow data to be separated non-linearly

Even if the separation is very complex, with enough neurons and layers the neural network can approximate it, as a neural network can act as a universal function approximator. It does this by combining many activation functions with different weights and biases.

Essentially, the weights and biases scale and shift the activation functions so that, when combined, they approximate very complex functions.

16
Q

What design choices (hyperparameters) do you decide on when you build a neural network?

A

What activation function
How many hidden layers
How many nodes in each hidden layer

17
Q

What are the main ideas of backpropagation?

A
  1. Use the chain rule to calculate the derivatives of the loss function (here the sum of squared residuals, SSR) with respect to each parameter
  2. Plug those derivatives into gradient descent to optimise the parameters
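
Both ideas can be sketched for a single parameter: a final bias `b` added to fixed upstream outputs. The data, the learning rate, and the names `ssr` and `dssr_db` are assumptions for illustration, not values from the course.

```python
# Hypothetical outputs of the rest of the network, before the final bias is added.
base = [0.0, 1.0, 0.0]
observed = [0.5, 2.0, 1.0]


def ssr(b):
    """Sum of squared residuals when predicted = base + b."""
    return sum((obs - (base_i + b)) ** 2 for obs, base_i in zip(observed, base))


def dssr_db(b):
    """Chain rule: dSSR/db = sum of dSSR/dpredicted * dpredicted/db."""
    total = 0.0
    for obs, base_i in zip(observed, base):
        predicted = base_i + b
        dssr_dpredicted = -2 * (obs - predicted)
        dpredicted_db = 1.0
        total += dssr_dpredicted * dpredicted_db
    return total


# Gradient descent on the single parameter b.
b = 0.0
learning_rate = 0.1
for _ in range(100):
    b -= learning_rate * dssr_db(b)

print(b, ssr(b))
```

The loop settles on the bias that makes the residuals sum to zero, which is the point where the derivative of the SSR is zero.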

18
Q

What is the loss function used in backpropagation here?

A

The sum of the squared residuals (SSR): the squared differences between the observed and predicted values, added up

19
Q

How do you use the chain rule to find the derivatives in backpropagation?

A
