Activation Functions Flashcards

1
Q

What are the advantages of linear activation functions?

A

You can have a range of output values, not just 0 and 1, and the output is proportional to the input.

2
Q

What's a disadvantage of linear activation functions?

A

You cannot usefully apply gradient descent, because the gradient of a linear function is a constant that does not depend on the input. Also, a combination of linear functions is itself linear, so stacking layers collapses the network into the equivalent of a single linear node.
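
A minimal NumPy sketch (illustrative, not from the card) showing how two stacked layers with a linear activation collapse into a single linear layer:

    import numpy as np

    rng = np.random.default_rng(0)

    # Two "layers" with a linear (identity) activation
    W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
    W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)

    x = rng.normal(size=3)

    # Forward pass through both layers
    two_layer = W2 @ (W1 @ x + b1) + b2

    # The same mapping collapsed into one linear layer
    W, b = W2 @ W1, W2 @ b1 + b2
    one_layer = W @ x + b

    print(np.allclose(two_layer, one_layer))  # True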

3
Q

What are the advantages of the sigmoid function?

1/(1 + e^-x)

A
  • Unlike the step function, sigmoid has a smooth curve, so outputs can be interpreted as the probability of belonging to a class.
  • Outputs are normalised between 0 and 1.
4
Q

What are the disadvantages of the sigmoid function?

1/(1 + e^-x)

A
  • Vanishing gradient problem: for very large or very small inputs the output saturates, so the gradient is close to zero and learning becomes slow.
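
A small NumPy sketch (illustrative, not from the card) showing how the sigmoid gradient shrinks towards zero as inputs grow in magnitude:

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def sigmoid_grad(x):
        s = sigmoid(x)
        return s * (1.0 - s)

    # Gradient peaks at 0.25 at x = 0 and vanishes for large |x|
    for x in [0.0, 2.0, 5.0, 10.0]:
        print(x, sigmoid_grad(x))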
5
Q

What are the advantages of the tanh function?

(e^x - e^-x ) / (e^x + e^-x )

A
  • Also S-shaped, giving smooth, continuous outputs.
  • Outputs range between -1 and 1, so they are zero-centred.
  • Tanh's gradient can reach a maximum of 1, whereas sigmoid's gradient can only reach 0.25.
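
A quick NumPy check (not from the card) of the maximum gradients of the two functions:

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    x = np.linspace(-5, 5, 1001)
    sig_grad = sigmoid(x) * (1 - sigmoid(x))
    tanh_grad = 1 - np.tanh(x) ** 2
    print(sig_grad.max(), tanh_grad.max())  # ~0.25 vs ~1.0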
6
Q

What are the disadvantages of the tanh function?

(e^x - e^-x ) / (e^x + e^-x )

A
  • Vanishing gradient for large-magnitude inputs.
  • Computationally expensive due to the exponentials.
7
Q

How are tanh and sigmoid related?

A

tanh(x) = 2 * sigmoid(2x) - 1
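
A quick numerical check of this identity (a sketch, assuming NumPy):

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    x = np.linspace(-5, 5, 11)
    print(np.allclose(np.tanh(x), 2 * sigmoid(2 * x) - 1))  # True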

8
Q

What is the derivative of sigmoid?

A

sigmoid(x) * (1 - sigmoid(x))
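
A central-difference check of this derivative (a sketch, not from the card; the test point 0.7 is arbitrary):

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    x, h = 0.7, 1e-6
    numerical = (sigmoid(x + h) - sigmoid(x - h)) / (2 * h)  # central difference
    analytic = sigmoid(x) * (1 - sigmoid(x))
    print(np.isclose(numerical, analytic))  # True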

9
Q

What is the derivative of tanh?

A

1 - tanh(x)^2
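
The same central-difference check for tanh (a sketch, test point arbitrary):

    import numpy as np

    x, h = 0.7, 1e-6
    numerical = (np.tanh(x + h) - np.tanh(x - h)) / (2 * h)  # central difference
    analytic = 1 - np.tanh(x) ** 2
    print(np.isclose(numerical, analytic))  # True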

10
Q

What are the advantages of the ReLU function?

A
  • Simple and cheap to compute, which leads to fast convergence.
  • No vanishing gradient problem for positive inputs (the gradient there is a constant 1).
11
Q

What are the disadvantages of the ReLU function?

A
  • Negative inputs always produce 0, so a neuron that only receives negative inputs stops updating, known as the dying ReLU problem (see the sketch after this list).
  • Not a zero-centred function, and it has no derivative at x = 0.
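
An illustrative NumPy sketch (not from the card) of ReLU and its gradient; the zero gradient on negative inputs is what makes a neuron "die":

    import numpy as np

    def relu(x):
        return np.maximum(0, x)

    def relu_grad(x):
        # 0 for negative inputs: no learning signal flows back
        return (x > 0).astype(float)

    x = np.array([-3.0, -0.5, 0.5, 3.0])
    print(relu(x))       # [0.  0.  0.5 3. ]
    print(relu_grad(x))  # [0. 0. 1. 1.]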
12
Q

What is Leaky ReLU?

A

Like ReLU, but negative inputs are multiplied by a small slope rather than being set to 0.
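
A minimal sketch (the 0.01 slope is a common default, not given on the card):

    import numpy as np

    def leaky_relu(x, alpha=0.01):
        # Negative inputs are scaled by alpha instead of being zeroed out
        return np.where(x > 0, x, alpha * x)

    print(leaky_relu(np.array([-3.0, -0.5, 0.5, 3.0])))  # [-0.03  -0.005  0.5  3.  ]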

13
Q

What are the advantages of Leaky ReLU?

A
  • Easy to compute.
  • No vanishing gradient.
  • No dying ReLU, since negative inputs still receive a small gradient.
14
Q

What are the disadvantages of Leaky ReLU?

A

Not zero-centred.

15
Q

What are the advantages of ELU?

A
  • Smoother than Leaky ReLU at x = 0.
  • No dying ReLU.
  • No vanishing gradient.
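
An illustrative sketch of ELU (alpha = 1.0 is an assumed default, not given on the card):

    import numpy as np

    def elu(x, alpha=1.0):
        # Smooth exponential curve for negative inputs, identity for positive
        return np.where(x > 0, x, alpha * (np.exp(x) - 1))

    print(elu(np.array([-3.0, -0.5, 0.5, 3.0])))  # negative values approach -alpha smoothly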
16
Q

What are the disadvantages of ELU?

A

More expensive to compute than Leaky ReLU because of the exponential.

17
Q

What is softmax?

A

Takes a vector of numbers and normalises it into values between 0 and 1 that sum to 1, weighting each entry by its exponential. Used for multi-class classification.
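
A minimal softmax sketch (the max-subtraction is a standard numerical-stability trick, not stated on the card):

    import numpy as np

    def softmax(z):
        # Subtracting the max does not change the result but avoids overflow
        e = np.exp(z - np.max(z))
        return e / e.sum()

    p = softmax(np.array([2.0, 1.0, 0.1]))
    print(p, p.sum())  # values between 0 and 1 that sum to 1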