Predictive Models: Deep Neural Networks Flashcards

1
Q

What is a Deep Neural Network and what does it consist of?

A

A Deep Neural Network (DNN) is a model loosely inspired by the brain.

It is a network of interconnected nodes arranged in layers: an input layer, an output layer, and multiple hidden layers in between that perform the computations.

2
Q

What are Deep Neural Networks doing?

A

Based on its training, the DNN finds the mathematical manipulation that turns an input into the correct output, whether the relationship between them is linear or non-linear.

3
Q

What is a node/neuron, and how are nodes connected?

A

A node is a computational unit.

A node has one or more weighted input connections from other nodes, and its output is passed on to other nodes.

4
Q

What does a node consist of?

A

A node consists of a sum function and an activation function.
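
As a minimal sketch of that composition, a node can be written as a sum function whose result is passed through an activation function. The names `weighted_sum`, `sigmoid` and `node_output`, the optional bias term, and the choice of sigmoid as the activation are assumptions made for illustration, not part of the card.

```python
import math

def weighted_sum(inputs, weights, bias=0.0):
    """Sum function: multiply each input by its weight and add the results."""
    return sum(x * w for x, w in zip(inputs, weights)) + bias

def sigmoid(z):
    """One possible activation function (an assumed choice for this sketch)."""
    return 1.0 / (1.0 + math.exp(-z))

def node_output(inputs, weights, bias=0.0, activation=sigmoid):
    """A node: the activation function applied to the weighted sum."""
    return activation(weighted_sum(inputs, weights, bias))

# Two incoming connections: 1.0 * 0.5 + 2.0 * (-0.25) = 0.0
print(weighted_sum([1.0, 2.0], [0.5, -0.25]))
# The sigmoid of 0.0 is 0.5
print(node_output([1.0, 2.0], [0.5, -0.25]))
```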

5
Q

What is the weight in a DNN?

What are weights usually denoted or referred to as?

A

The weight is how much impact a previous node has on a node.

e.g. Imagine the previous node outputs 10.
The weight to the next node is 0.1. This means it contributes 10 × 0.1 = 1 to the next node's sum function.

Weights are usually denoted as theta (θ).

6
Q

Imagine a fully connected layer (every node in one layer is connected to every node in the next).

There are 10 nodes in each layer.

How many incoming connections does each node in the next hidden layer have?

A

Each node has 10 incoming connections, one from every node in the previous layer.

That is what fully connected means: every node in the previous layer has one connection to each individual node in the next layer.
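
The count can be checked with a small sketch that stores one weight per connection. The nested-list layout and the names `weights`, `incoming_per_node` and `total_connections` are assumptions for illustration; the placeholder value 0.0 stands in for real weights.

```python
NODES_PER_LAYER = 10  # from the card: 10 nodes in each layer

# weights[j][i] is the weight on the connection from node i in the
# previous layer to node j in the next layer (placeholder value 0.0).
weights = [[0.0 for _ in range(NODES_PER_LAYER)] for _ in range(NODES_PER_LAYER)]

incoming_per_node = len(weights[0])                    # connections into one node
total_connections = sum(len(row) for row in weights)   # connections between the two layers

print(incoming_per_node)   # 10
print(total_connections)   # 100
```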

7
Q

What is the sum function?

A

The sum function multiplies the value on each incoming connection by that connection's weight and adds the results together.
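
Written as a formula, using θ for the weights as on the earlier card (the symbols z for the sum, x_i for the incoming values and n for the number of incoming connections are assumptions introduced here):

```latex
z = \sum_{i=1}^{n} \theta_i x_i
```

With a single incoming connection carrying the value 10 over a weight of 0.1, this gives z = 0.1 × 10 = 1.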

8
Q

What is the activation function?

A

The activation function is a computation that decides whether, and how strongly, the node fires, based on the value returned by the sum function (for example, firing only when a threshold is met).

Different types of activation functions decide how much a node should fire for a given sum.

9
Q

What is the threshold?

A

A threshold is the value the sum must reach for the activation function to fire; for some activation functions it instead determines how strongly the node fires.

10
Q

What is the output layer?

A

The output layer consists of one or more nodes that produce the value of the hypothesis function.

It produces a result from the inputs.

11
Q

What is a bias node?

A

A bias node is a node that can be in a hidden layer, or at the end just before the output node. Imagine a node at the bottom of each layer which simply adds a constant number (the bias), for example 1, to the next layer.

12
Q

Do the activation functions change between layers?

A

Yes, the activation functions can change between the layers.

13
Q

What is feedforward in a Neural Network?

A

It is the simplest form of a Neural Network. It means that the connections only lead forward from one layer to the next, and never form e.g. a cycle.

You could say, information only moves one way, forward.

14
Q

What does it mean if a DNN is ‘shallow’?

A

It means it has only one, or a small number of, hidden layers.

15
Q

Explain the ‘step’ activation function

A

This activation function fires when the weighted sum is above a threshold. When it fires, it sends the exact same value (typically 1) to every node in the next layer, although the weights on those connections are not necessarily the same.
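
A minimal sketch of a binary step function, assuming that firing means outputting 1 and not firing means outputting 0. The function name `step` and the default threshold of 0.0 are illustrative choices, not taken from the card.

```python
def step(z, threshold=0.0):
    """Binary step: output 1 when the weighted sum z is above the threshold, else 0."""
    return 1 if z > threshold else 0

print(step(0.3))                  # above the default threshold: fires
print(step(-2.0))                 # below the threshold: does not fire
print(step(0.3, threshold=0.5))   # a higher threshold stops the same sum from firing
```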

16
Q

What is the problem with the step activation function?

A

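Multiple nodes can take the value of 1 and fire at the same time. Because each output is only 0 or 1, there is no way to tell which node fired more strongly, which can make it hard to classify/decide.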

17
Q

Which different activation functions are there?

A
  1. Sigmoid
  2. Tanh
  3. ReLu
  4. Leaky ReLu
  5. Binary step function
18
Q

A solution to the problem of the step function is the linear function, but what is the problem with that?

A

If we use a linear function, we get a range of values as output instead of only 0 or 1. e.g. y = ax can take any value, not just one.

The problem is that if we only use linear functions in a DNN, stacking layers of them still gives just one linear function from input to output, no matter how many layers there are. This makes it impossible to map non-linear data.
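
The collapse can be seen in a small sketch with two stacked one-input linear layers. The helper `linear`, the specific coefficients, and the added constant term b are assumptions chosen for illustration.

```python
def linear(a, b):
    """Return the linear function x -> a*x + b."""
    return lambda x: a * x + b

layer1 = linear(2.0, 1.0)    # first layer:  2x + 1
layer2 = linear(-3.0, 0.5)   # second layer: -3x + 0.5

def stacked(x):
    """Feed the output of layer1 into layer2."""
    return layer2(layer1(x))

# Algebraically: -3 * (2x + 1) + 0.5 = -6x - 2.5, which is again one linear function.
collapsed = linear(-6.0, -2.5)

for x in (-1.0, 0.0, 2.0):
    assert stacked(x) == collapsed(x)
    print(x, stacked(x))
```

However many linear layers are stacked, the same algebra reduces them to a single line, so the extra layers add no expressive power.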

19
Q

What is the Sigmoid activation function?

A

The Sigmoid activation function looks like a smooth step function (S-shape).

Between roughly x = -2 and x = 2 the y-value changes drastically, so results tend to be pushed towards either end, almost like a binary result. This makes it good for distinctions.

Another big advantage is that the output is always going to be between 0 and 1!

20
Q

What are the disadvantages to the sigmoid activation function?

A

Because the curve flattens out at both ends, changes in x far from 0 barely change the output. The gradient there is very flat (close to 0), so the algorithm learns extremely slowly, with almost no progress. This is known as the vanishing gradient problem.
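
To make the flat ends concrete, the sketch below evaluates the sigmoid's gradient at a few points, using the standard identity that the derivative equals sigmoid(x) * (1 - sigmoid(x)). The function names and the sample x-values are illustrative assumptions.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_gradient(x):
    """Derivative of the sigmoid: s * (1 - s), where s = sigmoid(x)."""
    s = sigmoid(x)
    return s * (1.0 - s)

# The gradient peaks at x = 0 (value 0.25) and shrinks quickly further out.
for x in (0.0, 2.0, 5.0, 10.0):
    print(x, sigmoid_gradient(x))
```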

21
Q

What is the Tanh activation function?

A

This activation function is very similar to the sigmoid function. In fact, it is a scaled and shifted Sigmoid function, with outputs between -1 and 1.

The properties are the same: near (0, 0), changes in x are significant for y. As we move out to the sides, the changes are minimal.
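
The relationship to the sigmoid can be checked numerically with the identity tanh(x) = 2 * sigmoid(2x) - 1. The helper name `tanh_from_sigmoid` and the sample points are assumptions for this sketch.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def tanh_from_sigmoid(x):
    """Tanh written as a rescaled sigmoid: 2 * sigmoid(2x) - 1."""
    return 2.0 * sigmoid(2.0 * x) - 1.0

# Compare against the standard library's tanh at a few points.
for x in (-3.0, -0.5, 0.0, 0.5, 3.0):
    assert math.isclose(tanh_from_sigmoid(x), math.tanh(x), abs_tol=1e-12)
    print(x, tanh_from_sigmoid(x))
```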

22
Q

What is the ReLu activation function?

A

The ReLu function outputs x itself if x is positive, and outputs 0 otherwise.

When x is negative it lies flat on the x-axis (y = 0); the line is horizontal.

When x is positive it is linear (y = x).
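
In code the function is a one-liner. This is a minimal sketch; the name `relu` and the sample inputs are illustrative.

```python
def relu(x):
    """ReLu: pass positive values through unchanged, output 0 otherwise."""
    return max(0.0, x)

print(relu(-3.0))  # negative input: flat part of the curve
print(relu(2.5))   # positive input: returned as-is
```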

23
Q

Does the ReLu function have the same issues as a linear activation function?

A

No. It can be stacked because it is not purely linear: it bends at x = 0, so combinations of ReLu functions are non-linear.
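
One small illustration, chosen here as an example rather than taken from the card: adding two ReLu nodes, one fed x and one fed -x, produces the absolute value function, which no single straight line can represent.

```python
def relu(x):
    return max(0.0, x)

def two_relu_nodes(x):
    """Sum of two ReLu nodes with input weights +1 and -1."""
    return relu(x) + relu(-x)

for x in (-2.0, -0.5, 0.0, 0.5, 2.0):
    assert two_relu_nodes(x) == abs(x)
    print(x, two_relu_nodes(x))
```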

24
Q

What is the benefit you gain from ReLu?

A

When a network of neurons is initialised with random values, roughly 50 % of the neurons will output 0, because ReLu outputs 0 for negative values.

This means fewer neurons are firing (sparse activation), making the network lighter.
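
The rough 50 % figure can be illustrated with a small simulation. It assumes that the weighted sums of a randomly initialised network are spread symmetrically around 0 (modelled here as draws from a standard normal distribution); that distribution, the sample size and the fixed seed are all assumptions of this sketch.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

def relu(x):
    return max(0.0, x)

# Stand-in for the weighted sums of 10,000 randomly initialised neurons.
sums = [random.gauss(0.0, 1.0) for _ in range(10_000)]
outputs = [relu(z) for z in sums]

# Fraction of neurons whose output is exactly 0 (i.e. not firing).
zero_fraction = sum(1 for y in outputs if y == 0.0) / len(outputs)
print(zero_fraction)
```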

25
Q

What is the Dying ReLu Problem?

A

Because ReLu's gradient is 0 for negative inputs, the weights feeding a neuron in that region will not get adjusted during gradient descent.

This means that the neurons that go into that state will stop responding to variations in error/input, making part of the network passive.

26
Q

What is the fix to the Dying ReLu Problem?

A

Leaky ReLu:
Make the horizontal line (where x is negative) slightly sloped instead of horizontal, e.g. y = 0.01x.

The idea is that this makes the gradient non-zero, so the neuron can recover during training.
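
A minimal sketch of Leaky ReLu and its gradient. The slope of 0.01 on the negative side is a commonly used value assumed here, and the function names are illustrative.

```python
def leaky_relu(x, slope=0.01):
    """Leaky ReLu: x for positive inputs, a small multiple of x otherwise."""
    return x if x > 0 else slope * x

def leaky_relu_gradient(x, slope=0.01):
    """Gradient is 1 on the positive side and `slope` (never 0) on the negative side."""
    return 1.0 if x > 0 else slope

print(leaky_relu(3.0), leaky_relu_gradient(3.0))
print(leaky_relu(-2.0), leaky_relu_gradient(-2.0))
```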

27
Q

Which is less computationally expensive, ReLu or tanh/sigmoid?

A

ReLu, because it involves simpler mathematical operations.

28
Q

Which activation function would you use for classification?

A

Sigmoid or tanh.

29
Q

What is an input node?

A

An input node does not compute anything; it only holds an input value.

30
Q

What is a hidden layer?

A

It is a layer of nodes, between the input and output layers, that does the computation in a DNN.

31
Q

What is a bias node?

A

A bias node is a node that simply adds or subtracts a certain amount.

32
Q

What can the bias node do?

A

It can push the activation function to the left or right side of the graph.

Multiplying (by a weight) makes the graph steeper.
Adding/subtracting (a bias) shifts the function left/right.
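
The shift can be sketched with a single-input sigmoid node. The function `node`, its parameter names and the sample values are assumptions for illustration. The output crosses 0.5 where weight * x + bias equals 0, so the midpoint sits at x = -bias / weight.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def node(x, weight=1.0, bias=0.0):
    """Single-input node: sigmoid of (weight * x + bias)."""
    return sigmoid(weight * x + bias)

print(node(0.0))              # 0.5: with no bias the midpoint is at x = 0
print(node(-2.0, bias=2.0))   # 0.5: a bias of +2 moves the midpoint left to x = -2
print(node(2.0, bias=-2.0))   # 0.5: a bias of -2 moves the midpoint right to x = 2

# A larger weight makes the curve steeper around its midpoint.
print(node(1.0, weight=1.0), node(1.0, weight=5.0))
```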

33
Q

When you train a model with a training set what are you essentially doing?

A

You change the weights (theta) so that the computations will result in an output that fits the training data.

34
Q

What is the ‘cost function’?

Also called ‘Loss function’

A

The cost function measures the difference between the predicted value and the real (labelled) value.

35
Q

What is the cost function for linear regression models?

A

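MSE: Mean Squared Error. Its square root, RMSE (Root-Mean-Square Error), is often reported as well because it is in the same units as the target.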
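
A minimal sketch of both quantities. The function names and the three sample predictions and labels are made up for illustration.

```python
import math

def mse(predicted, actual):
    """Mean of the squared differences between predictions and labels."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)

def rmse(predicted, actual):
    """Root-mean-square error: the square root of the MSE."""
    return math.sqrt(mse(predicted, actual))

predicted = [2.0, 4.0, 6.0]   # hypothetical model outputs
actual = [1.0, 4.0, 8.0]      # hypothetical labels

# Squared errors are 1, 0 and 4, so the MSE is 5 / 3.
print(mse(predicted, actual))
print(rmse(predicted, actual))
```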

36
Q

Which types of classification algorithms have we heard of?

A

Decision trees
Random Forests
K-Nearest Neighbor
Logistic Regression

37
Q

Which types of clustering algorithms have we heard of?

A

K-Means

K-Modes

38
Q

What is the decision boundary?

A

It is the threshold on the predicted value (y) that separates the two classes: values on one side are categorized as 1 (positive), values on the other as 0 (negative).
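
As a sketch, assuming the model outputs a value between 0 and 1 and the boundary is set at 0.5 (a common but not mandatory choice; the name `classify` is illustrative):

```python
def classify(predicted_value, boundary=0.5):
    """Return 1 (positive) at or above the boundary, 0 (negative) below it."""
    return 1 if predicted_value >= boundary else 0

print(classify(0.8))   # above the boundary: class 1
print(classify(0.2))   # below the boundary: class 0
```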

39
Q

What is overfitting?

A

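It is when the model fits the training data too closely, including its noise, so it performs well on the training set but poorly on unseen data. (Adjustments that are too high for each iteration, causing the model to oscillate over the optimal line, are a different problem: a learning rate that is too high.)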

40
Q

What is backpropagation?

A

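It is the method used to change the weights in your neural network so it predicts better: the error at the output is propagated backwards through the layers to work out how much each weight contributed to it.

Gradient descent then uses those gradients to update the weights.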
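
The weight update can be sketched on the smallest possible case: one weight, one training example, and a squared-error cost. The model y = theta * x, the data point, the learning rate and the number of steps are all assumptions chosen for illustration; a real network repeats the same idea for every weight, with backpropagation supplying the gradients.

```python
x, y = 2.0, 4.0        # one training example; the ideal weight is 2.0
theta = 0.0            # initial weight
learning_rate = 0.1    # assumed step size

for _ in range(20):
    prediction = theta * x
    # Gradient of the cost (theta * x - y) ** 2 with respect to theta.
    gradient = 2.0 * (prediction - y) * x
    # Gradient descent: step against the gradient.
    theta -= learning_rate * gradient

print(theta)  # very close to 2.0
```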

41
Q

What does regularization do?

A

Regularization helps prevent overfitting by making small changes to the cost function, typically adding a penalty for large weights. The goal is to make the model perform better on unseen data.
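
One common way to do this is L2 regularization, which adds the sum of the squared weights, scaled by a strength factor, to the cost. The card does not name a specific technique, so the choice of L2, the MSE base cost, and the names `regularized_cost` and `lam` are assumptions of this sketch.

```python
def mse(predicted, actual):
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)

def regularized_cost(predicted, actual, weights, lam=0.1):
    """MSE plus an L2 penalty: lam times the sum of squared weights."""
    penalty = lam * sum(w ** 2 for w in weights)
    return mse(predicted, actual) + penalty

# Same predictions, different weight sizes: larger weights cost more.
print(regularized_cost([1.0, 2.0], [1.0, 2.0], weights=[0.5, -0.5]))
print(regularized_cost([1.0, 2.0], [1.0, 2.0], weights=[5.0, -5.0]))
```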