Predictive Models: Deep Neural Networks Flashcards
What is a Deep Neural Network and what does it consist of?
A Deep Neural Network (DNN) is a model inspired by the brain.
It is a network of interconnected nodes, with input and output nodes and hidden layers in between that perform the computations.
What do Deep Neural Networks do?
Based on its training, a DNN finds the correct mathematical manipulation to turn input into output, whether the relationship is linear or non-linear.
What is a node/neuron, and how are they connected?
A node is a computational unit.
A node has weighted input connections from other nodes and weighted output connections to other nodes.
What does a node consist of?
A node consists of a sum function and an activation function.
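A minimal sketch of a single node in Python (the function name `node_output` and the choice of a sigmoid activation are illustrative assumptions, not from the source):

```python
import math

def node_output(inputs, weights, bias=0.0):
    # Sum function: multiply each input by its connection weight and add them up
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    # Activation function: here a sigmoid squashes z into (0, 1)
    return 1 / (1 + math.exp(-z))

# Example: previous layer outputs [10, 2], connection weights [0.1, 0.5]
print(node_output([10, 2], [0.1, 0.5]))  # weighted sum = 2.0, sigmoid(2.0) ≈ 0.88
```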
What is a weight in a DNN?
What is it usually denoted or referred to as?
A weight is the impact a previous node has on the next node.
e.g. Imagine the previous node outputs 10.
The weight on the connection to the next node is 0.1. This means it contributes 10 × 0.1 = 1 to the next node's sum function.
Weights are usually denoted theta (θ).
Imagine a fully connected layer (all nodes in adjacent layers are connected to each other).
There are 10 nodes in each layer.
How many connections does each node in the next hidden layer have?
All nodes have 10 connections from the previous layer.
There is one connection to it from each node in the previous layer, since the layer is fully connected (see the sketch below).
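As a quick sanity check, here is a sketch of the count in NumPy (the use of NumPy and the variable names are assumptions of this illustration):

```python
import numpy as np

prev_nodes, next_nodes = 10, 10
# One weight per connection: each of the 10 next-layer nodes
# has 10 incoming connections, one from every previous-layer node.
weights = np.random.randn(prev_nodes, next_nodes)
print(weights.size)         # 100 connections in total
print(weights[:, 0].size)   # 10 incoming connections for a single next-layer node
```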
What is the sum function?
The sum function multiplies each incoming connection's output by its weight and sums the results into a single value.
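Written out as a formula (using θ for the weights, as above; b is an optional bias term):

```latex
z = \sum_{i=1}^{n} \theta_i x_i + b
```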
What is the activation function?
The activation function is a computation that fires if the sum function returns a value high enough (the threshold is met).
Different types of activation functions decide how much a node should fire given a threshold.
What is the threshold?
A threshold is a certain value at which the activation function fires, or which determines how strongly it fires.
What is the output layer?
The output layer consists of one or more nodes that contain the hypothesis function.
It produces a result from the inputs.
What is a bias node?
A bias node is an extra node that can sit in a hidden layer, or at the end just before the output node. Imagine a node at the bottom of each layer which simply adds a fixed number (the bias), for example 1, to the next layer.
Do the activation functions change between layers?
Yes, the activation functions can change between the layers.
What is feedforward in a Neural Network?
It is the simplest form of a Neural Network. It means that information only travels forward, and not in e.g. a cycle.
You could say, information only moves one way: forward.
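A minimal feedforward sketch (the layer sizes and the use of NumPy with a ReLU activation are illustrative assumptions): each layer's output feeds the next layer, and nothing flows backwards.

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

def feedforward(x, layers):
    # Pass the input forward through each (weights, bias) layer in turn;
    # information only moves one way: forward.
    a = x
    for W, b in layers:
        a = relu(W @ a + b)
    return a

# Illustrative network: 3 inputs -> 4 hidden nodes -> 2 outputs
rng = np.random.default_rng(0)
layers = [(rng.standard_normal((4, 3)), np.zeros(4)),
          (rng.standard_normal((2, 4)), np.zeros(2))]
print(feedforward(np.array([1.0, 0.5, -0.2]), layers))
```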
What does it mean if a DNN is ‘shallow’?
That it has only one or a small number of hidden layers.
Explain the ‘step’ activation function
This activation function fires when the weighted sum is above a threshold. It then sends the exact same value along every connection to the next layer (though the weights on those connections are not necessarily the same).
What is the problem with the step activation function?
Multiple nodes can take the value 1 and fire at the same time, which can make it hard to classify/decide.
Which different activation functions are there?
- Sigmoid
- Tanh
- ReLU
- Leaky ReLU
- Binary step function
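As a sketch, the listed functions in NumPy (the leaky-ReLU slope of 0.01 is a common convention, not something the source specifies):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))           # S-shaped, output in (0, 1)

def tanh(z):
    return np.tanh(z)                      # scaled sigmoid, output in (-1, 1)

def relu(z):
    return np.maximum(0, z)                # 0 for negative z, linear for positive z

def leaky_relu(z, alpha=0.01):
    return np.where(z > 0, z, alpha * z)   # small slope instead of 0 for negative z

def binary_step(z, threshold=0.0):
    return np.where(z > threshold, 1, 0)   # fires (outputs 1) only above the threshold
```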
A solution to the problem of the step function is the linear function, but what is the problem with that?
If we use a linear function we get a range of values as output, e.g. y = ax; this represents not just one value but a whole range.
The problem is that if we only use linear functions in a DNN, the output layer itself becomes a linear function of the input. This makes it impossible to map non-linear data.
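To see why, note that composing two linear functions gives just another linear function, so no matter how many linear layers are stacked, the whole network collapses to a single line:

```latex
f(x) = ax, \qquad g(x) = bx
\implies (f \circ g)(x) = f(g(x)) = a(bx) = (ab)\,x
```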
What is the Sigmoid activation function?
The Sigmoid activation function, σ(x) = 1 / (1 + e⁻ˣ), looks like a smooth step function (S-shape).
Between roughly x = -2 and x = 2 the y-value changes drastically, so results tend to be pushed towards either end, almost like a binary result. This makes it good for distinctions.
Another big advantage is that the output is always going to be between 0 and 1!
What are the disadvantages to the sigmoid activation function?
Because the curve flattens out at the ends, it is hard to make significant changes at x-values far from 0. The gradient is very flat there, so the algorithm learns extremely slowly, with almost no progress.
What is the Tanh activation function?
This activation function is very similar to the sigmoid function. In fact, it is a scaled Sigmoid function.
The properties are the same: near (0, 0), changes to x are significant for y; as we move to the sides, the changes are minimal. Unlike the sigmoid, its output lies between -1 and 1.
What is the ReLU activation function?
The ReLU function outputs x when x is positive, and 0 otherwise.
When x is negative it is flat on the x-axis (y = 0): the line is horizontal.
When x is positive it is linear (y = x).
Does the ReLU function have the same issues as a linear activation function?
No. Because ReLU is only piecewise linear, it can be stacked: combinations of ReLUs are not linear.
What is the benefit you gain from ReLU?
When a network of neurons is initialised with random values, almost 50% of the neurons will yield 0, because the output is 0 for negative values.
This means fewer neurons are firing (sparse activation), making the network lighter.
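A quick illustration of that claim (random standard-normal pre-activations and the use of NumPy are assumptions of this sketch):

```python
import numpy as np

rng = np.random.default_rng(42)
z = rng.standard_normal((1000, 100))   # random pre-activations centred on 0
activations = np.maximum(0, z)         # ReLU

# About half the values are negative, so ReLU zeroes them out (sparse activation)
print((activations == 0).mean())       # ≈ 0.5
```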