Predictive Models: Deep Neural Networks Flashcards
What is a Deep Neural Network and what does it consist of?
A Deep Neural Network (DNN) is a model inspired by the brain.
It is a network of interconnected nodes, with input and output nodes and hidden layers in between that perform the computations.
What do Deep Neural Networks do?
Based on its training, a DNN finds the correct mathematical manipulation to turn input into output, whether the relationship is linear or non-linear.
What is a node/neuron, and how are they connected?
A node is a computational unit.
A node has weighted input connections from other nodes and weighted output connections to other nodes.
What does a node consist of?
A node consists of a sum function and an activation function.
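A minimal sketch of a single node in Python (the function name `node_output` and the choice of a sigmoid activation are illustrative assumptions, not from the source):

```python
import math

def node_output(inputs, weights, bias=0.0):
    # Sum function: multiply each input by its connection weight and add them up
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    # Activation function: here a sigmoid squashes z into (0, 1)
    return 1 / (1 + math.exp(-z))

# Example: previous layer outputs [10, 2], connection weights [0.1, 0.5]
print(node_output([10, 2], [0.1, 0.5]))  # weighted sum = 2.0, sigmoid(2.0) ≈ 0.88
```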
What is a weight in a DNN?
What is it usually denoted or referred to as?
A weight is the impact a previous node has on the next node.
e.g. Imagine the previous node outputs 10.
The weight on the connection to the next node is 0.1. This means it contributes 10 × 0.1 = 1 to the next node's sum function.
Weights are usually denoted theta (θ).
Imagine a fully connected layer (all nodes in adjacent layers are connected to each other).
There are 10 nodes in each layer.
How many connections does each node in the next hidden layer have?
All nodes have 10 connections from the previous layer.
There is one connection to it from each node in the previous layer, since the layer is fully connected (see the sketch below).
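As a quick sanity check, here is a sketch of the count in NumPy (the use of NumPy and the variable names are assumptions of this illustration):

```python
import numpy as np

prev_nodes, next_nodes = 10, 10
# One weight per connection: each of the 10 next-layer nodes
# has 10 incoming connections, one from every previous-layer node.
weights = np.random.randn(prev_nodes, next_nodes)
print(weights.size)         # 100 connections in total
print(weights[:, 0].size)   # 10 incoming connections for a single next-layer node
```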
What is the sum function?
The sum function multiplies each incoming connection's output by its weight and sums the results into a single value.
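Written out as a formula (using θ for the weights, as above; b is an optional bias term):

```latex
z = \sum_{i=1}^{n} \theta_i x_i + b
```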
What is the activation function?
The activation function is a computation that fires if the sum function returns a value high enough (the threshold is met).
Different types of activation functions decide how much a node should fire given a threshold.
What is the threshold?
A threshold is a certain value at which the activation function fires, or which determines how strongly it fires.
What is the output layer?
The output layer consists of one or more nodes that contain the hypothesis function.
It produces a result from the inputs.
What is a bias node?
A bias node is an extra node that can sit in a hidden layer, or at the end just before the output node. Imagine a node at the bottom of each layer which simply adds a fixed number (the bias), for example 1, to the next layer.
Do the activation functions change between layers?
Yes, the activation functions can change between the layers.
What is feedforward in a Neural Network?
It is the simplest form of a Neural Network. It means that information only travels forward, and not in e.g. a cycle.
You could say, information only moves one way: forward.
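A minimal feedforward sketch (the layer sizes and the use of NumPy with a ReLU activation are illustrative assumptions): each layer's output feeds the next layer, and nothing flows backwards.

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

def feedforward(x, layers):
    # Pass the input forward through each (weights, bias) layer in turn;
    # information only moves one way: forward.
    a = x
    for W, b in layers:
        a = relu(W @ a + b)
    return a

# Illustrative network: 3 inputs -> 4 hidden nodes -> 2 outputs
rng = np.random.default_rng(0)
layers = [(rng.standard_normal((4, 3)), np.zeros(4)),
          (rng.standard_normal((2, 4)), np.zeros(2))]
print(feedforward(np.array([1.0, 0.5, -0.2]), layers))
```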
What does it mean if a DNN is ‘shallow’?
That it has only one or a small number of hidden layers.
Explain the ‘step’ activation function
This activation function fires when the weighted sum is above a threshold. It then sends the exact same value along every connection to the next layer (though the weights on those connections are not necessarily the same).
What is the problem with the step activation function?
Multiple nodes can take the value 1 and fire at the same time, which can make it hard to classify/decide.
Which different activation functions are there?
- Sigmoid
- Tanh
- ReLU
- Leaky ReLU
- Binary step function
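As a sketch, the listed functions in NumPy (the leaky-ReLU slope of 0.01 is a common convention, not something the source specifies):

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))           # S-shaped, output in (0, 1)

def tanh(z):
    return np.tanh(z)                      # scaled sigmoid, output in (-1, 1)

def relu(z):
    return np.maximum(0, z)                # 0 for negative z, linear for positive z

def leaky_relu(z, alpha=0.01):
    return np.where(z > 0, z, alpha * z)   # small slope instead of 0 for negative z

def binary_step(z, threshold=0.0):
    return np.where(z > threshold, 1, 0)   # fires (outputs 1) only above the threshold
```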
A solution to the problem of the step function is the linear function, but what is the problem with that?
If we use a linear function we get a range of values as output, e.g. y = ax; this represents not just one value but a whole range.
The problem is that if we only use linear functions in a DNN, the output layer itself becomes a linear function of the input. This makes it impossible to map non-linear data.
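To see why, note that composing two linear functions gives just another linear function, so no matter how many linear layers are stacked, the whole network collapses to a single line:

```latex
f(x) = ax, \qquad g(x) = bx
\implies (f \circ g)(x) = f(g(x)) = a(bx) = (ab)\,x
```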
What is the Sigmoid activation function?
The Sigmoid activation function, σ(x) = 1 / (1 + e⁻ˣ), looks like a smooth step function (S-shape).
Between roughly x = -2 and x = 2 the y-value changes drastically, so results tend to be pushed towards either end, almost like a binary result. This makes it good for distinctions.
Another big advantage is that the output is always going to be between 0 and 1!
What are the disadvantages to the sigmoid activation function?
Because the curve flattens out at the ends, it is hard to make significant changes at x-values far from 0. The gradient is very flat there, so the algorithm learns extremely slowly, with almost no progress.
What is the Tanh activation function?
This activation function is very similar to the sigmoid function. In fact, it is a scaled Sigmoid function.
The properties are the same: near (0, 0), changes to x are significant for y; as we move to the sides, the changes are minimal. Unlike the sigmoid, its output lies between -1 and 1.
What is the ReLU activation function?
The ReLU function outputs x when x is positive, and 0 otherwise.
When x is negative it is flat on the x-axis (y = 0): the line is horizontal.
When x is positive it is linear (y = x).
Does the ReLU function have the same issues as a linear activation function?
No. Because ReLU is only piecewise linear, it can be stacked: combinations of ReLUs are not linear.
What is the benefit you gain from ReLU?
When a network of neurons is initialised with random values, almost 50% of the neurons will yield 0, because the output is 0 for negative values.
This means fewer neurons are firing (sparse activation), making the network lighter.
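A quick illustration of that claim (random standard-normal pre-activations and the use of NumPy are assumptions of this sketch):

```python
import numpy as np

rng = np.random.default_rng(42)
z = rng.standard_normal((1000, 100))   # random pre-activations centred on 0
activations = np.maximum(0, z)         # ReLU

# About half the values are negative, so ReLU zeroes them out (sparse activation)
print((activations == 0).mean())       # ≈ 0.5
```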