Basic definitions Flashcards

1
Q

What is an activation function?

A

An activation function defines the output of a node given a set of inputs. There are many types of activation functions; the two most commonly used in neural networks are the logistic (sigmoid) function and the hyperbolic tangent.

2
Q

What are the two most common activation functions and why?

A

The two most common activation functions used in neural networks are logistic (Sigmoid) and hyperbolic tangent.

They are popular for two reasons:
First, they introduce non-linearity to a NN. This matters because most problems that a NN solves are non-linear – i.e., they cannot be solved by separating classes with a straight line.
Second, they limit the output of a node to a certain range: logistic produces output between 0 and 1, and hyperbolic tangent produces output between -1 and 1.
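
A minimal sketch of these two functions in Python (standard library only; the function names are my own):

    import math

    def logistic(x):
        # Logistic (sigmoid): squashes any real number into (0, 1).
        return 1.0 / (1.0 + math.exp(-x))

    def tanh(x):
        # Hyperbolic tangent: squashes any real number into (-1, 1).
        return math.tanh(x)

    print(logistic(0.0))  # 0.5
    print(tanh(0.0))      # 0.0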

3
Q

What is a neural network?

A

Neural Networks are one type of learning algorithm that is used within machine learning. NNs are composed of a large number of highly interconnected processing elements (nodes) working in parallel to solve a specific problem. A key feature of a NN is that it learns by example.

4
Q

What is a node?

A

A NN consists of multiple layers, and each of these layers consists of one or more nodes. A node is a single processing element; the nodes are all connected and work together to solve a problem.

5
Q

What is an edge?

A

A neural network consists of multiple layers, and each layer consists of one or more nodes. These nodes are all connected via edges, which mimic the synaptic connections found within the human brain. Each edge typically carries a weight.

6
Q

Target Output

A

Neural networks, as discussed here, are supervised learning algorithms, which means that the NN is provided with a training set. This training set provides targets that the NN aims to achieve. Technically speaking, the target is the desired output for a given input.

7
Q

What is total error or global error?

A

The NN is successfully trained once it has minimized (to an acceptable level) the difference between its actual output and its target output. This difference is called the total error or global error. The total error is typically calculated using a cost function, such as mean squared error or root mean square error.

8
Q

What is the relationship between local error and total or global error?

A

The total error is the combination of all the local errors, as aggregated by the cost function. A local error is the difference between the actual output of a single node and the target output that was expected. For example:
Actual Output: 0.75
Target Output: 1.0
Local Error: 0.25
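
As an illustration, here is one way the local errors of several output nodes might be combined into a total error using mean squared error (the node values are invented for the example):

    # Invented actual outputs and targets for three output nodes.
    actual = [0.75, 0.20, 0.90]
    target = [1.00, 0.00, 1.00]

    # Local error per node: actual output minus target output.
    local_errors = [a - t for a, t in zip(actual, target)]

    # Total (global) error via mean squared error.
    total_error = sum(e ** 2 for e in local_errors) / len(local_errors)
    print(local_errors)  # ~[-0.25, 0.2, -0.1]
    print(total_error)   # ~0.0375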

9
Q

What is net input?

A

Net input refers to the sum of all inputs into a hidden or output node. It is calculated by multiplying each input by its respective weight and adding the products together. This calculation is usually written using the summation operator (Σ).
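
A sketch of the calculation in Python, with invented inputs and weights (a bias term is often added as well, but is omitted here):

    inputs = [0.5, 0.3, 0.9]
    weights = [0.4, -0.2, 0.1]

    # Net input: multiply each input by its weight, then sum the products.
    net_input = sum(x * w for x, w in zip(inputs, weights))
    print(net_input)  # ~0.23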

10
Q

Is a NN an algorithm?

A

Yes. A NN is a machine learning algorithm insofar as it is a set of instructions designed to perform a specific task. Beyond this, other algorithms are used with a NN to successfully train the network. One of the most common is backpropagation, which makes use of gradient descent to optimize and ultimately train the network.

11
Q

What is machine learning?

A

Machine learning is the science of getting computers to act without being explicitly programmed.

12
Q

What is batch training or “full batch”?

A

Batch training is a particular form of gradient descent, used in conjunction with backpropagation to train a network. Batch training works by summing the gradients for every training set element and then updating the weights in one step per pass over the data.
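
A schematic sketch of one full-batch update in Python, assuming a hypothetical gradient(w, x, y) helper that returns the gradient for a single example, a dataset of (x, y) pairs, and a single scalar weight for simplicity:

    def full_batch_step(w, dataset, gradient, learning_rate=0.1):
        # Sum the gradients over every example in the training set...
        total_grad = sum(gradient(w, x, y) for x, y in dataset)
        # ...then update the weight once for the whole pass.
        return w - learning_rate * total_grad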

13
Q

What is stochastic gradient descent?

A

Stochastic gradient descent is another form of gradient descent, used in conjunction with backpropagation to train a network. SGD works by updating the weights after each individual training set element, rather than summing the gradients first. This means it performs many more weight updates than batch training.
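
For contrast with the full-batch sketch above, one epoch of SGD under the same assumptions (hypothetical gradient helper, scalar weight):

    def sgd_epoch(w, dataset, gradient, learning_rate=0.1):
        # Update the weight after every individual example,
        # rather than summing the gradients first.
        for x, y in dataset:
            w = w - learning_rate * gradient(w, x, y)
        return w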

14
Q

What is mini-batch training?

A

Another form of gradient descent, used in conjunction with backpropagation to train a network. Mini-batch training works by summing the gradients for multiple training set elements (but not all of them) and then updating the weights. The size of the mini-batch can be pre-set as a hyperparameter or chosen randomly. To continue with the image example used in these cards, the elements in one mini-batch would be n images out of a training set of 10,000 images. Mini-batch is one of the more popular and successful gradient descent methods.
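
Continuing the same schematic assumptions (hypothetical gradient helper, scalar weight, dataset as a list of (x, y) pairs), one epoch of mini-batch training with the batch size as a hyperparameter:

    def mini_batch_epoch(w, dataset, gradient, batch_size=100, learning_rate=0.1):
        # Sum the gradients over one chunk of examples, update the weight,
        # then move on to the next chunk until the whole set has been seen.
        for i in range(0, len(dataset), batch_size):
            batch = dataset[i:i + batch_size]
            total_grad = sum(gradient(w, x, y) for x, y in batch)
            w = w - learning_rate * total_grad
        return w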

15
Q

What is back propagation?

A

Backpropagation is the process by which a neural network works backwards from its output, propagating the error through the layers and adjusting each weight in proportion to its contribution to that error.

16
Q

What is the chain rule?

A

Here are two definitions:

  1. The chain rule is a way to find the derivative of a function that is nested inside another function.
  2. The chain rule is used to compute the derivative of the composition of two or more functions.

Within a NN, the chain rule is used to calculate derivatives across the entire network, layer by layer. Practically speaking, this is what makes it possible to adjust the weights and succeed in training the NN.
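
A small worked illustration in Python: composing two arbitrarily chosen functions, applying the chain rule by hand, and checking the result numerically:

    # f(g(x)) with g(x) = 3x + 1 and f(u) = u ** 2.
    # Chain rule: d/dx f(g(x)) = f'(g(x)) * g'(x) = 2 * (3x + 1) * 3.
    def composed(x):
        return (3 * x + 1) ** 2

    def chain_rule_derivative(x):
        return 2 * (3 * x + 1) * 3

    # Numerical check via a central finite difference.
    x, h = 2.0, 1e-6
    numeric = (composed(x + h) - composed(x - h)) / (2 * h)
    print(chain_rule_derivative(x))  # 42.0
    print(numeric)                   # ~42.0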

17
Q

What is classification?

A

NNs excel at classification. Classification refers to sorting or classifying data into groups. For example, a neural network might be presented with a folder of various pictures: some are dogs, some are cats, some are pigs. In this instance the NN is required to learn how to classify these pictures into their three respective groups.

18
Q

What is a convolutional neural network?

A

A CNN is a specific type of feedforward NN. CNNs are typically used for image recognition and in many regards are the core of computer vision systems. They are also used in natural language processing (NLP).

19
Q

What does “converge” mean?

A

In ML, converging refers to the output moving closer and closer to the desired target value. Converge is the opposite of diverge.

20
Q

What does “diverge” mean?

A

The opposite of “converge” is “diverge”. This occurs when the output continues to oscillate (fluctuate) rather than inching steadily toward the target.

21
Q

What is the “curse of dimensionality”?

A

The curse of dimensionality is an ML phrase that refers to the difficulties that arise when working with high-dimensional data: as the number of dimensions (features) grows, the amount of data needed to cover the space grows very rapidly.

22
Q

What is dropout?

A

Dropout is a form of regularization that helps a network generalize and increases accuracy. It is often used with deep NNs to combat overfitting, which it accomplishes by randomly switching off one or more nodes in the network during training.

A node that is switched off cannot have its weights updated, nor can it affect other nodes. This causes the nodes that remain switched on to become less dependent on the weights of other nodes and, eventually, to make better decisions on their own.
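
A minimal sketch of (inverted) dropout applied to one layer's activations; the keep probability of 0.8 is an invented example value:

    import random

    def dropout(activations, keep_prob=0.8):
        # Switch each node off with probability (1 - keep_prob);
        # scale the survivors so the expected activation is unchanged.
        return [a / keep_prob if random.random() < keep_prob else 0.0
                for a in activations]

    print(dropout([0.5, 0.9, 0.1, 0.7]))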

23
Q

What is deep learning?

A

Deep learning is a relatively new field within ML that refers to NNs with multiple hidden layers. The adjective “deep” refers to the number of hidden layers; the more layers, the deeper the learning.

24
Q

What is an epoch?

A

An epoch refers to one forward pass and one backward pass of ALL training examples in a neural network. In other words, an epoch describes one complete pass of the network over the entire data set. For example, if you are training a network to recognize images and have a set of 10,000 images, these images are the training examples. An epoch will have occurred once all of the examples have passed through the network – “passing through the network” includes both a forward and a backward pass.

25
Q

What is a training example?

A

A training example is a single item from the training set, together with its target output. For example, if you are training a network to recognize images and you have 10,000 images, each of these images is a training example.

26
Q

What is a batch?

A

A batch refers to the number of training examples processed together in one forward and backward pass. For example, 100 images.

27
Q

What is an iteration?

A

An iteration refers to one pass of a single batch of data through the network. For example, if you have 500 images (training examples) and your batch size is 100, then it will take 5 iterations to complete one epoch.
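
The arithmetic from the example, as a sketch:

    import math

    training_examples = 500
    batch_size = 100

    # Iterations needed to complete one epoch.
    iterations_per_epoch = math.ceil(training_examples / batch_size)
    print(iterations_per_epoch)  # 5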

28
Q

What is a gradient?

A

A gradient is the slope of the error function with respect to a specific weight – in other words, it tells you how the error changes as a single weight in the neural network changes. Technically, the gradient is a vector, arrived at by calculating the derivative of the error function at a specific point.
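
An illustration of a gradient as a slope: estimating the derivative of an invented one-dimensional error function at a specific weight with a finite difference:

    def error(w):
        # An invented error function with its minimum at w = 3.
        return (w - 3.0) ** 2

    # Slope of the error function at a specific weight.
    w, h = 1.0, 1e-6
    gradient = (error(w + h) - error(w - h)) / (2 * h)
    print(gradient)  # ~-4.0, so increasing w decreases the error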

29
Q

Describe the ReLU activation function.

A

The Rectified Linear Unit (ReLU) has become very popular in the last few years. It computes the function f(κ) = max(0, κ). In other words, the activation is simply thresholded at zero.

In comparison to sigmoid and tanh, ReLU is cheaper to compute and has been found to accelerate convergence by as much as a factor of six.

Unfortunately, a drawback is that ReLU units can be fragile during training. A large gradient flowing through a unit can update its weights in such a way that the neuron never activates (and is never updated) again. However, this can be mitigated by setting a proper learning rate.
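
A minimal ReLU sketch in Python:

    def relu(x):
        # Activation thresholded at zero: f(x) = max(0, x).
        return max(0.0, x)

    print(relu(-2.0))  # 0.0
    print(relu(3.5))   # 3.5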

30
Q

Describe the sigmoid activation function.

A

The sigmoid non-linearity has the mathematical form σ(κ) = 1/(1 + e^(-κ)). It takes a real-valued number and “squashes” it into a range between 0 and 1.

However, a very undesirable property of sigmoid is that when the activation saturates at either tail, the gradient there becomes almost zero. Because this local gradient is multiplied into the gradient flowing backwards, it effectively “kills” the gradient during backpropagation.

Also, if the data coming into a neuron is always positive, then the gradients on that neuron's weights during backpropagation will be either all positive or all negative, resulting in undesirable zig-zag dynamics in the weight updates.

31
Q

Describe the hyperbolic tangent (tanh) activation function.

A

Tanh squashes a real-valued number to the range [-1, 1]. Like sigmoid, the activation saturates, but — unlike the sigmoid neurons — its output is zero centered.