Basic definitions Flashcards
What is an activation function?
An activation function defines the output of a node given a set of inputs. There are many types of activation functions; the two most commonly used in neural networks are the logistic (sigmoid) and the hyperbolic tangent.
What are the two most common activation functions and why?
The two most common activation functions used in neural networks are the logistic (sigmoid) and the hyperbolic tangent.
This is true for two reasons:
First, they introduce non-linearity into a NN. This matters because most problems that a NN solves are non-linear – i.e., they cannot be solved by separating classes with a straight line.
Second, they limit the output of a node to a certain range. The logistic function produces output between 0 and 1; the hyperbolic tangent produces output between -1 and 1.
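Both properties can be seen in a minimal sketch of the two functions, written here with only the Python standard library (the sample inputs are illustrative):

```python
import math

def sigmoid(x):
    """Logistic activation: squashes any real input into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    """Hyperbolic tangent activation: squashes any real input into (-1, 1)."""
    return math.tanh(x)

# Large positive or negative inputs saturate toward the range limits.
print(sigmoid(-10), sigmoid(0), sigmoid(10))  # near 0.0, exactly 0.5, near 1.0
print(tanh(-10), tanh(0), tanh(10))           # near -1.0, exactly 0.0, near 1.0
```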
What is a neural network?
Neural Networks are one type of learning algorithm that is used within machine learning. NNs are composed of a large number of highly interconnected processing elements (nodes) working in parallel to solve a specific problem. A key feature of a NN is that it learns by example.
What is a node?
A NN consists of multiple layers and each of these layers consists of one or more nodes. These nodes are all connected and work together to solve a problem.
What is an edge?
A neural network consists of multiple layers and each layer consists of one or more nodes. These nodes are all connected via edges, which mimic the synaptic connections found in the human brain.
What is a target output?
The neural networks discussed here are trained as supervised learning algorithms, which means that the NN is provided with a training set. This training set provides targets that the NN aims to achieve. Technically speaking, the target is the desired output for the given input.
What is total error or global error?
The NN is successfully trained once it has minimized (to an acceptable level) the difference between its real or actual output and its target output. This difference is called the total error or global error. The total error is typically calculated using a cost function, such as mean squared error or root mean square error.
What is the relationship between local error and total or global error?
The total error aggregates all of the local errors; in the simplest case it is their sum, though cost functions such as mean squared error square and average them. A local error is the difference between the actual output of a single node and the target output that was expected. For example:
Actual Output: .75
Target Output: 1.0
Error: 0.25
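The single-node example above extends naturally to a whole output layer. The sketch below (with made-up outputs and targets) computes each local error and then a mean-squared-error total:

```python
# Hypothetical outputs for a 3-node output layer (values are illustrative).
actual = [0.75, 0.20, 0.90]
target = [1.00, 0.00, 1.00]

# Local error: the per-node difference between target and actual output.
local_errors = [t - a for a, t in zip(actual, target)]
print(local_errors)  # first node's local error is 0.25, as in the example

# Total (global) error via mean squared error over the local errors.
mse = sum(e ** 2 for e in local_errors) / len(local_errors)
print(mse)  # 0.0375
```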
What is net input?
Net input refers to the weighted sum of all inputs into a hidden or output node. It is calculated by summing the product of each input and its respective weight. This calculation is usually written with the summation operator.
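A minimal sketch of the calculation for one node with three incoming edges (the input and weight values below are made up for illustration):

```python
def net_input(inputs, weights, bias=0.0):
    """Weighted sum: the product of each input and its weight, summed."""
    return sum(x * w for x, w in zip(inputs, weights)) + bias

# Hypothetical values for a single hidden node with three incoming edges.
inputs  = [0.5, 0.3, 0.2]
weights = [0.4, 0.7, 0.2]
print(net_input(inputs, weights))  # 0.5*0.4 + 0.3*0.7 + 0.2*0.2 = 0.45
```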
Is a NN an algorithm?
Yes. A NN is a machine learning algorithm insofar as it is a set of instructions designed to perform a specific task. Apart from this, algorithms are used with a NN to successfully train the network. One of the most common algorithms used to accomplish this is backpropagation, which makes use of gradient descent to optimize and ultimately train the network.
What is machine learning?
Machine learning is the science of getting computers to act without being explicitly programmed.
What is batch training or “full batch”?
Batch training is a particular form of gradient descent, which is used in conjunction with backpropagation to train a network. Batch training works by summing the gradients over every training set element and then updating the weights in one step per pass over the data.
What is stochastic gradient descent?
Stochastic gradient descent is another form of gradient descent, which is used in conjunction with backpropagation to train a network. SGD works by updating the weights after each individual training set element, rather than summing gradients over the whole set. This means it performs many more weight updates than batch training.
What is mini-batch training?
Another form of gradient descent, which is used in conjunction with backpropagation to train a network. Mini-batch training works by summing the gradients for a subset of training set elements (more than one, but not all of them) and then updating the weights. The mini-batch size is a hyperparameter set before training; the elements in each batch are typically drawn at random from the training set. For example, each mini-batch might be n images drawn from a training set of 10,000 images. Mini-batch is one of the more popular and successful gradient descent methods.
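The three variants above differ only in how many elements contribute to each weight update. The sketch below illustrates this on a toy one-weight regression problem; the data, learning rate, and iteration counts are illustrative, not from the text:

```python
# Toy data for fitting y = 2x with a single weight w (values are illustrative).
data = [(x, 2.0 * x) for x in [1.0, 2.0, 3.0, 4.0]]

def gradient(w, x, y):
    # Derivative of the squared error 0.5 * (w*x - y)**2 with respect to w.
    return (w * x - y) * x

def batch_step(w, data, lr=0.01):
    # Full batch: sum the gradients over ALL elements, then update once.
    return w - lr * sum(gradient(w, x, y) for x, y in data)

def sgd_step(w, sample, lr=0.01):
    # Stochastic: update the weight after each SINGLE element.
    x, y = sample
    return w - lr * gradient(w, x, y)

def minibatch_step(w, batch, lr=0.01):
    # Mini-batch: sum the gradients over a SUBSET of elements, then update.
    return w - lr * sum(gradient(w, x, y) for x, y in batch)

# Full batch: one weight update per pass over the data.
w_batch = 0.0
for _ in range(200):
    w_batch = batch_step(w_batch, data)

# SGD: one weight update per element (many more updates overall).
w_sgd = 0.0
for _ in range(200):
    for sample in data:
        w_sgd = sgd_step(w_sgd, sample)

# Mini-batch: here using fixed batches of 2 elements each.
w_mb = 0.0
for _ in range(200):
    w_mb = minibatch_step(w_mb, data[:2])
    w_mb = minibatch_step(w_mb, data[2:])

print(w_batch, w_sgd, w_mb)  # all three converge toward the true weight, 2.0
```

All three reach the same answer here; in practice they trade off update frequency against gradient noise, which is why mini-batch is a popular middle ground.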
What is back propagation?
Backpropagation is the algorithm used to compute how much each weight contributed to the total error. It works backwards from the output layer, applying the chain rule to propagate each node's error to the layers before it; gradient descent then uses these gradients to adjust the weights in proportion to their contribution to the error.
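A minimal sketch of the idea for a single sigmoid node with one weight (the input, target, initial weight, and learning rate are illustrative). The chain rule carries the error backwards from the output to the weight:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# One sigmoid node with a single input and weight (values are illustrative).
x, target = 1.5, 1.0
w, lr = 0.2, 0.5

for _ in range(100):
    # Forward pass: net input, then activation.
    out = sigmoid(w * x)
    # Backward pass (chain rule): dE/dw = dE/dout * dout/dnet * dnet/dw,
    # where E = 0.5 * (out - target)**2.
    d_error = out - target        # dE/dout
    d_act = out * (1.0 - out)     # sigmoid derivative, dout/dnet
    grad = d_error * d_act * x    # dnet/dw = x
    w -= lr * grad                # gradient descent weight update

print(sigmoid(w * x))  # the output has moved toward the target of 1.0
```

In a multi-layer network the same chain-rule step is repeated layer by layer, with each node's error term feeding the gradients of the layer before it.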