Neural Network Basic Ingredients Flashcards
Activation functions are typically what type of functions?
Nonlinear functions
List three example activation functions
Sigmoid, ReLU, tanh
The softmax function is a generalization of what?
The logistic function, but to multiple dimensions
The softmax function is also known as what?
Softargmax or normalized exponential function
Softmax does what?
It converts a vector z of K real numbers (which may be positive, negative, or of mixed sign) into a vector s of K probabilities (a probability distribution). The mapping from z to s is order preserving, and the elements of s sum to one.
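A minimal sketch of this behaviour in plain Python, assuming the standard definition s_i = exp(z_i) / sum_j exp(z_j). The function name `softmax` and the sample vector are illustrative choices, and subtracting the maximum is a common numerical-stability trick rather than part of the definition.

```python
import math

def softmax(z):
    """Map a list of K real numbers to K probabilities that sum to 1."""
    # Subtracting the maximum leaves the result unchanged but avoids overflow in exp.
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

z = [2.0, -1.0, 0.5]  # illustrative input; mixed signs are fine
s = softmax(z)

print(s)       # three values, each strictly between 0 and 1
print(sum(s))  # approximately 1.0

# Order preserving: ranking the indices by z or by s gives the same order.
rank_z = sorted(range(len(z)), key=lambda i: z[i])
rank_s = sorted(range(len(s)), key=lambda i: s[i])
print(rank_z == rank_s)
```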
What is forward propagation?
The process of a neural network generating an output for a given input
Explain what the term backpropagation strictly refers to
Strictly speaking, backpropagation is only the process of calculating the gradient of the loss function with respect to the weights, by iteratively applying the chain rule. It does not refer to how the gradient is used. Loosely, however, the term is often used for the entire learning algorithm, including how the gradient is used (for example, by stochastic gradient descent).
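To make the distinction concrete, here is a small sketch on a hypothetical two-weight network (one sigmoid hidden unit, a linear output, squared-error loss). All names and numbers are assumptions chosen for illustration. The `backprop` function only computes the gradient via the chain rule; the update at the end is a separate gradient descent step.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def forward(w1, w2, x):
    """Forward propagation: input x -> hidden h -> output y."""
    h = sigmoid(w1 * x)
    y = w2 * h
    return h, y

def loss(w1, w2, x, t):
    """Squared error between the output y and the target t."""
    _, y = forward(w1, w2, x)
    return 0.5 * (y - t) ** 2

def backprop(w1, w2, x, t):
    """Gradient of the loss with respect to (w1, w2), via the chain rule."""
    h, y = forward(w1, w2, x)
    dL_dy = y - t                        # derivative of 0.5 * (y - t)^2
    dL_dw2 = dL_dy * h                   # y = w2 * h
    dL_dh = dL_dy * w2                   # pass the gradient back to the hidden unit
    dL_dw1 = dL_dh * h * (1.0 - h) * x   # sigmoid'(a) = h * (1 - h), a = w1 * x
    return dL_dw1, dL_dw2

w1, w2, x, t = 0.5, -0.3, 1.5, 1.0
g1, g2 = backprop(w1, w2, x, t)          # backpropagation ends here

# Using the gradient is a separate step (here: one gradient descent update).
lr = 0.1
new_w1 = w1 - lr * g1
new_w2 = w2 - lr * g2
print(loss(w1, w2, x, t), loss(new_w1, new_w2, x, t))
```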
Stochastic gradient descent
Rprop
Resilient backpropagation
Batch
In gradient descent, a batch is the set of examples (training samples) used to calculate the gradient in a single iteration; the batch size is the number of examples in that set.
Batch size for stochastic gradient descent
1
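A rough sketch of how batch size enters a training loop, assuming a hypothetical one-weight linear model fitted to toy data generated from y = 3x. The names `batch_gradient` and `train`, the learning rate, and the step count are all illustrative assumptions. With `batch_size=1` each step uses a single randomly chosen example (stochastic gradient descent); with `batch_size=len(data)` each step uses every example.

```python
import random

# Hypothetical toy data generated from y = 3 * x.
data = [(x, 3.0 * x) for x in [0.5, 1.0, 1.5, 2.0]]

def batch_gradient(w, batch):
    """Mean gradient of 0.5 * (w * x - y)^2 with respect to w over the batch."""
    return sum((w * x - y) * x for x, y in batch) / len(batch)

def train(batch_size, steps=200, lr=0.1, seed=0):
    rng = random.Random(seed)
    w = 0.0
    for _ in range(steps):
        batch = rng.sample(data, batch_size)  # examples used in this iteration
        w -= lr * batch_gradient(w, batch)
    return w

w_sgd = train(batch_size=1)           # stochastic gradient descent
w_full = train(batch_size=len(data))  # full-batch gradient descent
print(w_sgd, w_full)                  # both should end up close to 3.0
```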
Convex problems
have only one minimum; that is, only one place where the slope of the loss function (viewed as a function of the weights) is exactly 0
ln(1)
0
Cross entropy
A measure of how good a classifier is. More specifically, when classifying some item, we take p(x) to be the true probability distribution over the classes x in a discrete set X. This p(x) is usually expressed as a one-hot vector, for example <0 0 0 0 1.0 0 0> for a class set of size 7. Our model predicts some distribution q(x).
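The card stops before giving a formula. Assuming the standard definition H(p, q) = -sum over x of p(x) ln q(x), a minimal sketch looks like the following. The function name, the `eps` guard against ln(0), and the two predicted distributions are illustrative assumptions. With a one-hot p, the sum reduces to -ln q(true class).

```python
import math

def cross_entropy(p, q, eps=1e-12):
    """H(p, q) = -sum_x p(x) * ln q(x); terms with p(x) = 0 contribute nothing."""
    # eps is an assumed guard so that q(x) = 0 does not cause a math domain error.
    return -sum(pi * math.log(max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)

p = [0, 0, 0, 0, 1.0, 0, 0]                            # one-hot truth, class set of size 7
q_good = [0.01, 0.01, 0.01, 0.01, 0.94, 0.01, 0.01]   # confident and correct
q_bad = [0.30, 0.20, 0.20, 0.10, 0.05, 0.10, 0.05]    # little mass on the true class

print(cross_entropy(p, q_good))  # small: equals -ln(0.94)
print(cross_entropy(p, q_bad))   # larger: equals -ln(0.05)
print(cross_entropy(p, p))       # zero, because ln(1) = 0
```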
What does the graph of a logistic function look like?
An S-shaped (sigmoid) curve that rises smoothly from 0 toward 1 and passes through 0.5 at an input of 0
What are the domain and range of the sigmoid function?
Domain: -infinity to infinity (all real numbers)
Range: 0 to 1 (exclusive)
Why is softmax (softargmax) considered the generalization of the logistic function to multiple dimensions?
The logistic function takes a scalar of any real value and maps it to a scalar between 0 and 1. Softmax takes a d-dimensional vector of any real values and maps it to a d-dimensional vector whose values all lie between 0 and 1 (and sum to one).
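One way to see the connection numerically: for two classes with inputs (z, 0), the first softmax output is exp(z) / (exp(z) + 1), which is exactly the logistic function of z. The sketch below checks this for a few illustrative values; the function names are assumptions.

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def softmax(v):
    m = max(v)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(a - m) for a in v]
    total = sum(exps)
    return [e / total for e in exps]

for z in [-5.0, -1.0, 0.0, 0.3, 4.0]:
    two_class = softmax([z, 0.0])[0]  # probability of the first of two classes
    print(z, two_class, logistic(z))  # the last two columns agree
```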
When is the softmax function used?
As the activation function in the output layer of neural networks that perform multi-class classification
What is the formula for the Jacobian of the softmax function?
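The standard result, with s = softmax(z), is that entry (i, j) of the Jacobian is ds_i/dz_j = s_i * (delta_ij - s_j), where delta_ij is 1 when i = j and 0 otherwise. A minimal sketch under that definition follows; the function names and the sample vector are illustrative assumptions.

```python
import math

def softmax(z):
    m = max(z)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_jacobian(z):
    """Return the K x K matrix J with J[i][j] = d s_i / d z_j = s_i * (delta_ij - s_j)."""
    s = softmax(z)
    k = len(s)
    return [[s[i] * ((1.0 if i == j else 0.0) - s[j]) for j in range(k)]
            for i in range(k)]

z = [1.0, 2.0, 0.5]  # illustrative input
J = softmax_jacobian(z)
for row in J:
    print([round(v, 4) for v in row])
```

Each row of J sums to zero, which reflects the fact that the outputs always sum to one: shifting every input by the same amount leaves the outputs unchanged.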
Categorical cross entropy loss is also called what?
Softmax loss
Classification problems can be subdivided into what two categories?
Multi-class classification and multi-label classification
Explain the difference between multi-class and multi-label classification
Multi-class classification - each sample belongs to only one class (mutually exclusive)
Multi-label classification - each sample may belong to multiple classes (or to no class)
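A sketch of how the two settings are commonly turned into predictions, assuming softmax with argmax for multi-class and an independent sigmoid per class with a 0.5 threshold for multi-label. The label names, the scores, and the threshold are hypothetical and chosen only for illustration.

```python
import math

def softmax(z):
    m = max(z)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

labels = ["cat", "dog", "bird"]  # hypothetical classes

def predict_multiclass(logits):
    """Exactly one class: the one with the highest softmax probability."""
    probs = softmax(logits)
    return labels[probs.index(max(probs))]

def predict_multilabel(logits, threshold=0.5):
    """Zero or more classes: each class is decided independently."""
    return [name for name, v in zip(labels, logits) if sigmoid(v) >= threshold]

logits = [2.0, 1.5, -3.0]
print(predict_multiclass(logits))              # a single label
print(predict_multilabel(logits))              # possibly several labels
print(predict_multilabel([-1.0, -2.0, -0.5]))  # possibly no label at all
```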