Lecture 4 Flashcards
Parameters
Variables learned (found) during training, e.g., the weights
Hyperparameters
Variables whose values are set before the training process begins, e.g., the learning rate alpha and k in k-NN
Loss Function (or error)
Measures the error on a single training example and should be minimized to achieve the objective; a component of the cost function
Cost Function
Average of the loss function over the entire training set (e.g., mean squared error); a type of objective function
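A minimal sketch of the loss/cost distinction (function names and data are illustrative, assuming a regression setting with squared loss):

```python
import numpy as np

def squared_loss(y_pred, y_true):
    # Loss: the error on a single training example
    return (y_pred - y_true) ** 2

def mse_cost(y_pred, y_true):
    # Cost: the average of the per-example losses over the whole training set
    return np.mean(squared_loss(y_pred, y_true))

y_true = np.array([1.0, 0.0, 1.0])
y_pred = np.array([0.9, 0.2, 0.8])
print(mse_cost(y_pred, y_true))  # 0.03
```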
Objective Function
Any function that you optimize during training (maximum likelihood, divergence between classes)
Gradient Descent
Finds optimal values for the weights w;
An iterative optimization algorithm that operates over a loss landscape (the cost function);
Follows the negative gradient of the cost with respect to w to reach the minimum cost (sketched in the code after the Learning Rate card)
Gradient of Cost Function
Points in the direction of steepest increase of the cost; stepping against it gives the direction of the steps that decrease the cost
Learning Rate
The size of the steps taken at each update, often denoted alpha
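A minimal sketch tying the last three cards together, assuming a 1-D quadratic cost so the gradient has a closed form (all names and values are illustrative):

```python
def cost(w):
    # A 1-D quadratic loss landscape with its minimum at w = 3
    return (w - 3.0) ** 2

def gradient(w):
    # Gradient of the cost: the direction of steepest increase
    return 2.0 * (w - 3.0)

w = 0.0        # initial weight
alpha = 0.1    # learning rate: the size of each step
for _ in range(100):
    w -= alpha * gradient(w)  # step against the gradient
print(w)  # converges toward 3.0, the minimum of the cost
```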
Sigmoid Function (Logistic Function)
Assumes a particular functional form (a sigmoid) applied to a linear function of the data; the output is a smooth, differentiable function of the inputs and the weights
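A sketch of the logistic function applied to a linear function of the data (the feature and weight values are illustrative):

```python
import numpy as np

def sigmoid(z):
    # Smooth, differentiable map from the reals into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -1.2])      # one example's features
w = np.array([0.8, 0.3])       # weights
b = 0.1                        # bias
p = sigmoid(np.dot(w, x) + b)  # predicted probability of class 1
print(p)
```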
Cross Entropy (Logarithmic Loss)
Compares the predicted class probability to the actual class (0 or 1);
The resulting score penalizes the predicted probability based on how far it is from the actual value;
The penalty is logarithmic in nature: small deviations cost little, while confident wrong predictions cost a lot
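A sketch of the binary cross-entropy score (the eps clipping is an illustrative numerical safeguard, not part of the definition):

```python
import numpy as np

def cross_entropy(p, y, eps=1e-12):
    # y is the actual class (0 or 1); p is the predicted probability of class 1
    p = np.clip(p, eps, 1.0 - eps)  # avoid log(0)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

print(cross_entropy(0.9, 1))  # ~0.105: confident and correct, small penalty
print(cross_entropy(0.1, 1))  # ~2.303: confident and wrong, large penalty
```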
Regularization
Any modification to a learning algorithm that is intended to reduce its generalization error but not its training error;
Mitigates overfitting;
Useful when there are not enough samples to build a good logistic regression classification model otherwise.
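A sketch of one common form, an L2 (squared-weight) penalty added to a cost function; the function name, lam, and the data are illustrative:

```python
import numpy as np

def l2_regularized_cost(w, X, y, lam=0.01):
    # Data term: mean squared error of a linear model
    data_cost = np.mean((X @ w - y) ** 2)
    # Penalty term: discourages large weights to reduce generalization error
    return data_cost + lam * np.sum(w ** 2)

X = np.array([[1.0, 2.0], [3.0, 4.0]])
y = np.array([1.0, 0.0])
w = np.array([0.5, -0.5])
print(l2_regularized_cost(w, X, y))  # 1.255
```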
Activation Functions
Applied to the hidden units;
Introduce nonlinearity;
Popular examples include sigmoid, tanh, and ReLU
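A sketch of those three common activation functions applied elementwise to hidden-unit inputs:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    # Rectified linear unit: zero for negative inputs, identity otherwise
    return np.maximum(0.0, z)

z = np.array([-2.0, 0.0, 2.0])
print(sigmoid(z))   # squashes into (0, 1)
print(np.tanh(z))   # squashes into (-1, 1)
print(relu(z))      # [0. 0. 2.]
```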
Feedforward
Calculates the predicted output (ŷ); inference
Backpropagation
Updates the weights and biases; learning
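A minimal sketch of both passes for a single logistic unit trained with cross entropy (the toy data and learning rate are illustrative, not from the lecture):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy data: one feature, binary labels
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
w, b, alpha = np.zeros(1), 0.0, 0.5

for _ in range(1000):
    # Feedforward: compute the predicted output y_hat (inference)
    y_hat = sigmoid(X @ w + b)
    # Backpropagation: gradient of the cross-entropy loss updates w and b (learning)
    error = y_hat - y
    w -= alpha * (X.T @ error) / len(y)
    b -= alpha * np.mean(error)

print(np.round(sigmoid(X @ w + b), 2))  # probabilities approach the labels
```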
Artificial Neural Networks
- Highly expressive non-linear functions
- Highly parallel network of logistic function units
- Minimizes the sum of squared training errors plus the squared weights (regularization)
- Uses gradient descent as the training procedure
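A sketch tying these points together: a small network of logistic units trained by gradient descent on squared error plus a squared-weight penalty (the XOR task, layer sizes, alpha, and lam are all illustrative assumptions):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
# XOR: a non-linear function that no single logistic unit can represent
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0.0], [1.0], [1.0], [0.0]])

W1 = rng.normal(size=(2, 4)); b1 = np.zeros(4)   # hidden layer of logistic units
W2 = rng.normal(size=(4, 1)); b2 = np.zeros(1)   # output unit
alpha, lam = 0.5, 1e-4  # learning rate and regularization strength

for _ in range(10000):
    # Feedforward through the network
    h = sigmoid(X @ W1 + b1)
    y_hat = sigmoid(h @ W2 + b2)
    # Backpropagate the gradient of: sum of squared errors + lam * squared weights
    d_out = (y_hat - y) * y_hat * (1 - y_hat)
    d_hid = (d_out @ W2.T) * h * (1 - h)
    W2 -= alpha * (h.T @ d_out + lam * W2); b2 -= alpha * d_out.sum(axis=0)
    W1 -= alpha * (X.T @ d_hid + lam * W1); b1 -= alpha * d_hid.sum(axis=0)

print(np.round(y_hat.ravel(), 2))  # typically approaches [0, 1, 1, 0]
```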