Lecture 4 Flashcards
Parameters
Variables learned (estimated) during training, e.g. weights
Hyperparameters
Variables whose values are set before the training process begins, e.g. the learning rate alpha and k (from k-NN)
Loss Function (or error)
Measures the error on a single training example and should be minimized to achieve the objective; part of a cost function
Cost Function
The average of the loss function over the entire training set (e.g. mean squared error); a type of objective function
Objective Function
Any function that you optimize during training (maximum likelihood, divergence between classes)
Gradient Descent
Finds optimal values for the weights w;
An iterative optimization algorithm that operates over a loss landscape (the cost function);
Follows the slope of the gradient with respect to the weights W downhill to reach the minimum cost
Gradient of Cost Function
The direction of the steps to take to reach the goal (the minimum)
Learning Rate
The size of the steps taken in any direction
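A minimal sketch of how the gradient and the learning rate combine in one update rule (the cost function and all values here are hypothetical, chosen only to illustrate convergence):

```python
# Gradient descent on the toy cost J(w) = w**2, whose gradient is dJ/dw = 2*w.
# The minimum is at w = 0; each step moves w against the gradient.
w = 4.0              # arbitrary starting weight
learning_rate = 0.1  # step size

for _ in range(100):
    gradient = 2 * w                  # direction of steepest ascent
    w = w - learning_rate * gradient  # step downhill

# After many iterations, w has converged very close to the minimum at 0.
```

A learning rate that is too large would overshoot the minimum and diverge; one that is too small would converge very slowly.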
Sigmoid Function (Logistic Function)
Assumes a particular functional form (a sigmoid) is applied to the linear function of the data; the output is a smooth and differentiable function of the inputs and the weights
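The sigmoid itself is simple to write down; this sketch shows the function and the property the card mentions, that its output is smooth and differentiable:

```python
import math

def sigmoid(z):
    """Logistic function: maps any real z smoothly into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Its derivative has the convenient closed form sigmoid(z) * (1 - sigmoid(z)),
# which is what makes gradient-based training of logistic units easy.
```

For example, `sigmoid(0.0)` is exactly 0.5, and large positive or negative inputs saturate toward 1 or 0.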
Cross Entropy (Logarithmic Loss)
Compares the predicted class probability to the actual class (0 or 1);
The score penalizes the predicted probability based on how far it is from the actual value;
Penalty is logarithmic in nature
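A sketch of binary cross entropy for a single example (the clipping constant `eps` is a common implementation detail, not from the lecture):

```python
import math

def cross_entropy(y_true, p_pred):
    """Logarithmic loss for one binary example; y_true is 0 or 1."""
    eps = 1e-12  # clip to avoid log(0)
    p = min(max(p_pred, eps), 1.0 - eps)
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1.0 - p))

# The penalty grows logarithmically: a confident wrong prediction
# (e.g. p = 0.01 when the true class is 1) is punished far more
# heavily than a mildly wrong one (e.g. p = 0.4).
```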
Regularization
Any modification to a learning algorithm that is intended to reduce its generalization error but not its training error;
Solves overfitting;
Used when we don’t have enough samples to create a good logistic regression classification model.
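One common form of regularization (L2, used later in the ANN card) simply adds a penalty on the squared weights to the cost; this sketch is a hypothetical illustration, with `lam` standing in for the regularization strength:

```python
def l2_regularized_cost(data_loss, weights, lam):
    """Cost = data loss + lambda * sum of squared weights (L2 penalty).

    Larger lam shrinks the weights more, trading training accuracy
    for better generalization (reduced overfitting).
    """
    return data_loss + lam * sum(w * w for w in weights)
```

With `lam = 0` this reduces to the unregularized cost; increasing `lam` penalizes large weights more strongly.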
Activation Functions
Applied to the hidden units;
introduce nonlinearity;
popular examples include sigmoid, tanh, and ReLU
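Sketches of three widely used activation functions (the choice of these three is a standard example, not specific to the lecture):

```python
import math

def sigmoid(z):
    """Squashes z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def tanh(z):
    """Squashes z into (-1, 1)."""
    return math.tanh(z)

def relu(z):
    """Rectified linear unit: zero for negative z, identity otherwise."""
    return max(0.0, z)

# Without a nonlinear activation, stacking layers would collapse into
# a single linear transformation, no matter how many layers are used.
```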
Feedforward
Calculates the predicted output (y); inference
Backpropagation
Updates the weights and biases; learning
Artificial Neural Networks
- Highly expressive non-linear functions
- Highly parallel network of logistic function units
- minimizes the sum of squared training errors plus the sum of squared weights (regularization)
- Uses gradient descent as training procedure
Is logistic regression used for classification or regression?
Logistic Regression is used for classification only.
Does regularization work better with a small set of features (low dimensions) or a large set of features (high dimensions)?
Regularization works well with a large number of features.
Is logistic regression used for binary or multiclass classification?
It can be used for both
What are the steps for a gradient descent algorithm?
- Initialize random weights and bias
- Pass an input through the network and compute predicted values from output layer
- Calculate error between the actual value and the predicted value
- Adjust each weight that contributes to the error, changing its value to reduce the error
- Reiterate until you find the best weights of the network
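The steps above can be sketched for a single logistic unit; the toy OR-style dataset, learning rate, and iteration count here are all hypothetical choices for illustration:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Toy data: inputs (x1, x2) with target y, roughly the logical OR function
data = [((0.0, 0.0), 0.0), ((0.0, 1.0), 1.0),
        ((1.0, 0.0), 1.0), ((1.0, 1.0), 1.0)]

w = [0.0, 0.0]  # Step 1: initialize weights and bias
b = 0.0
lr = 0.5        # learning rate

for _ in range(2000):                             # Step 5: reiterate
    for (x1, x2), y in data:
        y_hat = sigmoid(w[0]*x1 + w[1]*x2 + b)    # Step 2: feedforward
        error = y_hat - y                         # Step 3: prediction error
        # Step 4: adjust each weight by its contribution to the error
        # (this is the cross-entropy gradient for a logistic unit)
        w[0] -= lr * error * x1
        w[1] -= lr * error * x2
        b    -= lr * error
```

After training, the unit outputs a probability near 1 for any input containing a 1, and near 0 for (0, 0).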