Week 5 Flashcards
What is Likelihood?
Probability (or probability density) of the observed data under a particular model, viewed as a function of the model's parameters.
What are the likelihood functions for discrete and continuous Xi?
Where vector X is a sample from a distribution with parameter theta, and x is the observed value:
Discrete: L(theta) = PX(x; theta)
Continuous: L(theta) = fX(x; theta)
What is a Maximum Likelihood Estimate?
A MLE of theta is a value of theta that maximises the likelihood function.
What are two possible options to find the maximum likelihood?
1. ) Search - Exhaustive (low dimensional) or Grid.
2. ) Optimization Algorithms
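A minimal sketch of the grid search option, assuming a Bernoulli model with a single parameter theta. The data, the grid resolution and the function names are made up for illustration.

```python
import math

def bernoulli_log_likelihood(theta, data):
    """Log-likelihood of i.i.d. 0/1 observations under Bernoulli(theta)."""
    return sum(math.log(theta) if x == 1 else math.log(1.0 - theta) for x in data)

def grid_search_mle(data, num_points=999):
    """Evaluate the log-likelihood on an evenly spaced grid in (0, 1)
    and return the grid value with the highest log-likelihood."""
    grid = [(i + 1) / (num_points + 1) for i in range(num_points)]
    return max(grid, key=lambda theta: bernoulli_log_likelihood(theta, data))

# Hypothetical sample: 7 ones out of 10 observations.
sample = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]
theta_hat = grid_search_mle(sample)
print(theta_hat)
```

For this model the grid maximum should sit at the sample proportion of ones. A grid like this is only practical in low dimensions, since the number of points grows exponentially with the number of parameters.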
What is a Cost Function?
Maps a set of events into a number that represents the “cost” of the event occurring.
Also known as the loss or objective function.
What is the cost function for likelihood, and why is it used?
J(theta, D) = -log(L(theta, D))
Convention: many optimization problems are minimization.
Convenience: the log turns a product of terms into a sum, which is easier to differentiate.
Numerically Stable: A product of many probabilities quickly underflows to zero, while a sum of log-probabilities does not.
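A small illustration of the numerical point, assuming 200 observations that each have probability 0.01 under the model (made-up numbers). The raw likelihood underflows in double precision, while the negative log-likelihood stays a usable number.

```python
import math

# Hypothetical per-observation probabilities under some model.
probs = [0.01] * 200

# Raw likelihood: the product 0.01**200 = 1e-400 is smaller than the
# smallest positive double, so the running product underflows to 0.0.
likelihood = 1.0
for p in probs:
    likelihood *= p

# Cost J = -log(L): summing logs avoids the underflow entirely.
neg_log_likelihood = -sum(math.log(p) for p in probs)

print(likelihood)
print(neg_log_likelihood)
```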
What are optimization problems and their procedure?
Finding the best solution from the feasible ones.
- Construct a model.
- Determine the problem type.
- Select algorithm.
What is the difference between supervised and unsupervised machine learning?
Supervised: Given some labelled training data, want to train a model that predicts the labels of new data.
Unsupervised: Given some unlabelled samples, want to divide into multiple groups.
What is Gradient Descent?
A first-order iterative algorithm for finding a local minimum of a differentiable cost function.
Moves along the negative gradient at each step to decrease the cost function.
Two ingredients - direction and magnitude (step size).
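A minimal sketch of the update rule theta <- theta - step_size * gradient, assuming a toy cost J(theta) = (theta - 3)^2 chosen only for illustration. The step size and iteration count are arbitrary assumptions.

```python
def gradient_descent(grad, theta0, step_size=0.1, num_steps=100):
    """Repeatedly step against the gradient.
    Direction comes from -grad(theta), magnitude from step_size."""
    theta = theta0
    for _ in range(num_steps):
        theta = theta - step_size * grad(theta)
    return theta

def cost_gradient(theta):
    """Gradient of the toy cost J(theta) = (theta - 3)**2."""
    return 2.0 * (theta - 3.0)

theta_min = gradient_descent(cost_gradient, theta0=0.0)
print(theta_min)  # approaches 3.0, the minimiser of the toy cost
```

On this cost a step size above 1.0 would make the iterates diverge, which is why the magnitude matters as much as the direction.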
What is Classification?
Determining the most likely class that an input pattern belongs to.
What is Logistic Regression?
Regression model where the dependent variable is categorical.
Goal is to predict the probability that a given example belongs to “1” class versus the probability it belongs to the “0” class.
Also known as logit regression.
How does Logistic Regression work?
Model the logarithm of the odds of the binary outcome as a linear combination of the independent variables.
Then use the logistic function to convert the log-odds to a probability.
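The two steps can be sketched as follows. The weights, bias and feature values are hypothetical and not fitted to any data; in practice they would be found by minimising the negative log-likelihood.

```python
import math

def logistic(z):
    """Logistic (sigmoid) function: maps log-odds z to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_probability(weights, bias, features):
    """Step 1: log-odds as a linear combination of the features.
    Step 2: convert the log-odds to P(class = 1) with the logistic function."""
    log_odds = bias + sum(w * x for w, x in zip(weights, features))
    return logistic(log_odds)

# Hypothetical parameters and one input example.
weights = [0.8, -0.4]
bias = -0.2
features = [1.5, 2.0]

p = predict_probability(weights, bias, features)  # log-odds = 0.2
print(p)
```

Log-odds of 0 correspond to a probability of 0.5, so the sign of the linear combination decides which class is more likely.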