Lecture 4 Flashcards
What is the focus of probabilistic models in machine learning?
Using probability theory to select models for given data.
What is conditional probability?
p(X | Y) is the probability of X given Y.
What is Bayes’ theorem?
p(X | Y) = p(Y | X) * p(X) / p(Y)
What is the product rule of probability?
p(X, Y) = p(X | Y) * p(Y)
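A minimal sketch (not from the lecture) that checks both the product rule and Bayes' theorem on an invented joint distribution over two binary variables:

```python
# Toy joint distribution over two binary variables X and Y (numbers invented).
# p_joint[(x, y)] = p(X=x, Y=y); entries sum to 1.
p_joint = {
    (0, 0): 0.10, (0, 1): 0.30,
    (1, 0): 0.20, (1, 1): 0.40,
}

def p_x(x):
    return sum(p for (xv, yv), p in p_joint.items() if xv == x)

def p_y(y):
    return sum(p for (xv, yv), p in p_joint.items() if yv == y)

def p_x_given_y(x, y):
    return p_joint[(x, y)] / p_y(y)

def p_y_given_x(y, x):
    return p_joint[(x, y)] / p_x(x)

# Product rule: p(X, Y) = p(X | Y) * p(Y)
assert abs(p_joint[(1, 1)] - p_x_given_y(1, 1) * p_y(1)) < 1e-12

# Bayes' theorem: p(X | Y) = p(Y | X) * p(X) / p(Y)
lhs = p_x_given_y(1, 0)
rhs = p_y_given_x(0, 1) * p_x(1) / p_y(0)
assert abs(lhs - rhs) < 1e-12
print(f"p(X=1 | Y=0) = {lhs:.3f}")
```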
What is (Conditional) Independence?
X and Y are independent when knowing Y provides no information about X, i.e., p(X | Y) = p(X); they are conditionally independent given Z when this holds once Z is observed: p(X | Y, Z) = p(X | Z).
What is expectation in probability?
The weighted average of all possible values of a random variable, weighted by their probabilities: E[X] = Σ_x x * p(x).
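As a quick illustration, a fair six-sided die (a standard example, not from the lecture) has expectation 3.5:

```python
# Expectation of a discrete random variable: E[X] = sum over x of x * p(x).
values = [1, 2, 3, 4, 5, 6]
probs = [1 / 6] * 6  # fair die: uniform probabilities

expectation = sum(x * p for x, p in zip(values, probs))
print(expectation)  # 3.5
```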
What are common probability distributions in ML?
Bernoulli, Categorical, and Normal distributions.
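A short sketch of drawing samples from each of these distributions with NumPy; the parameter values below are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)

# Bernoulli(p): a single binary outcome, e.g. one coin flip.
bernoulli_samples = rng.binomial(n=1, p=0.7, size=5)

# Categorical: one of K outcomes with given probabilities, e.g. a loaded die.
categorical_samples = rng.choice(3, size=5, p=[0.2, 0.5, 0.3])

# Normal(mu, sigma): continuous, bell-shaped.
normal_samples = rng.normal(loc=0.0, scale=1.0, size=5)

print(bernoulli_samples, categorical_samples, normal_samples, sep="\n")
```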
What does p(Data | θ) represent?
The likelihood of observing data given parameters θ.
What is the goal in probabilistic learning?
To infer the parameters θ given only the observed data.
What is the frequentist view of probability?
Parameters are fixed but unknown quantities; probability describes long-run frequencies, and we estimate the parameters from observed data.
What is maximum likelihood estimation (MLE)?
Finding parameters that maximize the likelihood of observed data.
What is the likelihood function L(θ)?
L(θ) = p(X | θ): the probability of the observed data, viewed as a function of the parameters θ.
What is the MLE (maximum likelihood estimation) formula?
θ̂ = argmax_θ p(X | θ)
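One way to see the formula in action (an illustrative setup, not from the lecture): approximate the argmax by a grid search over candidate means of a Normal with known variance, using log-likelihoods for numerical stability:

```python
import numpy as np

# theta_hat = argmax_theta p(X | theta), approximated on a grid of candidate
# means for Normal data with known sigma = 1 (true mean unknown to the model).
rng = np.random.default_rng(1)
data = rng.normal(loc=2.0, scale=1.0, size=100)

candidates = np.linspace(-5, 5, 1001)
# Log-likelihood of Normal(mu, 1), dropping constants: the argmax is unchanged.
log_liks = [np.sum(-0.5 * (data - mu) ** 2) for mu in candidates]
theta_hat = candidates[int(np.argmax(log_liks))]
print(theta_hat)  # close to the sample mean, data.mean()
```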
What is a simple analogy for probabilistic learning?
A ‘machine’ generates data, and we infer its parameters.
What does the model space represent?
All possible models that could explain the observed data.
What does maximum likelihood estimation (MLE) assume?
That the best model is the one that makes the observed data most likely.
What is a practical example of MLE?
Estimating if a coin is fair or biased based on observed flips.
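A minimal sketch of this example, assuming simulated flips: for h heads in n Bernoulli flips, the MLE has the closed form p̂ = h / n:

```python
import numpy as np

# MLE for a coin: p_hat = h / n maximizes the likelihood p^h * (1-p)^(n-h).
rng = np.random.default_rng(2)
flips = rng.binomial(n=1, p=0.6, size=50)  # simulated biased coin

h, n = flips.sum(), flips.size
p_hat = h / n
print(f"observed {h}/{n} heads -> MLE p_hat = {p_hat:.2f}")
```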
What is the difference between frequentist and Bayesian inference?
Frequentists estimate a fixed parameter, while Bayesians use probability distributions over parameters.
What is the Bayesian approach to learning?
Updating beliefs about parameters using observed data and prior knowledge.
What is a prior in Bayesian inference?
The initial belief about a parameter before observing data.
What is a posterior distribution?
The updated belief about a parameter after observing data.
What is the likelihood in Bayesian inference?
The probability of observed data given a parameter.
What is the Bayes rule formula for updating beliefs?
Posterior ∝ Likelihood × Prior
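A small sketch of this update on a grid of candidate coin biases (toy counts, invented here); normalizing at the end replaces the evidence term p(Data):

```python
import numpy as np

# Posterior ∝ Likelihood × Prior, computed pointwise on a grid.
thetas = np.linspace(0, 1, 101)   # candidate values of p(heads)
prior = np.ones_like(thetas)      # flat prior over [0, 1]
prior /= prior.sum()

h, t = 7, 3                       # observed: 7 heads, 3 tails
likelihood = thetas**h * (1 - thetas)**t

unnormalized = likelihood * prior
posterior = unnormalized / unnormalized.sum()
print(thetas[np.argmax(posterior)])  # posterior mode near 0.7
```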
What is a conjugate prior?
A prior that, combined with a given likelihood, yields a posterior in the same distribution family as the prior.
What is the benefit of using conjugate priors?
They simplify Bayesian inference calculations.
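For example (a standard conjugate pair, assumed here): a Beta(a, b) prior with a Bernoulli likelihood updates to a Beta(a + h, b + t) posterior after observing h heads and t tails, so inference reduces to two additions:

```python
# Beta-Bernoulli conjugacy: prior Beta(a, b) + h heads and t tails
# gives posterior Beta(a + h, b + t). Toy numbers below.
a, b = 2, 2            # prior pseudo-counts
h, t = 7, 3            # observed heads and tails

a_post, b_post = a + h, b + t
posterior_mean = a_post / (a_post + b_post)
print(f"posterior: Beta({a_post}, {b_post}), mean = {posterior_mean:.3f}")
```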
What is MAP estimation?
Maximum a posteriori estimation: θ̂ = argmax_θ p(X | θ) * p(θ), the most probable parameter given both the data and the prior.
What is the difference between MAP and MLE?
MAP considers prior knowledge, while MLE relies only on data.
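A toy comparison (numbers invented) using the closed forms for a coin with a Beta prior; with little data, MAP is pulled toward the prior while MLE tracks the sample exactly:

```python
# MAP vs MLE for a coin with a Beta(a, b) prior (closed forms; toy numbers).
a, b = 5, 5            # prior favoring a fair coin
h, t = 9, 1            # small sample: 9 heads, 1 tail

p_mle = h / (h + t)                        # argmax of the likelihood alone
p_map = (h + a - 1) / (h + t + a + b - 2)  # mode of the Beta(a+h, b+t) posterior
print(f"MLE = {p_mle:.2f}, MAP = {p_map:.2f}")  # MLE = 0.90, MAP ≈ 0.72
```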
What is the purpose of probabilistic modeling in ML?
To incorporate uncertainty and make robust predictions.
What is the relationship between likelihood and probability?
Probability describes the chance of data given fixed parameters; likelihood treats the data as fixed and measures how well different parameter values explain it.
What is the key takeaway from probabilistic models?
They provide a framework for handling uncertainty in machine learning.