Lecture 3 - Bayesian Inference Flashcards
What is a common pitfall when using frequentist statistics?
Frequentist methods often minimize an average error to produce a single point estimate, without accounting for the uncertainty around the resulting predictions.
How is data modeled in the frequentist framework?
- The underlying function is modeled as y(x, w), with parameters (weights) w.
- An individual datapoint t is modeled as a draw from a distribution whose mean is y(x_0, w) for the corresponding input x_0.
What does p(t∣x) represent in the frequentist framework?
- For each fixed x, p(t∣x) is the conditional distribution of the target t, which is assumed to be Gaussian (normal) with mean y(x, w).
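Written out explicitly, this is the usual Gaussian observation model; the noise precision β below is an assumed symbol, not something fixed by the card above:

```latex
% Gaussian observation model with assumed noise precision \beta:
p(t \mid x, \mathbf{w}) = \mathcal{N}\!\left(t \;\middle|\; y(x, \mathbf{w}),\; \beta^{-1}\right)
```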
What is the definition of the expected loss function E[L] in machine learning?
The average loss over all possible inputs x and targets t, weighted by their joint probability, expressed as a double integral over x and t.
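A minimal sketch of this definition, assuming the standard squared-error loss used in the rest of these cards:

```latex
% Expected squared loss, integrating over the joint density p(x, t):
\mathbb{E}[L] = \iint \{y(x) - t\}^2 \, p(x, t) \, \mathrm{d}x\, \mathrm{d}t
```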
What does minimizing the expected loss function involve?
Taking the derivative with respect to the model’s prediction y(x) and setting it to zero.
What result do you get when taking the derivative of the expected loss function with respect to y(x)?
- ∂E[L]/∂y(x) = 2 ∫ {y(x) − t} p(t∣x) dt
- This indicates that y(x) should be adjusted to reduce its average difference from the target t under p(t∣x).
How do you find the optimal prediction y(x) after taking the derivative of the expected loss function?
By setting the derivative to zero and solving, you find that the optimal prediction y(x) is the expected value of t given x, denoted as E[t∣x].
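Putting the last few cards together, the derivation sketch looks like this (using the conditional density p(t∣x) as in the cards above, and the fact that ∫ p(t∣x) dt = 1):

```latex
% Set the derivative with respect to y(x) to zero and solve:
\frac{\partial \mathbb{E}[L]}{\partial y(x)}
  = 2 \int \{y(x) - t\}\, p(t \mid x)\, \mathrm{d}t = 0
\quad\Rightarrow\quad
y(x) = \int t\, p(t \mid x)\, \mathrm{d}t = \mathbb{E}[t \mid x]
```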
If you do not use a model, how can you estimate the best value of t for a given x?
By taking the expected value of t at that fixed value of x.
What is the purpose of adding zero in the form of −E[t∣x]+E[t∣x]=0 when decomposing the expected loss function?
The purpose is to separate the prediction y(x) from the true expected value E[t∣x], which serves as the best predictor of t given x.
What happens to the cross term 2(y(x)−E[t∣x])(E[t∣x]−t) when taking the expectation of the expanded loss function?
The cross term disappears because, for fixed x, the factor (y(x) − E[t∣x]) does not depend on t and the expectation of (E[t∣x] − t) is zero.
What are the two main components left after taking the expected value of the expanded loss function?
- {y(x) − E[t∣x]}^2: the squared error of the model in approximating the true conditional mean E[t∣x].
- Var(t∣x): the inherent variance of the data around E[t∣x], independent of the model (noise).
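A compact sketch of the add-and-subtract step and its expectation, matching the three cards above:

```latex
% Add and subtract E[t|x] inside the squared error, then expand:
\{y(x) - t\}^2
  = \{y(x) - \mathbb{E}[t \mid x]\}^2
  + 2\,\{y(x) - \mathbb{E}[t \mid x]\}\{\mathbb{E}[t \mid x] - t\}
  + \{\mathbb{E}[t \mid x] - t\}^2

% Taking the expectation over t (for fixed x) removes the cross term, leaving:
\mathbb{E}\!\left[\{y(x) - t\}^2 \mid x\right]
  = \{y(x) - \mathbb{E}[t \mid x]\}^2 + \operatorname{Var}(t \mid x)
```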
Why is the variance of t always present in the expected loss?
- Var(t∣x) is the intrinsic noise in the random variable t; it cannot be reduced by changing the model, so there will always be some noise-related error.
What are the components of the bias-variance decomposition of the expected loss?
- (Bias)^2: The squared difference between the average model prediction (over datasets) and the true function.
- Variance: The variability of the model's predictions across different training datasets.
- Noise: The inherent variability in the data that cannot be reduced by changing the model.
How can bias and variance be controlled in a model?
- Bias and variance trade off against each other: changes that reduce one typically increase the other (and vice versa).
- Bias can be reduced by increasing model complexity, but this may increase variance.
- Variance can be reduced by simplifying the model, but this may increase bias.
- In deep learning, it’s possible to reduce both bias and variance through techniques like regularization and large datasets.
What is the final form of the expected loss after decomposition?
Expected loss = (Bias)^2 + Variance + Noise
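Spelled out over an ensemble of datasets D, which is the form this decomposition is usually stated in for regression with squared loss, where y(x; D) denotes the prediction of the model trained on dataset D:

```latex
% Bias-variance decomposition of the expected squared loss:
\mathbb{E}[L]
 = \underbrace{\int \big\{\mathbb{E}_D[y(x; D)] - \mathbb{E}[t \mid x]\big\}^2\, p(x)\, \mathrm{d}x}_{(\text{bias})^2}
 + \underbrace{\int \mathbb{E}_D\!\big[\{y(x; D) - \mathbb{E}_D[y(x; D)]\}^2\big]\, p(x)\, \mathrm{d}x}_{\text{variance}}
 + \underbrace{\iint \{\mathbb{E}[t \mid x] - t\}^2\, p(x, t)\, \mathrm{d}x\, \mathrm{d}t}_{\text{noise}}
```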
How does regularization affect the bias-variance tradeoff in a model?
Regularization controls the tradeoff between bias and variance, influencing the model’s performance by adjusting flexibility.
What happens when the regularization parameter λ is high?
- Low variance: models trained on different datasets produce nearly identical estimates.
- High bias: the model is too rigid, and its average prediction is far from the true function.
What are the effects of low regularization (λ is low)?
- High variance because the model has more flexibility to fit the data.
- Low bias, as the model better approximates the true function.
What happens when the model is under-regularized (extremely low λ)?
The model exhibits even more variability (high variance), allowing it to be highly flexible, but with very little bias.
What happens to bias as regularization decreases?
Bias decreases as regularization decreases because the model becomes more flexible and can better fit the data.
What happens to variance as regularization decreases?
Variance increases as regularization decreases because the model becomes more complex and can overfit the data.
What is the relationship between test error, bias, and variance?
The test error curve behaves like the combined (bias)^2 + variance curve, but shifted to a slightly higher value by the irreducible noise term.
What is the goal when tuning the regularization parameter?
The goal is to find the λ at which the sum of (bias)^2 and variance is minimized, which gives (approximately) the lowest test error.
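A minimal numerical sketch of this tuning picture: fit regularized least squares (ridge regression) on many simulated datasets and estimate (bias)^2 and variance for a few values of λ. The true function, noise level, polynomial degree, and λ grid below are illustrative assumptions, not values from the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def true_fn(x):
    return np.sin(2 * np.pi * x)        # assumed "true" function generating the data

def design(x, degree=7):
    return np.vander(x, degree + 1)     # polynomial basis functions (illustrative choice)

x_train = np.linspace(0, 1, 25)
x_test = np.linspace(0, 1, 100)
Phi_test = design(x_test)
noise_std = 0.3                         # assumed noise level
n_datasets = 200                        # the "ensemble" of datasets

for lam in [1e-6, 1e-3, 1e-1, 1.0, 10.0]:
    preds = np.empty((n_datasets, x_test.size))
    for d in range(n_datasets):
        t = true_fn(x_train) + noise_std * rng.normal(size=x_train.size)
        Phi = design(x_train)
        # Regularized least squares: w = (Phi^T Phi + lam * I)^-1 Phi^T t
        w = np.linalg.solve(Phi.T @ Phi + lam * np.eye(Phi.shape[1]), Phi.T @ t)
        preds[d] = Phi_test @ w
    avg_pred = preds.mean(axis=0)
    bias2 = np.mean((avg_pred - true_fn(x_test)) ** 2)  # (bias)^2: average model vs true function
    variance = np.mean(preds.var(axis=0))               # spread of predictions across datasets
    print(f"lambda={lam:7.1e}  bias^2={bias2:.4f}  variance={variance:.4f}  sum={bias2 + variance:.4f}")
```

For small λ the variance term dominates, for large λ the bias term dominates, and the λ that minimizes their sum approximates the one that minimizes test error.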
How does model complexity affect bias and variance?
Model complexity increases variance but decreases bias, as more complex models can fit the data more closely.
How does regularization affect bias and variance?
Regularization decreases variance but increases bias, as it simplifies the model and reduces its flexibility.
What happens when you increase the number of coefficients or basis functions in a model (e.g., linear regression or neural networks)?
Increasing the number of coefficients or basis functions adds complexity, which increases variance but decreases bias.
In the context of bias-variance tradeoff, what does the red line in the plot represent?
The red line represents a simple model (e.g., a flat line) that underfits the data, resulting in high bias.
In the context of bias-variance tradeoff, what does the blue line in the plot represent?
The blue line represents a very complex model that overfits the data, resulting in high variance.
What is the “sweet spot” of model complexity?
The sweet spot is a level of complexity where the bias and variance are balanced, minimizing the overall error, which can be achieved through proper regularization.
Why is the bias-variance tradeoff valuable in machine learning?
It provides useful intuition for model selection and regularization by explaining the tradeoff between underfitting and overfitting.
Why is the bias-variance tradeoff of limited practical value?
- In theory, the bias-variance tradeoff relies on analyzing an ensemble of datasets to compute the exact bias and variance.
- In real-world scenarios, we typically work with only a single dataset, not an ensemble.
- As a result, bias and variance cannot be computed exactly from a single dataset, making it difficult to measure or apply the tradeoff directly.
What is Bayesian linear regression?
Bayesian linear regression is a type of linear regression that incorporates Bayesian principles, allowing you to quantify uncertainty in predictions by providing a distribution over possible parameters (weights).
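A minimal sketch of the closed-form posterior for Bayesian linear regression with a Gaussian prior on the weights. The prior precision α, noise precision β, basis functions, and toy data below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
alpha, beta = 2.0, 25.0                 # assumed prior precision and noise precision

def design(x, degree=3):
    return np.vander(x, degree + 1)     # polynomial basis functions (illustrative choice)

# Toy data: t = sin(2*pi*x) + Gaussian noise with precision beta
x = rng.uniform(0, 1, size=20)
t = np.sin(2 * np.pi * x) + rng.normal(scale=1 / np.sqrt(beta), size=x.size)
Phi = design(x)

# Posterior over the weights: p(w | t) = N(w | m_N, S_N)
S_N_inv = alpha * np.eye(Phi.shape[1]) + beta * Phi.T @ Phi
S_N = np.linalg.inv(S_N_inv)
m_N = beta * S_N @ Phi.T @ t

# Predictive distribution at new inputs:
# mean = phi(x)^T m_N, variance = 1/beta + phi(x)^T S_N phi(x)
x_new = np.linspace(0, 1, 5)
Phi_new = design(x_new)
pred_mean = Phi_new @ m_N
pred_var = 1 / beta + np.sum(Phi_new @ S_N * Phi_new, axis=1)
for xi, m, v in zip(x_new, pred_mean, pred_var):
    print(f"x={xi:.2f}  mean={m:+.3f}  std={np.sqrt(v):.3f}")
```

The posterior mean m_N coincides with the regularized least-squares solution with λ = α/β, which ties Bayesian linear regression back to the regularization story in the cards above, while the predictive variance quantifies the uncertainty that a frequentist point estimate omits.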