F7 Intro to Bayesian data analysis Flashcards
What are the three steps in a Bayesian approach to forecasting?
1) Use fundamentals to predict the popular vote
2) Add in new data from polling
3) Update the prior to a posterior prediction using Markov Chain Monte Carlo
What is Markov Chain Monte Carlo?
Abbreviated MCMC.
It explores thousands of different values for each parameter in our model and evaluates both how well they explain the patterns in the data and how plausible they are given the expectations from our prior.
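To make the idea concrete, here is a minimal sketch of one MCMC variant, a random-walk Metropolis sampler. Everything in it is an assumption for illustration: the Normal(0.50, 0.05) prior standing in for the fundamentals, the 0.03 poll noise, the made-up poll values, and the function names. It is not the forecasting model itself.

```python
import math
import random

def log_prior(mu):
    # Hypothetical fundamentals prior: Normal(0.50, 0.05) on the vote share
    return -0.5 * ((mu - 0.50) / 0.05) ** 2

def log_likelihood(mu, polls, poll_sd=0.03):
    # Each poll is treated as a noisy Normal(mu, poll_sd) reading of the true share
    return sum(-0.5 * ((p - mu) / poll_sd) ** 2 for p in polls)

def metropolis(polls, n_steps=20000, step_sd=0.02, seed=1):
    rng = random.Random(seed)
    mu = 0.50
    current = log_prior(mu) + log_likelihood(mu, polls)
    samples = []
    for _ in range(n_steps):
        # Propose a nearby parameter value
        proposal = mu + rng.gauss(0.0, step_sd)
        candidate = log_prior(proposal) + log_likelihood(proposal, polls)
        # Accept with probability min(1, posterior ratio); the ratio weighs
        # fit to the data (likelihood) and plausibility (prior) together
        if rng.random() < math.exp(min(0.0, candidate - current)):
            mu, current = proposal, candidate
        samples.append(mu)
    return samples[n_steps // 4:]  # discard the first quarter as burn-in

polls = [0.52, 0.51, 0.53, 0.50]  # made-up poll readings
samples = metropolis(polls)
posterior_mean = sum(samples) / len(samples)
```

The retained samples approximate the posterior: values that explain the polls well and agree with the prior are visited more often.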
What are two considerations for the prior?
1) Avoid overfitting - parsimonious models are rewarded.
2) Leave-one-out cross-validation: Training models on data from some elections and testing their performance on the elections that were held out.
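Leave-one-out cross-validation can be sketched as follows, assuming a one-predictor linear model fitted by least squares. The helper names `fit_line` and `loo_rmse` and the data are hypothetical; a real fundamentals model would use more predictors and real election results.

```python
def fit_line(xs, ys):
    # Ordinary least squares for y = slope * x + intercept
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def loo_rmse(xs, ys):
    # Hold out each election once, train on the rest, score the held-out one
    squared_errors = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        slope, intercept = fit_line(train_x, train_y)
        prediction = slope * xs[i] + intercept
        squared_errors.append((ys[i] - prediction) ** 2)
    return (sum(squared_errors) / len(squared_errors)) ** 0.5

# Made-up predictor values (x) and vote shares (y), one pair per election
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
error = loo_rmse(xs, ys)
```

A model that overfits tends to show a low error on the training elections but a high held-out error.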
What can be said about the 10,000 simulations from the model?
Hypothetical paths the election could take (Trump, Harris or tied).
The more likely a scenario, the more often it will appear.
Some of them involve large nationwide, regional, or demographic polling errors benefiting one party or another.
How is the probability of Harris winning calculated?
Simply the fraction of simulations where Harris wins.
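A minimal sketch of that calculation, assuming each simulation is summarised by Harris's electoral-vote total out of 538 (so 269 is a tie). The function name and the four example draws are made up for illustration; a real run would pass in all 10,000 simulated totals.

```python
def outcome_probabilities(harris_ev_draws, total_ev=538):
    # Count how often each of the three mutually exclusive outcomes occurs
    n = len(harris_ev_draws)
    half = total_ev / 2
    harris = sum(ev > half for ev in harris_ev_draws)
    tied = sum(ev == half for ev in harris_ev_draws)
    trump = n - harris - tied
    return {"Harris": harris / n, "Trump": trump / n, "Tied": tied / n}

# Four made-up simulations: two Harris wins, one Trump win, one tie
probs = outcome_probabilities([300, 250, 269, 280])
```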
What is the Bayesian logic (short)? Draw it
We update our prior belief (fundamentals) with new data (polls, with errors distributed according to a covariance matrix), resulting in a posterior distribution of outcomes. We then sample from the posterior distribution to calculate the probability of Harris winning.
What are two fundamental components of Bayesian data analysis?
We reallocate credibility across all possible outcomes (Harris, Trump or tied).
The possible outcomes over which we allocate credibility are parameter values in meaningful mathematical models
What does ‘Data is noisy’ mean?
Data have a probabilistic rather than deterministic relation to their underlying cause.
What are five key steps in a Bayesian analysis?
- Identify data
- Define descriptive model (mathematical form and its parameters)
- Specify a prior distribution on the parameters
- Use Bayesian inference to reallocate credibility across parameter values
- Evaluate
What are parameter values?
Control knobs on mathematical formulas that determine the shape of the distribution, e.g. location and scale.
What is the sample space for the election?
Three outcomes that are mutually exclusive:
Harris
Trump
Tied
What is the difference between frequentist and Bayesian statistics?
Frequentist: Empirical and objective. With greater sample size, the result converges to the ‘true’ underlying value.
Bayesian: There is only one election and it can’t be repeated. Bayesian statistics is more about subjective beliefs about the likelihood of an event occurring.
What is Bayes’ rule?
Posterior = likelihood * prior / evidence.
P(A|B) = P(B|A) * P(A) / P(B)
Probability of Harris winning = number of simulations where Harris wins (drawn from the posterior, which combines the likelihood of the polls with the prior) / total number of simulations
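A small worked instance of the rule, with all numbers invented for illustration: A is "Harris wins", B is "a poll shows Harris ahead", and the three probabilities below are assumptions, not estimates.

```python
def bayes_posterior(prior_a, p_b_given_a, p_b_given_not_a):
    # Evidence P(B) via the law of total probability
    evidence = p_b_given_a * prior_a + p_b_given_not_a * (1 - prior_a)
    # Posterior P(A|B) = P(B|A) * P(A) / P(B)
    return p_b_given_a * prior_a / evidence

posterior = bayes_posterior(
    prior_a=0.5,          # hypothetical P(A): prior belief that Harris wins
    p_b_given_a=0.8,      # hypothetical P(B|A): poll lead if she will win
    p_b_given_not_a=0.3,  # hypothetical P(B|not A): poll lead if she will lose
)
```

With these inputs the evidence is 0.8 * 0.5 + 0.3 * 0.5 = 0.55, so the posterior is 0.4 / 0.55, roughly 0.73: seeing the poll raises the belief from 0.5.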
What are two key takeaways from the relation between prior uncertainty and number of polls?
1) The more certain the prior, the less impact the polls have
2) The more polls, the less weight the prior carries
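Both takeaways can be checked with a sketch that assumes a normal prior and normally distributed poll errors, where the posterior mean is a precision-weighted average of the prior mean and the poll average. The function name and every number are hypothetical.

```python
def prior_weight_and_posterior_mean(prior_mean, prior_sd, poll_mean, poll_sd, n_polls):
    # Precision = 1 / variance; the polls' precision grows with their number
    prior_precision = 1 / prior_sd ** 2
    data_precision = n_polls / poll_sd ** 2
    weight = prior_precision / (prior_precision + data_precision)
    posterior_mean = weight * prior_mean + (1 - weight) * poll_mean
    return weight, posterior_mean

# Same polls, vague prior vs. certain prior (takeaway 1)
w_vague, _ = prior_weight_and_posterior_mean(0.50, 0.10, 0.53, 0.03, n_polls=5)
w_certain, _ = prior_weight_and_posterior_mean(0.50, 0.01, 0.53, 0.03, n_polls=5)

# Same prior, few polls vs. many polls (takeaway 2)
w_few, mean_few = prior_weight_and_posterior_mean(0.50, 0.05, 0.53, 0.03, n_polls=2)
w_many, mean_many = prior_weight_and_posterior_mean(0.50, 0.05, 0.53, 0.03, n_polls=50)
```

Here `w_certain` exceeds `w_vague`, and `w_many` is smaller than `w_few`, so `mean_many` sits closer to the poll average of 0.53.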