CHAPTER THREE Bayesian Decision Theory Flashcards
What is deductive reasoning?
A logical process where if the premises are accepted, the conclusion must also be accepted.
Provide an example of a valid but factually incorrect syllogism.
Plants are good for you; tobacco is a plant; ergo tobacco is good for you.
Who is credited with the idea of deductive reasoning?
Aristotle.
What are the two strong syllogisms described by E. T. Jaynes?
- If A is true, then B is true; A is true; therefore B is true.
- If A is true, then B is true; B is false; therefore A is false.
What does A ∨ B represent in Boolean algebra?
At least one of A and B is true (disjunction).
What does A ∧ B mean in Boolean algebra?
Both A and B are true (conjunction).
What is the implication represented by A → B?
If A is true, then B is true.
What is the output of an AND gate?
The output is true if both inputs are true.
What does an OR gate do?
It outputs true if at least one of its inputs is true.
What is a NOT gate’s function?
It inverts its input: the output is true when the input is false, and false when the input is true.
What are NAND gates capable of?
They can be used to create all other types of gates.
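A minimal sketch (not from the chapter) of how AND, OR, and NOT can all be built out of NAND alone, which is what makes NAND universal:

```python
# Building AND, OR and NOT purely out of NAND.

def NAND(a: bool, b: bool) -> bool:
    return not (a and b)

def NOT(a: bool) -> bool:
    return NAND(a, a)            # NAND of a signal with itself inverts it

def AND(a: bool, b: bool) -> bool:
    return NOT(NAND(a, b))       # invert NAND to recover AND

def OR(a: bool, b: bool) -> bool:
    return NAND(NOT(a), NOT(b))  # De Morgan: a OR b == NOT(NOT a AND NOT b)

# Quick truth-table check
for a in (False, True):
    for b in (False, True):
        assert AND(a, b) == (a and b)
        assert OR(a, b) == (a or b)
```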
What is the limitation of classical logic?
It deals only with absolute truths (ones and zeros).
What does Bayesian reasoning allow us to do?
It helps us deal with probabilities and uncertainties.
What is Bayes’ theorem used for?
It provides a mathematical framework for updating beliefs based on new evidence.
What is the difference between frequentist and Bayesian approaches in decision theory?
Frequentism defines probability as long-run frequency, so it cannot assign a probability to a single, one-off event; decision theory therefore has to use Bayesian probabilities, understood as degrees of belief.
In the lottery example, what is the prior probability of winning with one ticket?
1 in 131,115,985.
What happens when the box beeps while testing a lottery number?
You must consider the likelihood ratio and the prior probability to update your belief.
What is the likelihood ratio in the beeping box example?
4:1.
What is the new posterior probability after the box beeps?
4:131,115,985.
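A small sketch of that update in odds form, posterior odds = prior odds × likelihood ratio; the only inputs are the figures quoted in these cards (a 1-in-131,115,985 prior and a 4:1 likelihood ratio):

```python
# Odds-form Bayes update for the beeping-box example:
# posterior odds = prior odds * likelihood ratio.

prior_win, prior_lose = 1, 131_115_985 - 1   # one winning combination
likelihood_ratio = 4                          # the box is 4x likelier to beep for a winner

posterior_win = prior_win * likelihood_ratio
posterior_lose = prior_lose                   # the losing side of the odds is unchanged

posterior_prob = posterior_win / (posterior_win + posterior_lose)
print(f"Posterior odds: {posterior_win}:{posterior_lose}")
print(f"Posterior probability: {posterior_prob:.10f}")  # roughly 1 in 32.8 million
```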
What does Bayesian reasoning rely on?
It combines prior information with new data to form a revised understanding.
Fill in the blank: Bayesian logic allows us to deal with ____ in reasoning.
shades of gray.
True or False: Bayesian reasoning is only applicable in scientific contexts.
False.
What is the probability that your ticket is the winner once the box has beeped?
1/32,778,996
This represents a very small chance in a probabilistic context.
How many wrong combinations does the box beep for on average?
8,194,749 wrong combinations
This indicates the challenge of identifying the correct combination amidst many false positives.
How many times must you run a ticket through the box for it to be likely the right one?
Fourteen times
This emphasizes the improbability of identifying the correct ticket easily.
What concept in thermodynamics is compared to Bayes’ theorem?
Carnot engine
The Carnot engine represents the most efficient theoretical model for heat engines.
What does Bayes’ theorem help approximate in decision-making?
It is the theoretical ideal that real-world decision-making under uncertainty can only approximate.
Decisions under uncertainty are better when they approximate Bayes’ theorem.
What does E. T. Jaynes argue about Bayes’ theorem?
It allows for plausible reasoning beyond Aristotelian logic
Jaynes emphasizes that using probabilities can extend logical reasoning.
What is an example of a logical conjunction represented in Bayesian terms?
p(A ∧ B) = p(A) × p(B | A), the product rule.
This shows how the probability of two events occurring together can be expressed in Bayesian logic.
What does the likelihood ratio indicate in the context of wet pavements?
It tells how much to update beliefs about rain based on evidence
The likelihood ratio quantifies how much more plausible a hypothesis becomes given new evidence.
What is the prior probability of rain at a certain time of year in the example?
33 percent
This serves as the baseline probability before considering additional evidence.
What is the posterior probability of it having rained given wet pavements?
66 percent
This result derives from applying Bayes’ theorem with prior probabilities and likelihoods.
What is Cromwell’s rule in Bayesian decision theory?
Never assign probabilities of one or zero to anything except logical truths
This rule encourages keeping an open mind to possibilities.
What happens to posterior probability if the prior is set to zero?
The posterior probability remains zero
This illustrates the issue of being too certain about initial beliefs.
How are odds calculated from probabilities?
By dividing the probability by 1 minus the probability
This transformation highlights the relationship between probabilities and odds.
What is the odds representation of a probability of 0.9?
9:1
This indicates a strong likelihood in favor of the event occurring.
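A short sketch of the probability-to-odds conversion described above, including why a probability of exactly 1 breaks it:

```python
# Converting between probabilities and odds.

def prob_to_odds(p: float) -> float:
    return p / (1 - p)          # grows without bound as p approaches 1

def odds_to_prob(odds: float) -> float:
    return odds / (1 + odds)

print(prob_to_odds(0.9))        # 9.0  -> "9 to 1"
print(prob_to_odds(0.999999))   # ~999,999 to 1
# prob_to_odds(1.0) would divide by zero: certainty corresponds to infinite odds.
```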
What is a critical distinction between probabilities of 1 and 0.999999?
A probability of 1 corresponds to odds of infinity to one, whereas 0.999999 corresponds to odds of 999,999 to one.
This distinction shows the mathematical implications of assigning absolute certainty.
Why should you avoid assigning a probability of one?
Because it implies absolute certainty, which is unrealistic
Real-world events often have uncertainties that should be accounted for.
What is the implication of assigning a probability of zero or one?
Once a hypothesis is given probability zero or one, no amount of subsequent evidence can ever move it, which is why you should never assign zero or one to anything but logical truths.
This means acknowledging that while some events are extremely unlikely, they are not impossible.
What does a very small probability indicate?
An event that is extremely unlikely but still possible; a tiny probability should not be confused with impossibility.
For example, a one-in-a-quadrillion chance exists but is extremely unlikely.
What is the conservation of expected evidence?
You cannot expect new evidence, on average, to push your belief in a chosen direction: if finding a piece of evidence would raise your credence in a hypothesis, then failing to find it must lower it, so the absence of expected evidence counts against the hypothesis.
This is a principle in Bayesian decision theory.
How does the absence of expected evidence affect belief?
If you expect to see evidence and do not find it, your belief should shift significantly in the opposite direction.
For example, if you expect to see evidence of wrongdoing and do not, your belief in the wrongdoing decreases.
What is the relationship between expected evidence and posterior probability?
The more strongly you expect something, the less your posterior probability changes when you find it.
Conversely, unexpected evidence causes a more significant shift in belief.
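A numerical check of the principle with made-up probabilities: whatever numbers you pick, the prior equals the probability-weighted average of the two possible posteriors, so no experiment can be expected in advance to move belief in only one direction:

```python
# Conservation of expected evidence, with illustrative (made-up) numbers.

p_h = 0.3                      # prior belief in hypothesis H
p_e_given_h = 0.8              # chance of seeing evidence E if H is true
p_e_given_not_h = 0.2          # chance of seeing E if H is false

p_e = p_h * p_e_given_h + (1 - p_h) * p_e_given_not_h
p_h_given_e = p_h * p_e_given_h / p_e                  # belief rises if E is found
p_h_given_not_e = p_h * (1 - p_e_given_h) / (1 - p_e)  # belief falls if E is absent

expected_posterior = p_e * p_h_given_e + (1 - p_e) * p_h_given_not_e
assert abs(expected_posterior - p_h) < 1e-12           # averages back to the prior
print(p_h_given_e, p_h_given_not_e)                    # ~0.63 up, ~0.10 down
```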
Fill in the blank: The absence of evidence is, in fact, _______.
evidence of absence.
Not finding evidence you expected to see should weaken your confidence in the hypothesis, strengthening the case that the thing in question does not exist.
What is utility in decision theory?
Utility describes how much you care about something in decision-making under uncertainty.
It is often treated as equivalent to money for simplicity in calculations.
What does expected value combine?
Expected value combines probability and utility.
This helps in making decisions based on the anticipated outcomes of those decisions.
How is expected value calculated using a lottery example?
Expected value is the jackpot multiplied by the probability of winning it (equivalently, the jackpot divided by the one-in-N odds); if that figure exceeds the price of a ticket, buying one has positive expected value.
For example, if a lottery ticket costs £1 and has a jackpot of £150 million with a 1 in 131,115,985 chance, the expected value is positive.
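A sketch of that calculation with the figures quoted above (ignoring smaller prizes, taxes, and shared jackpots):

```python
# Expected value of the lottery ticket described above.

jackpot = 150_000_000          # £150 million
p_win = 1 / 131_115_985        # one in 131,115,985
ticket_price = 1               # £1

expected_value = jackpot * p_win - ticket_price
print(f"Expected value per ticket: £{expected_value:.2f}")   # ~ +£0.14
```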
What is a Dutch book in betting theory?
A Dutch book occurs when a person’s beliefs about probabilities lead to guaranteed losses regardless of the outcome.
It demonstrates irrationality in betting based on inconsistent probability assessments.
What did John von Neumann contribute to decision theory?
John von Neumann developed game theory and sought a normative way to make decisions under uncertainty to maximize expected well-being.
His work laid the foundation for understanding complex decision-making in economics.
What challenge arises when trying to maximize group utility?
Conflicts of interest between individuals complicate the process of maximizing group utility.
Different individuals may prioritize different desires, making it difficult to achieve a consensus on utility maximization.
Fill in the blank: Classical economics assumes that while you can rank people’s preferences, you cannot _______.
compare them.
This is particularly true when preferences conflict between individuals.
What is a key axiom proposed by von Neumann regarding people’s desires?
People’s desires need to be transitive.
Define transitive preferences in the context of decision-making.
If a person prefers A to B and B to C, then they must prefer A to C.
What happens if preferences are intransitive?
The individual can become a money pump.
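A toy illustration (items, fee, and trade count are invented) of how an agent with cyclic preferences gets pumped for money:

```python
# Toy money pump: an agent prefers A to B, B to C, and C to A, and will pay a
# small fee for each "upgrade", so it trades round in circles and loses money.

preferences = {("A", "B"), ("B", "C"), ("C", "A")}   # (preferred, less preferred)
fee = 0.01                                           # price of each swap

holding, cash = "C", 0.0
for _ in range(6):                                   # six trades round the cycle
    for better, worse in preferences:
        if worse == holding:
            holding, cash = better, cash - fee       # pays to trade up every time
            break

print(f"After 6 trades the agent holds {holding} again and is down £{-cash:.2f}")
```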
What are the three necessary conditions for preferences according to von Neumann?
- Transitive
- Continuous
- Monotonic
What does it mean for preferences to be continuous?
There are no sudden jumps in people’s preferences as outcomes change.
What does monotonicity imply about decision-making?
You should be indifferent between a gamble with a 50% chance of £10 and a certain £5.
What is meant by substitutability in preferences?
If you are indifferent between cake and jelly, you shouldn't care whether a gamble offers a given probability of cake or the same probability of jelly; one can be substituted for the other.
What is the utility theorem proposed by von Neumann?
People have preferences that can be assigned a numerical value, called ‘utils’.
What is the concept of ‘utils’?
A unit of measure for preferences in von Neumann’s utility theorem.
How did von Neumann apply his theories to the scenario of Holmes and Moriarty?
He analyzed their decision-making strategies using expected utilities.
What is expected utility?
The average utility of possible outcomes weighted by their probabilities.
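A minimal sketch of the calculation, with invented utilities and probabilities:

```python
# Expected utility: the probability-weighted average of the utilities of
# each possible outcome.

def expected_utility(outcomes):
    """outcomes: list of (probability, utility-in-utils) pairs."""
    return sum(p * u for p, u in outcomes)

safe_bet  = [(1.0, 5)]               # certain 5 utils
risky_bet = [(0.5, 10), (0.5, 0)]    # 50% chance of 10 utils, otherwise nothing

print(expected_utility(safe_bet))    # 5.0
print(expected_utility(risky_bet))   # 5.0 -> same expected utility
```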
What should Moriarty do to maximize his expected utility?
Be unpredictable in his actions.
What is a key challenge in decision-making under uncertainty?
It is hard to know the expected utility of any decision, because you rarely have complete information about the probabilities and outcomes involved.
What is Occam’s razor?
A principle stating that simpler explanations are preferred over complex ones.
Who is Occam’s razor named after?
William of Ockham.
What is minimum message length in decision theory?
The shortest computer program that describes a given output.
What does Kolmogorov complexity refer to?
The complexity of an object in terms of the length of the shortest description.
What example illustrates the concept of minimum message length?
Three eleven-digit strings of numbers, where one is predictable and two are not.
What is the significance of randomness in the context of algorithm complexity?
Truly random sequences require longer descriptions than predictable ones.
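An illustrative sketch, not the chapter's exact strings: a predictable eleven-digit string can be reproduced by a short rule, while a random-looking one has no shorter description than itself:

```python
# A predictable sequence can be generated by a short rule; a truly random one
# can only be stored verbatim. Both example strings are invented.

import itertools

predictable = "12345678901"      # digits cycling 1..9,0: a short rule suffices
random_looking = "83920174655"   # no obvious rule: shortest "program" ~ the string itself

# Short program reproducing the predictable string:
rule = "".join(itertools.islice(itertools.cycle("1234567890"), len(predictable)))
assert rule == predictable

# For the random-looking string, the best we can do is print the literal digits,
# so its description is roughly as long as the data it describes.
```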
What trade-off is discussed regarding hypothesis selection?
Between the complexity of the algorithm and the confidence in predicting the output.
How is information defined in the context of decision-making?
A single ‘bit’ of information can halve the probability space.
What is the role of priors in Bayesianism?
They are initial probabilities that influence the outcome of Bayesian inference.
True or False: Complexity in hypotheses should always be minimized regardless of fit.
False.
Fill in the blank: __________ says to prefer the simplest explanation in a set of hypotheses.
Occam's razor.
What does assigning p = 0.02 to the remaining doors signify?
It indicates that the search space has been halved, increasing the probability mass on each remaining option.
What is the trade-off mentioned between complexity and good fit?
If an extra bit of information doesn’t allow you to halve the search space, it’s not worth it.
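A sketch of the doors example; the starting count of 100 doors is an assumption, inferred from the jump from p = 0.01 to p = 0.02 per door implied above:

```python
# Each useful bit of evidence halves the search space and doubles the
# probability mass on each remaining option. Door count assumed for illustration.

import math

doors = 100
p_per_door = 1 / doors                      # 0.01 before any evidence

doors_after_one_bit = doors // 2            # 50 doors left after one halving
p_after_one_bit = 1 / doors_after_one_bit   # 0.02, as in the card above

bits_to_single_door = math.log2(doors)      # ~6.6 bits to pin down one door exactly
print(p_after_one_bit, bits_to_single_door)
```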
How should one choose between two hypotheses according to the text?
Assign a higher prior probability to whichever hypothesis is simpler to write as a computer program.
What is the relationship between modern AI systems and Bayesian principles?
Modern AI systems operate in a manner consistent with Bayesian decision-making under uncertainty.
What is a hyperprior?
A prior over your priors: uncertainty about which prior (or which parameter) to use, reflecting a higher-level belief about the shape of the world.
How does the example of a Bayesian AI playing hide-and-seek illustrate hyperpriors?
It demonstrates adapting prior probabilities based on changing evidence, like the opponent’s hiding behavior.
What is the significance of having multiple hypotheses in evaluating claims?
It complicates the evaluation because different priors can lead to different conclusions despite the same evidence.
What example is given to illustrate the skepticism of psychic claims?
The Mysterious Barry guessing numbers correctly, where even many correct guesses may not convince skeptics.
What were the results of Samuel Soal’s card-guessing experiment?
Two subjects scored significantly better than chance, raising questions about psychic powers.
What is the effect of alternative hypotheses on belief in psychic powers?
Alternative explanations, like fraud or design flaws, can overshadow belief in psychic phenomena.
How does prior probability influence the interpretation of new evidence?
Individuals with different priors may interpret the same evidence in opposing ways, reinforcing their beliefs.
What is the core function of artificial intelligence as described?
To predict uncertain outcomes, fundamentally based on Bayesian reasoning.
What is supervised learning in AI?
A training approach in which a model learns from labeled training data and then predicts labels for new, unseen data.
How does an AI update its probability after seeing a picture?
It shifts from a prior probability to a posterior probability based on new information.
What statistical concept is used to find the line of best fit in AI?
Linear regression.
Fill in the blank: A simple AI might predict the likelihood of an image being a lion with a prior probability of _______.
p ≈ 0.33
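A toy version of that update; three equally likely classes (hence the prior of about 0.33) and the likelihood values are invented for illustration:

```python
# Toy lion-picture update: the prior over classes shifts to a posterior
# once the image's features are taken into account.

priors = {"lion": 1/3, "tiger": 1/3, "house cat": 1/3}
likelihoods = {"lion": 0.6, "tiger": 0.3, "house cat": 0.1}  # p(features | class), assumed

evidence = sum(priors[c] * likelihoods[c] for c in priors)
posteriors = {c: priors[c] * likelihoods[c] / evidence for c in priors}

print(posteriors)   # lion now at 0.6: the prior has become a posterior
```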
True or False: Bayesian machine learning explicitly mimics Bayes’ rule.
True
What happens when evidence suggests a hypothesis is false?
Individuals might conclude the source of the evidence is untrustworthy rather than changing their belief.
What is the relationship between shoe size and height as observed in a random sample?
On average, taller people have larger feet.
What is the line of least squares used for?
To draw a line through data points that minimizes the sum of squared errors.
How is the error calculated in the line of least squares?
By measuring the vertical distance from the line to each dot and squaring that distance.
What does the sum of squared error represent?
The total error across all data points.
What is the purpose of the AI in the context of predicting height from shoe size?
To estimate a person’s height based on their shoe size using the line of least squares.
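A sketch of the line of least squares fitted by hand; the shoe-size and height data points are invented:

```python
# Ordinary least squares for the shoe-size/height example: the fit minimizes
# the sum of squared vertical distances from the line to each point.

shoe_sizes = [6, 7, 8, 9, 10, 11]            # UK sizes (illustrative)
heights_cm = [165, 170, 172, 178, 181, 185]  # illustrative heights

n = len(shoe_sizes)
mean_x = sum(shoe_sizes) / n
mean_y = sum(heights_cm) / n

slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(shoe_sizes, heights_cm))
         / sum((x - mean_x) ** 2 for x in shoe_sizes))
intercept = mean_y - slope * mean_x

predict = lambda shoe: intercept + slope * shoe
sse = sum((y - predict(x)) ** 2 for x, y in zip(shoe_sizes, heights_cm))

print(f"height ≈ {intercept:.1f} + {slope:.2f} × shoe size, sum of squared error = {sse:.1f}")
print(f"Predicted height for size 9.5: {predict(9.5):.1f} cm")
```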
What factors influence the AI’s confidence in its predictions?
The amount of training data and the variance in that data.
What is the Bayesian process mentioned in the context of AI?
It involves updating prior beliefs with new data to form a posterior distribution.
What type of curves might be used instead of a straight line for data fitting?
Curved lines, such as exponential curves, S-shaped curves, or sine waves.
What does it mean for a model to be ‘underfitting’?
When the model is too simple to accurately capture the underlying data.
What is ‘overfitting’ in the context of AI models?
When a model fits the training data too closely and fails to generalize to new data.
What are hyperparameters in AI modeling?
Parameters that control the model’s capacity to fit the data, such as the degrees of freedom allowed in curve fitting.
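A sketch of how one such hyperparameter, polynomial degree, moves a fit from underfitting to overfitting on invented data:

```python
# Polynomial degree as a hyperparameter: too low underfits, too high overfits.
# Data are made up: a roughly linear trend plus noise.

import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 5, 8)
y = 2 * x + 1 + rng.normal(0, 1.0, size=x.size)

for degree in (0, 1, 7):                          # too simple, about right, too flexible
    coeffs = np.polyfit(x, y, degree)
    train_error = np.mean((np.polyval(coeffs, x) - y) ** 2)
    print(f"degree {degree}: training error {train_error:.3f}")

# Degree 7 drives the training error to ~0 by threading every point,
# but such a curve generalizes poorly to new data (overfitting).
```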
What is meant by hyperpriors?
The AI’s prior beliefs about hyperparameters.
What is the primary function of AIs like ChatGPT?
To predict what a human would say or draw in response to a prompt.
How does ChatGPT generate responses?
By predicting the next word in a sequence based on prior text.
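A toy sketch of next-word prediction: the vocabulary and probabilities here are invented, and a real language model computes such a distribution over tens of thousands of tokens:

```python
# Toy next-word prediction: assign a probability to each candidate continuation,
# then sample from that distribution.

import random

prompt = "The cat sat on the"
next_word_probs = {"mat": 0.55, "sofa": 0.25, "roof": 0.15, "banana": 0.05}

words = list(next_word_probs)
weights = list(next_word_probs.values())
next_word = random.choices(words, weights=weights, k=1)[0]

print(prompt, next_word)
```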
What does the term ‘stochastic parrots’ refer to?
A term used to describe language models that predict sequences without genuine understanding.
What is the significance of building a model of the world in AI?
It helps the AI make better predictions about future data.
What was the focus of the study involving Othello-GPT?
To determine whether the AI built an internal representation of the game rather than just memorizing statistics.
What does ‘probing’ refer to in AI research?
A technique to examine the internal states of the AI and its decision-making process.
What conclusion was drawn about LLMs and their ability to predict?
They likely build internal representations that assist in making predictions.
True or False: Large language models like ChatGPT understand the world in the same way humans do.
False.
Fill in the blank: The process of predicting the next token in a sequence is an inherently ______ process.
Bayesian.
What does it imply if an AI can make original, legal moves in a game it hasn’t seen before?
It suggests the AI has formed a model of the game instead of merely memorizing moves.