10 Probabilistic Reasoning Flashcards
Status of probability sentences
Statement: “The patient has a cavity with probability 0.8.”
In logic, a sentence is true or false, depending on the interpretation and the world. In probability theory, the probability assigned to a sentence depends on the evidence so far.
• Prior (unconditional) probability: Before any evidence
• Posterior (conditional) probability: After some evidence
Probability is more like entailment than truth!
Probability, utility and decisions
The agent can use probability theory to reason about uncertainty, and utility theory for the rational selection of actions based on its preferences. Decision theory is a general theory for combining probability with rational decisions.
Decision theory = Probability theory + Utility theory
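As a sketch of how these combine: a rational agent picks the action that maximizes expected utility, EU(a) = Σ_s P(s | a) U(s). A minimal Python illustration; the actions, outcome probabilities, and utilities are hypothetical.

    # Pick the action with maximum expected utility.
    # EU(a) = sum over outcomes s of P(s | a) * U(s). Numbers are made up.
    actions = {
        "treat": {"cured": 0.9, "not_cured": 0.1},  # P(outcome | action)
        "wait":  {"cured": 0.3, "not_cured": 0.7},
    }
    utility = {"cured": 10.0, "not_cured": -5.0}    # U(outcome)

    def expected_utility(action):
        return sum(p * utility[s] for s, p in actions[action].items())

    best = max(actions, key=expected_utility)
    print(best, expected_utility(best))  # treat 8.5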
Basic probability notation
In a probability model, propositions are expressed in terms of random variables with domains:
• Boolean - E.g. Cavity : < true, false >
• Discrete - E.g. Weather : < sunny, rainy, cloudy, snow >
• Continuous - E.g. Index : [0, 1]
An atomic event is an assignment of particular values to all variables of the domain
• E.g. Cavity = false ∧ Toothache = true
• Mutually exclusive (only one event can be true at a time)
• Exhaustive (at least one must be true)
Prior (unconditional) probability of a proposition:
P(A), e.g. P(Cavity) = 0.1, i.e. P(Cavity = true) = 0.1
Probability distribution of a variable: P(V)
P(Weather) = (0.7, 0.2, 0.08, 0.02)
Joint probability distribution
• Table of probabilities for all value combinations: P(V1, V2)
• P(Weather, Cavity) is a 4 x 2 table of probabilities (entries must sum to 1)
• Full joint distribution: all domain variables included
Conditional (posterior) probability: P(A|B)
P(Cavity | Toothache) = 0.8
Product rule:
P(A ∧ B) = P(A|B)P(B)
P(A ∧ B) = P(B|A)P(A)
P(A | B) = P(A ∧ B) / P(B)
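A minimal Python sketch of the product rule applied to a full joint distribution over two Boolean variables. The joint entries are hypothetical but chosen to agree with P(Cavity) = 0.1 and P(Cavity | Toothache) = 0.8 above.

    # Full joint distribution over (Cavity, Toothache); entries sum to 1.
    joint = {
        (True, True): 0.08, (True, False): 0.02,
        (False, True): 0.02, (False, False): 0.88,
    }

    def p(cavity=None, toothache=None):
        # Sum the atomic events consistent with the given values.
        return sum(pr for (c, t), pr in joint.items()
                   if cavity in (None, c) and toothache in (None, t))

    # Product rule rearranged: P(A | B) = P(A ∧ B) / P(B)
    print(p(cavity=True, toothache=True) / p(toothache=True))  # 0.8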
Problem with inference using the full joint distribution
This approach does not scale up! Table size and calculation time are O(2^n) for n Boolean variables.
Combining evidence
We can use Bayes’ rule to find the probability of a state given several pieces of evidence
P(Cavity | Toothache ∧ Catch) = P(Toothache ∧ Catch | Cavity) P(Cavity) / P(Toothache ∧ Catch)
For this to work we need conditional probabilities for all combinations of evidence variables. In the general case there is an exponential number of these: for n Boolean evidence variables, we need 2^n conditional probabilities. This led AI researchers away from probability theory and towards more ad hoc systems.
Conditional independence
Applying Bayes’ rule is simplified when variables are conditionally independent. Combining evidence then does not require information for all combinations of evidence variables, only a separate conditional for each piece of evidence, since e.g. P(Toothache ∧ Catch | Cavity) = P(Toothache | Cavity) P(Catch | Cavity), as in the sketch below.
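A minimal sketch of this simplification, assuming Toothache and Catch are conditionally independent given Cavity. The CPT numbers are hypothetical; normalization (α) replaces explicit computation of P(Toothache ∧ Catch).

    # P(Cavity | Toothache ∧ Catch)
    #   = α P(Toothache | Cavity) P(Catch | Cavity) P(Cavity)
    # Only one conditional per evidence variable is needed. Hypothetical CPTs:
    p_cavity = {True: 0.1, False: 0.9}
    p_toothache = {True: 0.8, False: 0.1}  # P(Toothache = true | Cavity)
    p_catch = {True: 0.9, False: 0.2}      # P(Catch = true | Cavity)

    score = {c: p_toothache[c] * p_catch[c] * p_cavity[c] for c in (True, False)}
    alpha = 1.0 / sum(score.values())
    print(alpha * score[True])  # 0.072 / (0.072 + 0.018) = 0.8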
Bayesian networks
Bayesian networks represent dependencies between variables and give a concise specification of the joint probability distribution. A Bayesian network is a graph where:
- A set of random variables are the nodes of the graph
- A set of directed links connects pairs of nodes
- A link from X to Y means that X has a direct influence on Y; X is called a parent of Y
- Each node has a conditional probability table (CPT) that quantifies the effects the parent nodes have on the node
- The graph has no directed cycles (it is a directed, acyclic graph, or DAG)
It can be shown that a Bayesian network is a representation of the joint probability distribution.
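Concretely, the joint probability of a complete assignment factors as P(x1, ..., xn) = Π_i P(xi | parents(Xi)). A minimal sketch on a hypothetical three-node network (Cavity -> Toothache, Cavity -> Catch), reusing the CPT numbers from the sketch above.

    def p_node(var, value, event):
        # CPT lookup: probability that var = value given its parents in event.
        if var == "Cavity":
            pt = 0.1
        elif var == "Toothache":
            pt = 0.8 if event["Cavity"] else 0.1
        else:  # "Catch"
            pt = 0.9 if event["Cavity"] else 0.2
        return pt if value else 1.0 - pt

    def joint_probability(event):
        # Chain rule: multiply each node's CPT entry, parents before children.
        p = 1.0
        for var in ("Cavity", "Toothache", "Catch"):
            p *= p_node(var, event[var], event)
        return p

    print(joint_probability({"Cavity": True, "Toothache": True, "Catch": True}))
    # 0.1 * 0.8 * 0.9 = 0.072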
Incremental construction of Bayesian network
General procedure
• Choose a set of relevant domain variables Xi
• Choose an ordering of the variables, preferably using causal domain knowledge (“root causes first”, etc.)
• While there are variables left
- Pick the next variable Xi and add a node for it
- Set Parents(Xi) to a minimal set of nodes already in the network such that Xi depends directly only on these nodes
- Define conditional probability table for Xi
• Guarantees that
- Network is acyclic
- Axioms of probability are satisfied
Bayesian inference
A probabilistic inference procedure computes the posterior probability distribution for a set of query variables, given exact values for some evidence variables:
P(Query | Evidence)
An agent gets values for evidence variables from its percepts and queries the network for other variables in order to decide on an action. Both exact and approximate (Monte Carlo) methods have been developed for Bayesian inference.
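As an illustration of the approximate family, a rejection-sampling sketch on the hypothetical network above (Catch omitted for brevity): sample atomic events from the network, discard those inconsistent with the evidence, and estimate the query from the rest.

    import random

    # Estimate P(Cavity = true | Toothache = true) by rejection sampling.
    def sample_event():
        c = random.random() < 0.1                  # P(Cavity)
        t = random.random() < (0.8 if c else 0.1)  # P(Toothache | Cavity)
        return c, t

    def estimate(n=100_000):
        kept = cavity_true = 0
        for _ in range(n):
            c, t = sample_event()
            if t:                 # reject samples inconsistent with evidence
                kept += 1
                cavity_true += c
        return cavity_true / kept

    print(estimate())  # ≈ 0.08 / (0.08 + 0.09) ≈ 0.47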
Other uses of Bayesian networks
• Make decisions based on probabilities in the network and on the agent’s utilities
• Decide which additional evidence variables should be observed in order to gain useful information
• Perform sensitivity analysis to understand which aspects of the model have the greatest impact on the probabilities of the query variables
• Explain the results of probabilistic inference to users
Other approaches to uncertainty
- Default reasoning
- Rule-based methods
- Dempster-Shafer theory
- Fuzzy sets and fuzzy logic
Default reasoning
Human judgmental reasoning is “qualitative” and does not rely on numerical probabilities. Default reasoning:
• Assume default state until new evidence presents itself
• If new evidence, conclusion may be retracted
This kind of reasoning is called nonmonotonic, because the set of beliefs may both grow and shrink. Problems:
• Unclear semantic status of default rules
• What if two matching default rules have contradictory conclusions?
• Managing retraction of beliefs (truth maintenance systems)
• How to use default rules for making decisions
No default reasoning system has solved all of these issues; most systems are formally undecidable and very slow in practice.
Rule-based methods
Logical and rule-based reasoning systems have useful properties that probabilistic systems lack.
• Locality: A rule A => B can be used independently of other rules
• Detachment: A belief can be used independently of its justification
• Truth-functionality: The truth of a complex sentence follows from the truth of its components
Attempts have been made to modify rule systems by attaching degrees of belief to propositions and rules. The best known is the certainty factor model (Mycin, ca. 1980). The problem is that the above properties are not appropriate for uncertain reasoning, and rule-based approaches to uncertainty have fallen out of use.
Dempster-Shafer theory
Dempster-Shafer theory deals with the distinction between uncertainty and ignorance. Instead of computing the probability P(X), it computes the probability Bel(X) that the evidence supports the proposition X. Example:
For an unknown, possibly non-fair coin, Bel(Heads) = Bel(¬Heads) = 0
If we are 90% certain that the coin is fair (P(Heads) = 0.5), then
Bel(Heads) = 0.5 × 0.9 = 0.45, Bel(¬Heads) = 0.45
One interpretation of Dempster-Shafer is that it calculates a probability interval.
• Heads interval for an unknown, possibly non-fair coin: [0, 1]
• Interval when 90% certain that the coin is fair: [0.45, 0.55]
The width of the interval helps decide when more evidence is required.
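A small sketch of the interval reading of the coin example, with belief mass m on “the coin is fair” and the remaining mass uncommitted.

    # Dempster-Shafer belief interval [Bel(X), 1 - Bel(¬X)] for Heads.
    def heads_interval(m_fair):
        bel_heads = 0.5 * m_fair       # evidence directly supporting Heads
        bel_not_heads = 0.5 * m_fair   # evidence directly supporting ¬Heads
        return (bel_heads, 1.0 - bel_not_heads)

    print(heads_interval(0.0))  # (0.0, 1.0): unknown, possibly non-fair coin
    print(heads_interval(0.9))  # (0.45, 0.55): 90% certain the coin is fair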
Fuzzy sets and fuzzy logic
Fuzzy set theory is a means of specifying how well an object satisfies a vague description. E.g. Is “Nate is tall” true if Nate is 5’ 10”?
In fuzzy set theory, TallPerson is a fuzzy predicate and the truth value of TallPerson(X) is in [0, 1]. The fuzzy predicate defines a fuzzy set without sharp boundaries. This represents not uncertainty about the world but vagueness in the meaning of “tall”, and many claim that fuzzy set theory is therefore not a method for uncertain reasoning at all.
Fuzzy logic can be used to determine the truth value of a complex sentence from the truth values of its components. However, fuzzy logic is inconsistent with first-order logic. Despite this, fuzzy logic has been commercially successful, e.g. in automatic transmissions, trains, and video cameras.
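A small sketch of truth-functional evaluation using the standard Zadeh operators (min for ∧, max for ∨, 1 - x for ¬); the tall-membership curve is a hypothetical choice. Note that A ∧ ¬A gets truth value 0.5 rather than 0, the kind of disagreement with first-order logic mentioned above.

    # Fuzzy truth values in [0, 1] with the standard Zadeh operators.
    def f_and(a, b): return min(a, b)  # T(A ∧ B) = min(T(A), T(B))
    def f_or(a, b):  return max(a, b)  # T(A ∨ B) = max(T(A), T(B))
    def f_not(a):    return 1.0 - a    # T(¬A) = 1 - T(A)

    def tall(height_inches):
        # Hypothetical membership curve: 0 below 5'6", 1 above 6'2".
        return min(1.0, max(0.0, (height_inches - 66) / 8.0))

    nate = tall(70)                  # Nate at 5'10" -> 0.5
    print(f_and(nate, f_not(nate)))  # 0.5, not 0 as in classical logic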