L6: Second Winter, Reasoning Under Uncertainty Flashcards
What were the issues with expert systems?
- Classical formal logic is not suited for all problems.
- Performance issues on large knowledge bases.
- Expert elicitation—translating expert’s knowledge to a computer—is very hard.
- Many domain experts don’t actually reason using “if–then” rules.
- Despite good PR, some academic “milestones” were not commercially viable.
Dendral is one such example; widely lauded, but never made it out of the lab.
Why is there a need for uncertainty representation?
Sometimes problems are difficult to describe with full certainty.
May be due to, e.g.,
* Inherent randomness in outcomes;
* Unknowns/uncertainties in domain description;
* Existence of many special cases.
What are different methods for representing uncertainty?
- (Classical) probability theory
- Various extensions and generalisations of probability theory
* Belief functions (Dempster-Shafer theory);
* Interval probabilities;
* Credal sets.
- Fuzzy sets and logics
- Non-monotonic logics
What is a probability? What are the Bayesian and frequentist approaches?
Quantifies uncertainty about occurrence of some event of interest
- Bayesian probability: use subjective degree of belief P(A).
- Frequentist probability: use the limiting frequency of observations ω1, ω2, . . .
P(A) = lim_{n→+∞} (1/n) Σ_{i=1}^{n} 1_A(ω_i),
where the indicator 1_A(ω) = 1 if ω ∈ A and 0 otherwise.
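To make the frequentist definition concrete, here is a minimal Python sketch (the fair six-sided die and the event A = {2, 4, 6}, “even roll”, are assumed examples, not from the lecture) that estimates P(A) by the relative frequency of A among n simulated outcomes:

```python
# Minimal sketch of the frequentist view (assumed example: a fair six-sided
# die, event A = {2, 4, 6} = "even roll"): estimate P(A) by the relative
# frequency of A among n simulated outcomes.
import random

def frequency_estimate(n, event=frozenset({2, 4, 6})):
    """Relative frequency (1/n) * sum of 1_A(w_i) over n simulated rolls."""
    hits = sum(1 for _ in range(n) if random.randint(1, 6) in event)
    return hits / n

for n in (100, 10_000, 1_000_000):
    print(n, frequency_estimate(n))   # should approach P(A) = 0.5 as n grows
```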
What is a sample space?
What is an event?
What do you call a singleton event?
What is an outcome?
A sample space is a non-empty set Ω; it represents the atomic things that can happen (the outcomes).
An outcome ω ∈ Ω is one specific possible outcome; the singleton event {ω} is called an atomic event.
An event is a subset A ⊆ Ω; it describes something that can happen.
What is a sigma algebra over Ω?
A sigma-algebra over Ω is a set Σ of events such that
1. Contains sample space: Ω ∈ Σ
2. Closed under complements: If A ∈ Σ then also (Ω \ A) ∈ Σ
3. Closed under countable unions: If A1, A2, . . . ∈ Σ then also ∪_i Ai ∈ Σ
Σ is the event space: a collection of subsets of Ω that includes Ω itself and the empty set.
Σ captures the events about which a probability measure P can say something.
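A hedged illustration (the two-coin-toss sample space is my own example): for a finite Ω the power set is a sigma-algebra, and the three conditions can be checked explicitly. With finitely many events, countable unions reduce to finite unions.

```python
# Check that the power set of a finite Ω satisfies the sigma-algebra conditions.
from itertools import chain, combinations

omega = frozenset({"HH", "HT", "TH", "TT"})       # two coin tosses

def power_set(s):
    """All subsets of s, each as a frozenset."""
    return {frozenset(c) for c in chain.from_iterable(
        combinations(s, r) for r in range(len(s) + 1))}

sigma = power_set(omega)                          # candidate event space Σ

assert omega in sigma                                       # 1. contains Ω
assert all((omega - A) in sigma for A in sigma)             # 2. closed under complements
assert all((A | B) in sigma for A in sigma for B in sigma)  # 3. closed under unions
print("Σ has", len(sigma), "events")              # 2^4 = 16 events
```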
What is a measurable space?
The tuple (Ω, Σ) is called a measurable space. The sigma-algebra defines which subsets can be measured.
What is a probability measure?
A probability measure P : Σ → R on this space satisfies
* P(A) ≥ 0 for all A ∈ Σ
* P(Ω) = 1
* P(∪_i Ai) = Σ_i P(Ai) for all pairwise disjoint A1, A2, . . . ∈ Σ (countable additivity).
What is a probability space?
The triple (Ω, Σ, P) is called a probability space
A probability space consists of three elements:
- A sample space, Ω, which is the set of all possible outcomes.
- An event space Σ, which is a set of events, an event being a set of outcomes in the sample space.
- A probability function P, which assigns each event in the event space a probability, which is a number between 0 and 1.
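Putting the three pieces together, the sketch below builds a finite probability space for a fair die (an assumed example: Σ is the power set of Ω and P(A) = |A|/|Ω|) and checks the axioms of the measure:

```python
# Sketch of a finite probability space (Ω, Σ, P) for a fair die.
from itertools import chain, combinations
from fractions import Fraction

omega = frozenset(range(1, 7))                    # sample space Ω = {1, ..., 6}
sigma = {frozenset(c) for c in chain.from_iterable(
    combinations(omega, r) for r in range(len(omega) + 1))}  # event space Σ

def P(A):
    """Uniform probability measure: each outcome has weight 1/6."""
    return Fraction(len(A), len(omega))

# The three axioms: non-negativity, P(Ω) = 1, additivity on disjoint events.
assert all(P(A) >= 0 for A in sigma)
assert P(omega) == 1
A, B = frozenset({1, 2}), frozenset({5})          # disjoint events
assert P(A | B) == P(A) + P(B)
print(P(frozenset({2, 4, 6})))                    # 1/2
```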
What is a random variable?
Random variables are functions that map elements of the sample space Ω of a probability space into another
outcome space (for us, often real-valued).
Let (Ω, Σ, P) be a probability space and let (W , F) be a measurable space.
A random variable (RV) X is a map X : Ω → W
What does it mean for an RV to be measurable?
This means that for all A ∈ F that
X^−1(A) = {ω ∈ Ω : X(ω) ∈ A} ∈ Σ
That is, for every A ∈ F, the set of outcomes in the original sample space that X maps into A is itself an event in Σ, so we can assign it a probability.
We need to be able to compute probabilities with random variables, and the only way to do this is to find the pre-image of sets and compute probabilities of those pre-images. This requires that the pre-image of every (measurable) set be measurable, which is exactly the definition of a measurable function.
More precisely, suppose (Ω, Σ, P) is our probability space and (W, F) is the “target” space (usually taken to be R with its Borel sets). For X : Ω → W to be a random variable, we want to measure the probability of events like {X ∈ A} where A ∈ F. The way we do this is to find X^−1(A) and compute P(X^−1(A)). We can only do this if X^−1(A) ∈ Σ, in other words, if X is a measurable function.
In short: X^−1(A) must lie in our original sigma-algebra. For example, if F contains the event ‘two heads’, then the pre-image of this event must be in Σ so we can assign a probability to it.
How does an RV induce a probability measure P_x?
P_x(A) = P(X^-1(A))
The probability of event A in the new measurable space is the probability of its pre-image X^−1(A), the ‘parent’ event in our original probability space.
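The induced (push-forward) measure is easy to compute in the finite case. A hedged sketch, assuming two fair coin tosses and X = number of heads (names and example are illustrative):

```python
# Sketch of the induced measure P_X(A) = P(X^-1(A)).
from fractions import Fraction

omega = ["HH", "HT", "TH", "TT"]                  # sample space Ω
P = {w: Fraction(1, 4) for w in omega}            # uniform measure on outcomes

def X(w):
    """Random variable: number of heads in outcome w."""
    return w.count("H")

def P_X(A):
    """Induced measure: probability of the pre-image X^-1(A) under P."""
    preimage = [w for w in omega if X(w) in A]    # X^-1(A) ⊆ Ω
    return sum(P[w] for w in preimage)

print(P_X({1}))      # 1/2  (pre-image {HT, TH})
print(P_X({0, 2}))   # 1/2  (pre-image {TT, HH})
```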
What are multivariate probabilities?
Suppose we want to talk about multiple variables at once.
Simply take a sample space X = X1 × · · · × Xn
and some sufficiently expressive sigma-algebra F over X.
We take the (vector-valued) RV X = (X1, . . . , Xn) : Ω → X and build the probability space (X, F, P_X).
It will be useful to denote the corresponding PMF by p(X1, . . . , Xn), with p(x1, . . . , xn) = P_X({(x1, . . . , xn)}) for all (x1, . . . , xn) ∈ X.
Whenever a collection of random variables is mentioned, they are ALWAYS assumed to be defined on the same sample space.
What is the equation for the conditional PMF of X1 given X2?
p(X1|X2) = p(X1,X2)/p(X2) if p(X2) > 0
In general, to condition a joint probability on some variables, divide the joint by the marginal probability of the conditioning variables.
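A small sketch of the conditioning formula on a made-up joint PMF over two binary variables (the numbers are arbitrary; only the computation p(X1|X2) = p(X1,X2)/p(X2) matters):

```python
# Conditional PMF as a ratio of the joint and the marginal.
from fractions import Fraction as F

joint = {(0, 0): F(1, 8), (0, 1): F(3, 8),        # p(X1 = x1, X2 = x2)
         (1, 0): F(1, 8), (1, 1): F(3, 8)}

def marginal_x2(x2):
    """p(X2 = x2), obtained by summing the joint over X1."""
    return sum(p for (_, v2), p in joint.items() if v2 == x2)

def conditional(x1, x2):
    """p(X1 = x1 | X2 = x2); only defined when p(X2 = x2) > 0."""
    p_x2 = marginal_x2(x2)
    assert p_x2 > 0
    return joint[(x1, x2)] / p_x2

print(conditional(0, 1))   # (3/8) / (3/4) = 1/2
```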
What is the chain rule for probabilities?
The chain rule for probabilities states that
p(X1, . . . , Xn) = ∏_{k=1}^{n} p(Xik | Xi1, . . . , Xik−1) for any ordering (i1, . . . , in) of the indices 1, . . . , n,
using the convention that for k = 1, we let p(Xik| Xi1, . . . , Xik−1) = p(Xi1)
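The chain rule can be checked numerically on any small joint PMF. A hedged sketch with an arbitrary (made-up) distribution over three binary variables, computing each conditional as a ratio of marginals:

```python
# Verify p(x1, x2, x3) = p(x1) p(x2|x1) p(x3|x1, x2) on a made-up joint PMF.
from itertools import product
from fractions import Fraction as F

weights = {xs: F(i + 1) for i, xs in enumerate(product((0, 1), repeat=3))}
total = sum(weights.values())
joint = {xs: w / total for xs, w in weights.items()}   # joint PMF p(X1, X2, X3)

def marg(assignment):
    """Marginal probability of a partial assignment {variable index: value}."""
    return sum(p for xs, p in joint.items()
               if all(xs[i] == v for i, v in assignment.items()))

def cond(value, given):
    """p(X_k = value | given), where k is the next unassigned index."""
    return marg({**given, len(given): value}) / marg(given)

for (x1, x2, x3), p in joint.items():
    product_form = marg({0: x1}) * cond(x2, {0: x1}) * cond(x3, {0: x1, 1: x2})
    assert product_form == p                           # chain rule holds exactly
print("chain rule verified on all 8 assignments")
```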