L6: Second Winter, Reasoning Under Uncertainty Flashcards

1
Q

What were the issues with expert systems?

A
  • Classical formal logic is not suited for all problems.
  • Performance issues on large knowledge bases.
  • Expert elicitation—translating expert’s knowledge to a computer—is very hard.
  • Many domain experts don’t actually reason using “if-then” rules.
  • Despite good PR, some academic “milestones” were not commercially viable.
    Dendral is one such example; widely lauded, but never made it out of the lab.
2
Q

Why is there a need for uncertainty representation?

A

Sometimes problems are difficult to describe with full certainty.
This may be due to, e.g.:
* Inherent randomness in outcomes;
* Unknowns/uncertainties in domain description;
* Existence of many special cases.

3
Q

What are different methods for representing uncertainty?

A
  • (Classical) probability theory
  • Various extensions and generalisations of probability theory
    * Belief functions (Dempster-Shafer theory);
    * Interval probabilities;
    * Credal sets.
  • Fuzzy sets and logics
  • Non-monotonic logics
4
Q

What is a probability? What are the Bayesian and frequentist approaches?

A

A probability quantifies uncertainty about the occurrence of some event of interest.

  • Bayesian probability: use a subjective degree of belief P(A).
  • Frequentist probability: use the limiting relative frequency of observations ω1, ω2, . . .
    P(A) = lim_{n→+∞} (1/n) Σ_{i=1}^{n} 1_A(ω_i),
    where the indicator 1_A(ω) = 1 if ω ∈ A and 0 otherwise (see the sketch below).
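
As a quick illustration (not from the lecture), the limiting-frequency idea can be simulated; a minimal Python sketch assuming a made-up experiment, a fair six-sided die with event A = “the roll is even”:

```python
import random

# Minimal sketch: relative frequency of A = "roll is even" for a fair die.
# As n grows, the frequency approaches P(A) = 0.5, matching the
# frequentist definition of probability.
def relative_frequency(n, seed=0):
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        omega = rng.randint(1, 6)            # one observation ω_i
        hits += 1 if omega % 2 == 0 else 0   # indicator 1_A(ω_i)
    return hits / n

for n in (100, 10_000, 1_000_000):
    print(n, relative_frequency(n))
```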
5
Q

What is a sample space?
What is an event?
What do you call a singleton event?
What is an outcome?

A

A sample space is a non-empty set Ω; it represents the atomic things that can happen (outcomes).

An event is a subset A ⊆ Ω; it describes something that can happen.

A singleton event {ω} is called an atomic event.

An outcome ω ∈ Ω is one specific possible result.

6
Q

What is a sigma algebra over Ω?

A

A sigma-algebra over Ω is a set Σ of events such that
1. Contains sample space: Ω ∈ Σ
2. Closed under complements: If A ∈ Σ then also (Ω \ A) ∈ Σ
3. Closed under countable unions: If A1, A2, . . . ∈ Σ then also ∪_i Ai ∈ Σ

THIS IS THE EVENT SPACE

In particular, a sigma-algebra is a collection of subsets of Ω that includes Ω itself and the empty set.

Σ captures the events about which a probability measure P can say something.

7
Q

What is a measurable space?

A

The tuple (Ω, Σ) is called a measurable space. The sigma-algebra Σ defines the subsets (events) that can be measured.

8
Q

What is a probability measure?

A

A probability measure P : Σ → R on this space satisfies
* Non-negativity: P(A) ≥ 0 for all A ∈ Σ
* Normalisation: P(Ω) = 1
* Countable additivity: P(∪_i Ai) = Σ_i P(Ai) for all pairwise disjoint A1, A2, . . . ∈ Σ.

9
Q

What is a probability space?

A

The triple (Ω, Σ, P) is called a probability space

A probability space consists of three elements:

  • A sample space, Ω, which is the set of all possible outcomes.
  • An event space Σ, which is a set of events, an event being a set of outcomes in the sample space.
  • A probability function P, which assigns each event in the event space a probability, which is a number between 0 and 1.
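
A minimal sketch (not from the lecture) of a finite probability space for two fair coin flips, assuming the event space Σ is the full power set of Ω:

```python
from itertools import chain, combinations

# Sample space Ω for two fair coin flips, event space Σ = power set of Ω,
# and P(A) defined by summing the probabilities of the outcomes in A.
omega = {"HH", "HT", "TH", "TT"}
p_outcome = {w: 0.25 for w in omega}   # uniform: each outcome has probability 1/4

def P(event):
    return sum(p_outcome[w] for w in event)

sigma = [set(s) for s in chain.from_iterable(
    combinations(sorted(omega), r) for r in range(len(omega) + 1))]

assert abs(P(omega) - 1.0) < 1e-12     # normalisation: P(Ω) = 1
assert all(P(A) >= 0 for A in sigma)   # non-negativity for every event
print(P({"HH", "HT"}))                 # e.g. P("first flip is heads") = 0.5
```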
10
Q

What is a random variable?

A

Random variables are functions that map elements of a probability sample space Ω into another
outcome space (for us, often real-valued).

Let (Ω, Σ, P) be a probability space and let (W, F) be a measurable space.
A random variable (RV) X is a (measurable) map X : Ω → W.

11
Q

What does it mean for an RV to be measurable?

A

An RV X is measurable if for all A ∈ F,
X^−1(A) = {ω ∈ Ω : X(ω) ∈ A} ∈ Σ

In words: for every event A in the target space, the set of original outcomes that X maps into A is itself an event in our event space Σ, so we can assign it a probability.

Why we need this: we compute probabilities with random variables by finding the pre-image of sets and computing probabilities of those. More precisely, suppose (Ω1, F1, P) is the probability space and (Ω2, F2) is the “target” space (usually taken to be R with its Borel sets). For X : Ω1 → Ω2 to be a random variable, we want to measure the probability of events like {X ∈ A} with A ∈ F2. We do this by computing P(X^−1(A)), which only makes sense if X^−1(A) ∈ F1, in other words, if X is a measurable function.

Informally: X^−1(A) must lie in our original sigma-algebra Σ. For example, if F contains the event “two heads”, then the pre-image of this event must also be in Σ so that we can assign a probability to it.

12
Q

How does an RV induce a probability measure P_x?

A

P_X(A) = P(X^−1(A))

The probability of event A in the new measurable space equals the probability of its pre-image (the ‘parent’ event) in our original probability space.
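
A small illustrative sketch (not from the lecture) with the hypothetical RV X = “number of heads” on two fair coin flips, showing P_X(A) = P(X^−1(A)):

```python
# Probability space: two fair coin flips; RV X maps Ω into {0, 1, 2}.
omega = {"HH", "HT", "TH", "TT"}
p_outcome = {w: 0.25 for w in omega}

def P(event):
    return sum(p_outcome[w] for w in event)

def X(w):
    return w.count("H")                          # number of heads

def P_X(A):
    preimage = {w for w in omega if X(w) in A}   # X^-1(A), an event in Σ
    return P(preimage)                           # induced measure

print(P_X({1}))      # P(exactly one head)  = 0.5
print(P_X({1, 2}))   # P(at least one head) = 0.75
```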

13
Q

What are multivariate probabilities?

A

Suppose we want to talk about multiple variables at once.
Simply take a sample space X = X1 × · · · × Xn
and some sufficiently expressive sigma-algebra F over X.
We take the (vector-valued) RV X = (X1, . . . , Xn) : Ω → X and build the probability space (X, F, P_X).

It will be useful to denote the corresponding PMF as p(x1, . . . , xn) = P_X({(x1, . . . , xn)}) for all (x1, . . . , xn) ∈ X.

Whenever a collection of random variables is mentioned, they are ALWAYS assumed to be defined on the same sample space.

14
Q

What is the equation for the conditional PMF of X1 and X2?

A

p(X1 | X2) = p(X1, X2) / p(X2), provided p(X2) > 0

In general, to condition a joint (“and”) probability, divide the joint by the marginal of the conditioning variables (see the sketch below).
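
A minimal sketch with a made-up joint PMF over two binary variables, conditioning by dividing the joint by the marginal of the conditioning variable:

```python
# p(X1 | X2=x2) = p(X1, X2=x2) / p(X2=x2), computed from a made-up joint PMF.
joint = {
    (0, 0): 0.30, (0, 1): 0.10,
    (1, 0): 0.20, (1, 1): 0.40,
}

def conditional(joint, x2):
    p_x2 = sum(p for (x1, v2), p in joint.items() if v2 == x2)   # marginal p(X2=x2)
    if p_x2 == 0:
        raise ValueError("conditional undefined when p(X2=x2) = 0")
    return {x1: p / p_x2 for (x1, v2), p in joint.items() if v2 == x2}

print(conditional(joint, 1))   # {0: 0.2, 1: 0.8}
```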

15
Q

What is the chain rule for probabilities?

A

The chain rule for probabilities states that, for any ordering (i1, . . . , in) of the indices,
p(X1, . . . , Xn) = ∏_{k=1}^{n} p(X_{i_k} | X_{i_1}, . . . , X_{i_{k−1}}),

using the convention that for k = 1, we let p(X_{i_k} | X_{i_1}, . . . , X_{i_{k−1}}) = p(X_{i_1}).
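
A small numerical check (made-up joint PMF over three binary variables) that the chain-rule factorisation reproduces the joint for every assignment:

```python
from itertools import product

# Made-up joint PMF p(X1, X2, X3); check that
# p(x1, x2, x3) = p(x1) * p(x2 | x1) * p(x3 | x1, x2) for every assignment.
vals = [0, 1]
probs = [0.10, 0.05, 0.15, 0.20, 0.05, 0.10, 0.05, 0.30]
joint = dict(zip(product(vals, repeat=3), probs))

def marg(fixed):
    """Marginal probability of the partial assignment `fixed` ({index: value})."""
    return sum(p for xs, p in joint.items()
               if all(xs[i] == v for i, v in fixed.items()))

for (x1, x2, x3), p in joint.items():
    chain_value = (marg({0: x1})                              # p(x1)
                   * marg({0: x1, 1: x2}) / marg({0: x1})     # p(x2 | x1)
                   * p / marg({0: x1, 1: x2}))                # p(x3 | x1, x2)
    assert abs(p - chain_value) < 1e-12
print("chain rule holds for every assignment")
```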

16
Q

What is probabilistic independence?

A

Variables X1, X2 are probabilistically independent iff their joint PMF factorises:
p(X1, X2) = p(X1)p(X2)

17
Q

What is conditional independence?

A

Conditional independence between variables X1, X2 given X3 holds iff
p(X1, X2 | X3) = p(X1 | X3)p(X2 | X3)
We then write X1 ⊥ X2 | X3. (If X1 is also independent of X4 given X3, we can write X1 ⊥ X2, X4 | X3.)

It follows that
X1 ⊥ X2 | X3 ⇒ p(X1 | X2, X3) = p(X1 | X3). (See the numerical check below.)
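
A numerical check with made-up conditional PMFs: building p(x1, x2, x3) = p(x3) p(x1 | x3) p(x2 | x3) by construction and verifying that X1 ⊥ X2 | X3 holds:

```python
from itertools import product

# Made-up CPTs; the joint is built so that X1 ⊥ X2 | X3 by construction,
# then we verify p(x1, x2 | x3) = p(x1 | x3) * p(x2 | x3) for every assignment.
p3 = {0: 0.4, 1: 0.6}
p1_given_3 = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.3, 1: 0.7}}   # p1_given_3[x3][x1]
p2_given_3 = {0: {0: 0.5, 1: 0.5}, 1: {0: 0.2, 1: 0.8}}   # p2_given_3[x3][x2]

joint = {(x1, x2, x3): p3[x3] * p1_given_3[x3][x1] * p2_given_3[x3][x2]
         for x1, x2, x3 in product([0, 1], repeat=3)}

for x1, x2, x3 in joint:
    lhs = joint[(x1, x2, x3)] / p3[x3]                     # p(x1, x2 | x3)
    rhs = p1_given_3[x3][x1] * p2_given_3[x3][x2]
    assert abs(lhs - rhs) < 1e-12
print("X1 and X2 are conditionally independent given X3")
```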

18
Q

If we are specifying a joint model over n binary RVs X1, . . . , Xn, how many numerical parameters do we need?

A

It takes 2^n − 1 numerical parameters.

For example, with three binary variables there are 2^3 = 8 possible combinations of 0s and 1s; we need to specify the probability of 7 of them (the last follows because the probabilities sum to 1).

19
Q

How many parameters do we need if we assume all RVs are independent?

A

n (for each variable, the probabilities of 0 and 1 sum to 1, so a single number per variable suffices; see the comparison below).
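
A quick comparison (simple arithmetic, not from the slides) of how the two parameter counts grow with n:

```python
# Parameters for n binary RVs: full joint needs 2**n - 1, independence needs n.
for n in (3, 10, 20, 30):
    print(f"n={n:2d}  full joint: {2**n - 1:>13,}  independent: {n}")
```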

20
Q

What are Bayesian Networks?

A

Bayesian networks (BNs) were introduced by Judea Pearl in 1985.
* BNs are probabilistic graphical models.
* They encode conditional independencies between variables.

A Bayesian network encodes a PMF over variables X1, . . . , Xn.
The core of the representation is a directed acyclic graph (DAG) G = (V, E).
* The vertices are given by V = {X1, . . . , Xn}.
That is, the vertices of the graph correspond to the random variables in the model.
* Edges E indicate, roughly, “influence”/dependency.

The graph can often be specified from expert knowledge, because the presence of an edge only indicates a dependence between variables.

BNs can be encoded efficiently on a computer.

21
Q

What are conditional probability tables?

A

The collection of parents of Xi in G is denoted pa(Xi).

Definition
A family of conditional probability tables (CPTs) for a given DAG G = (V, E)
is a map P(· | ·) that gives for all Xi ∈ V a conditional PMF P(Xi | pa(Xi)).

See tables on slide 62.

CPTs are either determined from expert knowledge or estimated from data (see the sketch below).
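
A tiny hypothetical example (variable names and numbers made up): a BN with DAG Rain → WetGrass ← Sprinkler, one CPT per variable, and the joint obtained as the product of the CPTs (the standard BN factorisation); the marginal of WetGrass is then computed by direct summation:

```python
# CPTs for a made-up BN: Rain and Sprinkler are root nodes, WetGrass has both as parents.
p_rain = {True: 0.2, False: 0.8}
p_sprinkler = {True: 0.1, False: 0.9}
p_wet = {  # p_wet[(rain, sprinkler)][wet] = P(WetGrass=wet | Rain=rain, Sprinkler=sprinkler)
    (True, True): {True: 0.99, False: 0.01},
    (True, False): {True: 0.90, False: 0.10},
    (False, True): {True: 0.80, False: 0.20},
    (False, False): {True: 0.05, False: 0.95},
}

def joint(rain, sprinkler, wet):
    # p(r, s, w) = p(r) * p(s) * p(w | r, s)
    return p_rain[rain] * p_sprinkler[sprinkler] * p_wet[(rain, sprinkler)][wet]

# Direct marginalisation of P(WetGrass=True): sum the joint over the parents.
p_wet_true = sum(joint(r, s, True) for r in (True, False) for s in (True, False))
print(p_wet_true)
```

Note how direct marginalisation sums over every combination of the other variables, which is what becomes intractable as the number of variables grows (next card).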

22
Q

What is the complexity of inference in BNs, such as direct marginalisation?

A

NP-hard in general (though smarter and more efficient algorithms than brute-force summation exist).