L6: Second Winter, Reasoning Under Uncertainty Flashcards
What were the issues with expert systems?
- Classical formal logic is not suited for all problems.
- Performance issues on large knowledge bases.
- Expert elicitation—translating expert’s knowledge to a computer—is very hard.
- Many domain experts don’t actually reason using “if–then” rules.
- Despite good PR, some academic “milestones” were not commercially viable.
Dendral is one such example; widely lauded, but never made it out of the lab.
Why is there a need for uncertainty representation?
Sometimes problems are difficult to describe with full certainty.
May be due to, e.g.,
* Inherent randomness in outcomes;
* Unknowns/uncertainties in domain description;
* Existence of many special cases.
What are different methods for representing uncertainty?
- (Classical) probability theory
- Various extensions and generalisations of probability theory
* Belief functions (Dempster-Shafer theory);
* Interval probabilities;
* Credal sets.
- Fuzzy sets and logics
- Non-monotonic logics
What is a probability? What are the Bayesian and frequentist approaches?
Quantifies uncertainty about occurrence of some event of interest
- Bayesian probability: use subjective degree of belief P(A).
- Frequentist probability: use the limiting frequency of observations ω1, ω2, . . .
P(A) = lim_{n→+∞} (1/n) Σ_{i=1}^{n} 1_A(ω_i),
where the indicator 1_A(ω) = 1 if ω ∈ A and 0 otherwise.
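To make the frequentist definition concrete, here is a minimal Python sketch (the fair six-sided die and the event A = {2, 4, 6}, “even roll”, are assumed examples, not from the lecture) that estimates P(A) by the relative frequency of A among n simulated outcomes:

```python
# Minimal sketch of the frequentist view (assumed example: a fair six-sided
# die, event A = {2, 4, 6} = "even roll"): estimate P(A) by the relative
# frequency of A among n simulated outcomes.
import random

def frequency_estimate(n, event=frozenset({2, 4, 6})):
    """Relative frequency (1/n) * sum of 1_A(w_i) over n simulated rolls."""
    hits = sum(1 for _ in range(n) if random.randint(1, 6) in event)
    return hits / n

for n in (100, 10_000, 1_000_000):
    print(n, frequency_estimate(n))   # should approach P(A) = 0.5 as n grows
```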
What is a sample space?
What is an event?
What do you call a singleton event?
What is an outcome?
A sample space is a non-empty set Ω; it represents the atomic things that can happen (the outcomes).
An outcome ω ∈ Ω is one specific possible outcome; the singleton event {ω} is called an atomic event.
An event is a subset A ⊆ Ω; it describes something that can happen.
What is a sigma algebra over Ω?
A sigma-algebra over Ω is a set Σ of events such that
1. Contains sample space: Ω ∈ Σ
2. Closed under complements: If A ∈ Σ then also (Ω \ A) ∈ Σ
3. Closed under countable unions: If A1, A2, . . . ∈ Σ then also ∪_i Ai ∈ Σ
Σ is the event space: a collection of subsets of Ω that includes Ω itself and the empty set.
Σ captures the events about which a probability measure P can say something.
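A hedged illustration (the two-coin-toss sample space is my own example): for a finite Ω the power set is a sigma-algebra, and the three conditions can be checked explicitly. With finitely many events, countable unions reduce to finite unions.

```python
# Check that the power set of a finite Ω satisfies the sigma-algebra conditions.
from itertools import chain, combinations

omega = frozenset({"HH", "HT", "TH", "TT"})       # two coin tosses

def power_set(s):
    """All subsets of s, each as a frozenset."""
    return {frozenset(c) for c in chain.from_iterable(
        combinations(s, r) for r in range(len(s) + 1))}

sigma = power_set(omega)                          # candidate event space Σ

assert omega in sigma                                       # 1. contains Ω
assert all((omega - A) in sigma for A in sigma)             # 2. closed under complements
assert all((A | B) in sigma for A in sigma for B in sigma)  # 3. closed under unions
print("Σ has", len(sigma), "events")              # 2^4 = 16 events
```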
What is a measurable space?
The tuple (Ω, Σ) is called a measurable space. The sigma-algebra defines which subsets can be measured.
What is a probability measure?
A probability measure P : Σ → R on this space satisfies
* P(A) ≥ 0 for all A ∈ Σ
* P(Ω) = 1
* P(∪_i Ai) = Σ_i P(Ai) for all pairwise disjoint A1, A2, . . . ∈ Σ (countable additivity).
What is a probability space?
The triple (Ω, Σ, P) is called a probability space
A probability space consists of three elements:
- A sample space, Ω, which is the set of all possible outcomes.
- An event space Σ, which is a set of events, an event being a set of outcomes in the sample space.
- A probability function P, which assigns each event in the event space a probability, which is a number between 0 and 1.
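Putting the three pieces together, the sketch below builds a finite probability space for a fair die (an assumed example: Σ is the power set of Ω and P(A) = |A|/|Ω|) and checks the axioms of the measure:

```python
# Sketch of a finite probability space (Ω, Σ, P) for a fair die.
from itertools import chain, combinations
from fractions import Fraction

omega = frozenset(range(1, 7))                    # sample space Ω = {1, ..., 6}
sigma = {frozenset(c) for c in chain.from_iterable(
    combinations(omega, r) for r in range(len(omega) + 1))}  # event space Σ

def P(A):
    """Uniform probability measure: each outcome has weight 1/6."""
    return Fraction(len(A), len(omega))

# The three axioms: non-negativity, P(Ω) = 1, additivity on disjoint events.
assert all(P(A) >= 0 for A in sigma)
assert P(omega) == 1
A, B = frozenset({1, 2}), frozenset({5})          # disjoint events
assert P(A | B) == P(A) + P(B)
print(P(frozenset({2, 4, 6})))                    # 1/2
```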
What is a random variable?
Random variables are functions that map elements of the sample space Ω of a probability space into another
outcome space (for us, often real-valued).
Let (Ω, Σ, P) be a probability space and let (W , F) be a measurable space.
A random variable (RV) X is a map X : Ω → W
What does it mean for an RV to be measurable?
This means that for all A ∈ F that
X^−1(A) = {ω ∈ Ω : X(ω) ∈ A} ∈ Σ
That is, for every A ∈ F, the set of outcomes in the original sample space that X maps into A is itself an event in Σ, so we can assign it a probability.
We need to be able to compute probabilities with random variables, and the only way to do this is to find the pre-image of sets and compute probabilities of those pre-images. This requires that the pre-image of every (measurable) set be measurable, which is exactly the definition of a measurable function.
More precisely, suppose (Ω, Σ, P) is our probability space and (W, F) is the “target” space (usually taken to be R with its Borel sets). For X : Ω → W to be a random variable, we want to measure the probability of events like {X ∈ A} where A ∈ F. The way we do this is to find X^−1(A) and compute P(X^−1(A)). We can only do this if X^−1(A) ∈ Σ, in other words, if X is a measurable function.
In short: X^−1(A) must lie in our original sigma-algebra. For example, if F contains the event ‘two heads’, then the pre-image of this event must be in Σ so we can assign a probability to it.
How does an RV induce a probability measure P_x?
P_x(A) = P(X^-1(A))
The probability of event A in the new measurable space is the probability of its pre-image X^−1(A), the ‘parent’ event in our original probability space.
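The induced (push-forward) measure is easy to compute in the finite case. A hedged sketch, assuming two fair coin tosses and X = number of heads (names and example are illustrative):

```python
# Sketch of the induced measure P_X(A) = P(X^-1(A)).
from fractions import Fraction

omega = ["HH", "HT", "TH", "TT"]                  # sample space Ω
P = {w: Fraction(1, 4) for w in omega}            # uniform measure on outcomes

def X(w):
    """Random variable: number of heads in outcome w."""
    return w.count("H")

def P_X(A):
    """Induced measure: probability of the pre-image X^-1(A) under P."""
    preimage = [w for w in omega if X(w) in A]    # X^-1(A) ⊆ Ω
    return sum(P[w] for w in preimage)

print(P_X({1}))      # 1/2  (pre-image {HT, TH})
print(P_X({0, 2}))   # 1/2  (pre-image {TT, HH})
```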
What are multivariate probabilities?
Suppose we want to talk about multiple variables at once.
Simply take a sample space X = X1 × · · · × Xn
and some sufficiently expressive sigma-algebra F over X.
We take the (vector-valued) RV X = (X1, . . . , Xn) : Ω → X and build the probability space (X, F, P_X).
It will be useful to denote the corresponding PMF by p(X1, . . . , Xn), with p(x1, . . . , xn) = P_X({(x1, . . . , xn)}) for all (x1, . . . , xn) ∈ X.
Whenever a collection of random variables is mentioned, they are ALWAYS assumed to be defined on the same sample space.
What is the equation for the conditional PMF of X1 given X2?
p(X1|X2) = p(X1,X2)/p(X2) if p(X2) > 0
In general, to condition a joint probability on some variables, divide the joint by the marginal probability of the conditioning variables.
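A small sketch of the conditioning formula on a made-up joint PMF over two binary variables (the numbers are arbitrary; only the computation p(X1|X2) = p(X1,X2)/p(X2) matters):

```python
# Conditional PMF as a ratio of the joint and the marginal.
from fractions import Fraction as F

joint = {(0, 0): F(1, 8), (0, 1): F(3, 8),        # p(X1 = x1, X2 = x2)
         (1, 0): F(1, 8), (1, 1): F(3, 8)}

def marginal_x2(x2):
    """p(X2 = x2), obtained by summing the joint over X1."""
    return sum(p for (_, v2), p in joint.items() if v2 == x2)

def conditional(x1, x2):
    """p(X1 = x1 | X2 = x2); only defined when p(X2 = x2) > 0."""
    p_x2 = marginal_x2(x2)
    assert p_x2 > 0
    return joint[(x1, x2)] / p_x2

print(conditional(0, 1))   # (3/8) / (3/4) = 1/2
```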
What is the chain rule for probabilities?
The chain rule for probabilities states that
p(X1, . . . , Xn) = ∏_{k=1}^{n} p(Xik | Xi1, . . . , Xik−1) for any ordering (i1, . . . , in) of the indices 1, . . . , n,
using the convention that for k = 1, we let p(Xik| Xi1, . . . , Xik−1) = p(Xi1)
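The chain rule can be checked numerically on any small joint PMF. A hedged sketch with an arbitrary (made-up) distribution over three binary variables, computing each conditional as a ratio of marginals:

```python
# Verify p(x1, x2, x3) = p(x1) p(x2|x1) p(x3|x1, x2) on a made-up joint PMF.
from itertools import product
from fractions import Fraction as F

weights = {xs: F(i + 1) for i, xs in enumerate(product((0, 1), repeat=3))}
total = sum(weights.values())
joint = {xs: w / total for xs, w in weights.items()}   # joint PMF p(X1, X2, X3)

def marg(assignment):
    """Marginal probability of a partial assignment {variable index: value}."""
    return sum(p for xs, p in joint.items()
               if all(xs[i] == v for i, v in assignment.items()))

def cond(value, given):
    """p(X_k = value | given), where k is the next unassigned index."""
    return marg({**given, len(given): value}) / marg(given)

for (x1, x2, x3), p in joint.items():
    product_form = marg({0: x1}) * cond(x2, {0: x1}) * cond(x3, {0: x1, 1: x2})
    assert product_form == p                           # chain rule holds exactly
print("chain rule verified on all 8 assignments")
```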