PAC Learning Flashcards
What is proof by contraposition?
To prove P ⇒ Q, prove the logically equivalent contrapositive ¬Q ⇒ ¬P instead.
What’s the intuition of the VC Dimension?
Even if you have infinitely many hypotheses in your hypothesis class, given some training sample, many of those hypotheses will look functionally identical with respect to that particular sample
What is h? What does it do?
- A hypothesis.
- Applied to some dataset S, generates a labeling of S
What is S?
The training dataset (the sample of labeled examples)
For the realizable setting, what are the relations between the bound and 1) epsilon, 2) abs(H)?
- Bound is inversely linear in epsilon (e.g. halving the error requires double the examples)
- Bound is only logarithmic in |H| (e.g. squaring the hypothesis space only roughly doubles the required examples)
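For concreteness, one common form of the realizable-case sample complexity bound (a sketch; the exact constants may differ from the course's statement):

```latex
% Realizable case: with probability at least 1 - \delta, every h \in H that is
% consistent with the training sample has true error at most \epsilon, provided
m \;\ge\; \frac{1}{\epsilon}\left(\ln|H| + \ln\frac{1}{\delta}\right).
% Note the 1/\epsilon (inverse linear) and \ln|H| (logarithmic) dependence.
```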
For the agnostic setting, what are the relations between the bound and 1) epsilon, 2) abs(H)?
- Bound is inversely quadratic in epsilon (e.g. halving the error requires 4x the examples)
- Bound is only logarithmic in |H| (i.e. same as Realizable case)
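Likewise, a standard Hoeffding-style form of the agnostic bound (again a sketch; constants may differ from the lecture's version):

```latex
% Agnostic case: with probability at least 1 - \delta, every h \in H has true error
% within \epsilon of its training error, provided
m \;\ge\; \frac{1}{2\epsilon^2}\left(\ln|H| + \ln\frac{2}{\delta}\right).
% Note the 1/\epsilon^2 (inverse quadratic) dependence; the |H| dependence is still logarithmic.
```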
What is shattering?
A hypothesis class H shatters a set S if, for every possible labeling of S, some hypothesis in H classifies S consistently with that labeling (i.e. H can realize all 2^|S| labelings of S)
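A minimal brute-force sketch of this definition (not from the notes; the 1-D threshold class and helper names are illustrative assumptions):

```python
from itertools import product

def threshold_hypotheses(points):
    """1-D threshold classifiers h_t(x) = 1[x >= t], for a few candidate thresholds t."""
    candidates = sorted(points) + [max(points) + 1.0]
    return [lambda x, t=t: int(x >= t) for t in candidates]

def shatters(hypotheses, points):
    """True iff every possible labeling of `points` is realized by some hypothesis."""
    for labeling in product([0, 1], repeat=len(points)):
        if not any(all(h(x) == y for x, y in zip(points, labeling)) for h in hypotheses):
            return False
    return True

# Thresholds shatter any single point, but cannot shatter two points
# (the labeling "left point = 1, right point = 0" is unachievable):
print(shatters(threshold_hypotheses([3.0]), [3.0]))            # True
print(shatters(threshold_hypotheses([1.0, 2.0]), [1.0, 2.0]))  # False
```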
What is the VC-dimension?
Def: The VC-dimension (or Vapnik-Chervonenkis dimension) of ℋ is the cardinality of the largest set S such that ℋ can shatter S.
Equivalently (recitation phrasing): the VC dimension of a hypothesis space H is the maximum number of points such that there exists at least one arrangement of those points for which, for every labeling of that arrangement, some hypothesis h ∈ H is consistent with that labeling
When is the VC dimension infinity?
If ℋ can shatter arbitrarily large finite sets, then the VC-dimension of ℋ is infinity
To prove that VC(H) = some value M, what do you have to do?
- Show VC(H) ≥ M: exhibit a set of M points that H shatters (every labeling of those points is realized by some h ∈ H)
- Show VC(H) < M + 1: argue that no set of M + 1 points can be shattered by H
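As a worked instance (not from the notes), here is the two-step argument for intervals on the real line:

```latex
% Hypothesis class: h_{[a,b]}(x) = \mathbb{1}[a \le x \le b].
% Step 1 (VC >= 2): two points x_1 < x_2 are shattered, since the labelings
% (0,0), (1,0), (0,1), (1,1) are all realized by suitable choices of [a, b].
% Step 2 (VC < 3): no three points x_1 < x_2 < x_3 can be shattered, because the
% labeling (1, 0, 1) would force x_1, x_3 \in [a,b] but x_2 \notin [a,b],
% which no interval can do. Hence
\mathrm{VC}\bigl(\{h_{[a,b]}\}\bigr) = 2.
```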
What’s the VC dimension of linear separators in n dimensions?
n+1
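A quick sanity check for n = 2 (a standard fact, sketched here rather than taken from the notes):

```latex
\mathrm{VC}\bigl(\text{linear separators in } \mathbb{R}^2\bigr) = 2 + 1 = 3
% Three non-collinear points can be shattered (all 2^3 labelings are linearly separable),
% but no four points can be: e.g. the XOR-style labeling of four points in convex
% position (opposite corners sharing a label) is not linearly separable.
```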
What does ∃ mean?
There exists
(high level) What’s the corollary to Theorem 1 of the PAC theorem?
It gives a numerical (computable) bound on the true error
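One standard form of such a numerical bound (a Vapnik-style VC bound; the exact constants may differ from the corollary as stated in lecture):

```latex
% With probability at least 1 - \delta, for all h \in H:
R(h) \;\le\; \hat{R}(h) \;+\; \sqrt{\frac{\mathrm{VC}(H)\left(\ln\frac{2m}{\mathrm{VC}(H)} + 1\right) + \ln\frac{4}{\delta}}{m}}
% where R(h) is the true error, \hat{R}(h) the training error, and m the number of training examples.
```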
What’s the key idea that makes Corollary 4 of Theorem 1 of PAC learning useful?
- We want to trade off between low training error and keeping H simple (i.e. low VC(H))
- We can tune the lambda parameter of the regularizer to hopefully land at the sweet spot of that tradeoff curve
What are the practical ways we can tradeoff between low training error and keeping H simple? (1)
Use a regularizer
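A minimal sketch of what this looks like in practice, assuming L2-regularized logistic regression (the function name and data below are made up for illustration):

```python
import numpy as np

def l2_regularized_logistic_loss(w, X, y, lam):
    """Training log-loss plus an L2 penalty; lam trades off fit vs. hypothesis complexity."""
    logits = X @ w
    # Numerically stable logistic loss: log(1 + exp(z)) - y * z, with y in {0, 1}
    log_loss = np.mean(np.logaddexp(0.0, logits) - y * logits)
    penalty = lam * np.sum(w ** 2)  # larger lam pushes toward simpler (smaller-norm) hypotheses
    return log_loss + penalty

# Hypothetical usage: sweep lam and pick the value with the best validation error.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
w_true = np.array([1.0, -2.0, 0.5, 0.0, 0.0])
y = (X @ w_true > 0).astype(float)
for lam in [0.0, 0.01, 0.1, 1.0]:
    print(lam, round(l2_regularized_logistic_loss(w_true, X, y, lam), 3))
```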
What are discriminative models?
Models that directly model the conditional distribution p(y|x) (or the decision boundary itself), rather than the joint distribution p(x, y) as generative models do; e.g. logistic regression, SVMs
What’s a pmf?
p(x): Function giving the probability that a discrete random variable X takes value x, i.e. p(x) = P(X = x)
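A tiny worked example (a fair six-sided die):

```latex
p(x) = P(X = x) = \tfrac{1}{6}, \quad x \in \{1, 2, 3, 4, 5, 6\}, \qquad \sum_{x=1}^{6} p(x) = 1.
```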