PAC Learning Flashcards

1
Q

What is proof by contraposition?

A

Proving a statement P ⇒ Q by instead proving its contrapositive ¬Q ⇒ ¬P, which is logically equivalent
2
Q

What’s the intuition of the VC Dimension?

A

Even if you have infinitely many hypotheses in your hypothesis class, given some training sample, many of those hypotheses will look functionally the same with respect to that sample, so only finitely many distinct behaviors matter

3
Q

What is h? What does it do?

A
  • A hypothesis.
  • When applied to a dataset S, it generates a labeling of S
4
Q

What is S?

A

the dataset

5
Q

For the realizable setting, what are the relations between the bound and 1) epsilon, 2) abs(H)?

A
  • Bound is inversely linear in epsilon (e.g. halving the error requires doubling the examples; see the bound sketched below)
  • Bound is only logarithmic in |H| (e.g. squaring the hypothesis space only doubles the ln|H| term)
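For reference, the standard finite-|H| realizable-case bound (the usual form; the course’s exact constants may differ): if

N ≥ (1/ε) · [ ln|H| + ln(1/δ) ]

then with probability at least 1 − δ, every h ∈ H consistent with the N training examples has true error at most ε.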
6
Q

For the agnostic setting, what are the relations between the bound and

1) epsilon,
2) abs(H)

A
  • Bound is inversely quadratic in epsilon (e.g. halving the error requires 4x the examples; see the bound sketched below)
  • Bound is only logarithmic in |H| (i.e. the same as in the realizable case)
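For reference, the standard finite-|H| agnostic-case bound (Hoeffding-based; exact constants may differ from the course’s statement): if

N ≥ (1/(2ε²)) · [ ln|H| + ln(2/δ) ]

then with probability at least 1 − δ, every h ∈ H has training error within ε of its true error.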
7
Q

What is shattering?

A

H shatters S when, for every possible labeling of S, some hypothesis in H classifies S perfectly (i.e. H realizes all 2^|S| labelings of S)
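To make shattering concrete, here is a minimal runnable sketch. It uses a toy class of signed 1-D threshold classifiers (an illustrative assumption, not a class from the course) and brute-force checks whether every labeling of a point set is realizable:

import itertools

# Toy hypothesis class: signed thresholds on the real line,
# h(x) = s if x >= t else -s, for a threshold t and sign s.

def threshold_hypotheses(points):
    # Midpoints and extremes suffice to realize every distinct
    # labeling this class can produce on `points`.
    xs = sorted(points)
    cuts = [xs[0] - 1.0] + [(a + b) / 2 for a, b in zip(xs, xs[1:])] + [xs[-1] + 1.0]
    for t in cuts:
        for s in (+1, -1):
            yield lambda x, t=t, s=s: s if x >= t else -s

def shatters(points):
    # H shatters `points` iff every one of the 2^n labelings is
    # reproduced exactly by some hypothesis.
    hs = list(threshold_hypotheses(points))
    for labeling in itertools.product([-1, +1], repeat=len(points)):
        if not any(all(h(x) == y for x, y in zip(points, labeling)) for h in hs):
            return False  # found a labeling no hypothesis realizes
    return True

print(shatters([0.0]))            # True
print(shatters([0.0, 1.0]))       # True, so VC dim of this class is >= 2
print(shatters([0.0, 1.0, 2.0]))  # False: the (+, -, +) labeling is impossible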

8
Q

What is the VC-dimension?

A

Def: The VC-dimension (or Vapnik–Chervonenkis dimension) of ℋ is the cardinality of the largest set S such that ℋ can shatter S.

Equivalently (the recitation’s phrasing): the VC dimension of a hypothesis space H is the maximum number of points such that there exists at least one arrangement of these points for which, for every labeling of that arrangement, some hypothesis h ∈ H is consistent with the labeling
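A minimal worked example (intervals on the real line, a standard illustration): let H label points inside an interval [a, b] as + and everything else as −. H shatters the 2-point set {1, 2}: the labelings (+,+), (+,−), (−,+), (−,−) are realized by [0, 3], [0, 1.5], [1.5, 3], [4, 5]. But no 3-point set x1 < x2 < x3 can be shattered, since the labeling (+, −, +) would require an interval containing x1 and x3 but not x2. Hence VC(H) = 2.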

9
Q

When is the VC dimension infinity?

A

If ℋ can shatter arbitrarily large finite sets, then the VC-dimension of ℋ is infinity
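A classic example (standard, not course-specific): the class h_w(x) = sign(sin(w·x)) on the real line can shatter arbitrarily large finite point sets, so its VC-dimension is infinite.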

10
Q

To prove that VC(H) = some value M, what do you have to do?

A
  • Show there exists some set of M points that H can shatter (this gives VC(H) ≥ M)
  • Show that no set of M + 1 points can be shattered by H (this gives VC(H) < M + 1)
11
Q

What’s the VC dimension for linear separators in n dimensions?

A

n+1
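For intuition at n = 2 (a standard illustration): a line can shatter 3 points in general position, since all 8 labelings are linearly separable, but no 4 points can be shattered: either one point lies inside the triangle formed by the other three, or the 4 points are in convex position and the XOR labeling (opposite corners share a label) is not linearly separable. So the VC dimension is 3 = 2 + 1.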

12
Q

What does ∃ mean?

A

There exists

13
Q

(high level) What’s the corollary to Theorem 1 of the PAC theorem?

A

It gives a numerical bound on the true error
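One common concrete form of the corollary (finite hypothesis class, realizable case; the course’s exact constants may differ): with probability at least 1 − δ, every h ∈ H with zero training error satisfies

true error(h) ≤ (1/N) · [ ln|H| + ln(1/δ) ]

so plugging in N, |H|, and δ yields an actual number.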

14
Q

What’s the key idea that makes Corollary 4 of Theorem 1 of PAC learning useful?

A
  • We want to trade off between low training error and keeping H simple (aka low VC(H)); see the bound sketched below
  • We can tune the lambda parameter in the regularizer to hopefully land at the sweet spot in the graph
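One common form of this kind of VC-based bound (the standard Vapnik statement; the course’s Corollary 4 may differ in constants): with probability at least 1 − δ, for all h ∈ H,

true error(h) ≤ train error(h) + sqrt( ( VC(H) · [ ln(2N / VC(H)) + 1 ] + ln(4/δ) ) / N )

The first term favors rich hypothesis classes while the second penalizes large VC(H), which is exactly the tradeoff the regularizer is tuning.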
15
Q

What are the practical ways we can tradeoff between low training error and keeping H simple? (1)

A

Use a regularizer

16
Q

What are discriminative models?

A

Models that directly model the conditional p(y|x) (or the decision boundary itself) rather than the joint distribution p(x, y); e.g. logistic regression, as opposed to a generative model like naive Bayes
17
Q

What’s a pmf?

A

p(x) : Function giving the probability that discrete r.v. X takes value x, i.e. p(x) = P(X = x)

18
Q

What’s a pdf?

A

f(x) : Function that returns a nonnegative real indicating the relative likelihood that a continuous r.v. X falls in a certain interval: P(a ≤ X ≤ b) = ∫_a^b f(x) dx

19
Q

What’s a cdf? What’s the symbol?

A

F(x) : Function that returns the probability that a random variable X is less than or equal to x: F(x) = P(X ≤ x)

20
Q

What does a beta distribution look like?

A

A distribution over [0, 1] with shape parameters α and β: uniform when α = β = 1, U-shaped when both are less than 1, and a single bump when both are greater than 1
21
Q

What does a dirichlet distribution look like?

A

The multivariate generalization of the beta distribution: a distribution over probability vectors (nonnegative entries summing to 1), i.e. over the simplex
22
Q

What’s the symbol for expected value of x?

A

E[X]

23
Q

What are the equations for expected value?

A

Discrete: E[X] = Σ_x x · p(x)

Continuous: E[X] = ∫ x · f(x) dx
24
Q

What’s the equation for variance (one that applies to both discrete and continuous)

A

Var(X) = E[(X − E[X])²] = E[X²] − (E[X])²
25
Q

What’s the key concept of joint probability? What’s its symbol?

A
  • p(x, y)
  • Key concept: two or more random variables may interact. Thus, the probability of one taking on a certain value depends on which value(s) the others are taking.
26
Q

What’s a marginal distribution? How is it written?

A

For x, it’s written: p(x)

It gives the probabilities of the various values of the variables in a subset without reference to the values of the other variables. It is computed by summing (or integrating) out the other variables, e.g. p(x) = Σ_y p(x, y).

E.g. you might have probabilities for two variables x and y each taking some values. Here, the marginal distribution of x wouldn’t include any reference to y

27
Q

What’s the conditional probability?

A

p(x|y)

28
Q

What’s the equation for conditional probability?

A

p(x, y) = p(x|y)*p(y)
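As a sanity check, here is a minimal sketch with a toy 2x2 joint table (made-up numbers, purely illustrative) that recovers marginals, builds the conditional, and verifies the product rule and the independence test from the neighboring cards:

# Toy joint distribution p(x, y) over x in {0, 1}, y in {0, 1}
joint = {(0, 0): 0.1, (0, 1): 0.3,
         (1, 0): 0.2, (1, 1): 0.4}

# Marginals: sum out the other variable
p_x = {x: sum(p for (xi, yi), p in joint.items() if xi == x) for x in (0, 1)}
p_y = {y: sum(p for (xi, yi), p in joint.items() if yi == y) for y in (0, 1)}

# Conditional: p(x|y) = p(x, y) / p(y)
p_x_given_y = {(x, y): joint[(x, y)] / p_y[y] for (x, y) in joint}

# Product rule holds on every cell: p(x, y) = p(x|y) * p(y)
for (x, y), p in joint.items():
    assert abs(p - p_x_given_y[(x, y)] * p_y[y]) < 1e-12

# Independence test: does p(x, y) = p(x) * p(y) for all cells?
independent = all(abs(joint[(x, y)] - p_x[x] * p_y[y]) < 1e-12 for (x, y) in joint)
print(independent)  # False: this table is not a product distribution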

29
Q

In mathematical terms, how do we know if two variables are independent?

A

p(x, y) = p(x)p(y)

30
Q

What does it mean to be conditionally independent? Write it in mathematical terms

A

p(x, y|z) = p(x|z)p(y|z)

31
Q

What is p*?

A

The true (unknown) probability distribution that generates the data

32
Q

What is S?

A

The training dataset {x^(1),x^(2),…x^(N)}

33
Q

What is H (calligraphic)?

A

Our hypothesis space

34
Q

What is h?

A

A function that maps from inputs (x’s) to outputs (y’s)

35
Q

What is epsilon?

A

The error tolerance: the maximum true error we are willing to accept

36
Q

What is delta?

A

delta is the failure probability: with probability at least (1 − delta), the PAC bound on the true error holds. The (1 − delta) is the “probably” part of Probably Approximately Correct

37
Q

Define consistent in mathematical terms

A

A hypothesis h is consistent with a training set S iff h(x^(i)) = y^(i) for every training example i, i.e. its training error is 0
38
Q

How many points can a linear boundary (with bias) classify exactly for d-Dimensions?

A

d + 1

39
Q

What’s an important thing to remember about shattering?

A

To say H shatters n points in d dimensions, we just need to find one arrangement where all possible labelings of that arrangement can be correctly classified. I.e. shattering does not mean H can classify every possible arrangement
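For example (a standard illustration): lines in 2-D cannot shatter 3 collinear points (the +, −, + labeling fails), yet the VC dimension of lines is still 3, because 3 points in general position can be shattered. The quantifier over arrangements is existential; the quantifier over labelings is universal.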

40
Q

When given a certain shape classifier and asked if it can shatter a configuration, what’s a good progression to run through to check?

A

Sequentially check if the classifier can segregate any combination of n points from n=1 to N, where N is the total number of points. If so, then it does shatter.

E.g. Can this classifier select:

  1. Any 1 point
  2. Any combination of two points
  3. Any combination of 3 points
  4. ….
  5. All N points
41
Q

What’s a concept class?

A

Synonymous with hypothesis class

42
Q

What is a consistent learner?

A

A learner that achieves 0 training error in a realizable setting

43
Q

What gives a numerical bound on the true error?

A

The corollary to Theorem 1 of the PAC theorem

44
Q

d + 1

A

How many points can a linear boundary (with bias) classify exactly for d-Dimensions?