Week 4 Flashcards

1
Q

What does the out-of-sample error measure?

A

How well what we learned from training on the data set generalizes to data we have not seen before.

2
Q

What does the in-sample error measure?

A

The training performance

3
Q

Generalization error

A

The discrepancy between Ein and Eout.

4
Q

What is the concept of the growth function?

A

A number that captures how different the hypotheses in H are, and how much overlap the different events have.

5
Q

dichotomy

A

The N-tuple you get when you apply a hypothesis h (in H) to a finite sample. It splits the sample into two groups: h = +1 and h = -1.

6
Q

What does the number of dichotomies actually represent?

A

The diversity of the hypothesis class.

7
Q

What is the growth function in the case of positive rays?

A

mH(N) = N + 1

8
Q

What is the condition with positive rays?

A

H consists of all hypotheses h: R -> {-1, +1} of the form h(x) = sign(x - a).
-1 is returned to the left of a and +1 to the right of a.
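
A minimal brute-force sketch (my own Python illustration, not from the course, using arbitrary example points) that slides the threshold a across the line and counts the distinct dichotomies, confirming mH(N) = N + 1:

    def count_ray_dichotomies(points, steps=10000):
        # Slide the threshold a of h(x) = sign(x - a) across the line and
        # record which dichotomy each position produces on the points.
        lo, hi = min(points) - 1.0, max(points) + 1.0
        seen = set()
        for i in range(steps):
            a = lo + (hi - lo) * i / steps
            seen.add(tuple(1 if x > a else -1 for x in points))
        return len(seen)

    print(count_ray_dichotomies([0.3, 1.7, 2.2, 4.9]))   # prints 5, i.e. N + 1 for N = 4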

9
Q

Describe the condition with positive intervals:

A

H consists of all hypotheses in one dimension that return +1 for points inside some interval and -1 otherwise.
Each hypothesis is specified by the two end points of that interval.

10
Q

How many different dichotomies are there in the positive intervals condition?

A

(N+1 choose 2) + 1 different dichotomies: choose the interval's two end points from the N+1 spots between and around the N points, plus 1 for the all -1 dichotomy (empty interval).

11
Q

What is the growth function in the case of positive intervals?

A

mH(N) = 0.5N^2 + 0.5N + 1
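
The same brute-force idea, again as my own Python sketch with arbitrary example points: enumerate candidate interval end points on a fine grid, count the distinct dichotomies, and compare with 0.5N^2 + 0.5N + 1:

    import itertools

    def count_interval_dichotomies(points, steps=200):
        lo, hi = min(points) - 1.0, max(points) + 1.0
        grid = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
        seen = set()
        for a, b in itertools.combinations(grid, 2):   # candidate end points a < b
            seen.add(tuple(1 if a < x <= b else -1 for x in points))
        return len(seen)

    N = 4
    print(count_interval_dichotomies([0.3, 1.7, 2.2, 4.9]), 0.5 * N**2 + 0.5 * N + 1)   # 11 11.0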

12
Q

What is the hypothesis space in the case of convex sets?

A

H consists of all hypotheses in two dimensions h: R^2 -> {-1,+1} that are positive inside some convex set and negative elsewhere.

13
Q

When is a set convex?

A

If the line segment connecting any two points within the set lies entirely within the set.

14
Q

What is the growth function in the case of convex sets?

A

mH(N) = 2^N (any N points in convex position, e.g. on a circle, can be shattered by convex sets).

15
Q

Given a hypothesis class, the VC dimension dvc of that hypothesis class is…

A

the largest number of points N such that mH(N) = 2^N. Infinite if mH(N) = 2^N for all N.

16
Q

What is the smallest break point, expressed in terms of dvc?

A

k = dvc + 1

17
Q

break point

A

When no data set of size k can be shattered by H, then k is a break point for H.

18
Q

What kind of dvc do good models have?

A

A finite dvc.

19
Q

What kind of dvc do bad models have?

A

An infinite dvc.

20
Q

How do you compute a dvc for the perceptron?

A

First show that the dvc is at least a certain value, then show it is at most a certain value.

21
Q

What can you conclude about dvc if there is a set of N points that can be shattered by H?

A

dvc >= N

22
Q

What can you conclude about dvc if any set of N points can be shattered by H?

A

dvc >= N

23
Q

What can you conclude about dvc if there is a set of N datapoints that cannot be shattered by H?

A

Nothing

24
Q

What can you conclude about dvc if there is no dataset of N points that can be shattered by H?

A

dvc < N

25
Q

What is the VC dimension of a d-dimensional perceptron?

A

d+1
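
A small numpy sketch (my own illustration, not the course's full proof) of the lower-bound half of this result: the d + 1 points 0, e1, ..., ed, with a bias coordinate prepended, form an invertible matrix X, so every dichotomy y is realised by the weights w = X^-1 y, hence dvc >= d + 1:

    import itertools
    import numpy as np

    d = 3
    X = np.vstack([np.zeros(d), np.eye(d)])      # d + 1 points in R^d
    X = np.hstack([np.ones((d + 1, 1)), X])      # prepend the bias coordinate -> invertible matrix

    for y in itertools.product([-1, 1], repeat=d + 1):       # every possible dichotomy
        w = np.linalg.solve(X, np.array(y, dtype=float))     # weights realising this dichotomy
        assert np.array_equal(np.sign(X @ w), np.array(y))
    print("all", 2 ** (d + 1), "dichotomies realised, so dvc >= d + 1")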

26
Q

Pocket algorithm

A

Adjustment to the PLA: keep ("pocket") the hypothesis with the smallest Ein seen so far.
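
A minimal sketch of the pocket idea (my own Python, assuming a data matrix X that already contains a bias column and labels y in {-1, +1}):

    import numpy as np

    def pocket(X, y, max_iters=1000):
        w = np.zeros(X.shape[1])
        best_w, best_err = w.copy(), np.mean(np.sign(X @ w) != y)
        for _ in range(max_iters):
            wrong = np.where(np.sign(X @ w) != y)[0]
            if len(wrong) == 0:
                return w                          # separable case: PLA has converged
            w = w + y[wrong[0]] * X[wrong[0]]     # ordinary PLA update on a misclassified point
            err = np.mean(np.sign(X @ w) != y)    # Ein of the new weights (the costly step)
            if err < best_err:                    # pocket: keep the best weights seen so far
                best_w, best_err = w.copy(), err
        return best_w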

27
Q

What is a plus of the pocket algorithm vs the PLA?

A

It works even if the data is not linearly separable

28
Q

What is a disadvantage of the pocket algorithm?

A

It takes a long time, because Ein has to be computed over the full data set in every iteration.

29
Q

What is the plan for achieving a small Eout?

A

1) Make sure you pick the hypothesis with the smallest Ein.
2) Make sure that Eout ≈ Ein.

30
Q

If two hypotheses resemble each other…

A

then their in-sample and out-of-sample errors will also resemble each other, and the events Bi will overlap a lot.

31
Q

What does the number of dichotomies count?

A

The effective number of hypotheses.

32
Q

Conceptually, what does the growth function describe?

A

It is the largest number of dichotomies you can obtain on N points, if you may choose the points yourself.

33
Q

What is M?

A

The number of hypotheses in the hypothesis class.

34
Q

How do you deal with the fact that M is pessimistic?

A

Use the notion of the 'effective number of hypotheses' as a replacement for M; this way the bound on the probability of large deviations of Eout from Ein stays useful.

35
Q

Effective number of hypotheses

A

Choose N data points and look only at what the hypotheses do on those data points.

36
Q

What is the difference between a hypothesis and a dichotomy?

A

Hypothesis h: X -> {-1, +1}

Dichotomy h: {x1, x2, ..., xN} -> {-1, +1}

37
Q

What is the maximum number of dichotomies on a set of N points?

A

2^N

38
Q

Define the growth function:

A

mH(N) = max over x1, ..., xN in X of |H(x1, ..., xN)|

39
Q

Union bound

A

For any finite set of events, the probability that at least one of them happens is not greater than the sum of the probabilities of the individual events.

40
Q

In what situation do we say that H shatters the data set of N points?

A

If H is capable of generating all possible dichotomies on those N points. (H is as diverse as it can be on this set; the maximum of 2^N dichotomies is realised.)

41
Q

If the condition mH(N) = 2^N breaks at any point, …

A

we can bound mH(N) for all values of N by a simple polynomial based on this break point.

42
Q

What is the upper bound for any mH(N) that has a break point k?

A

mH(N) <= B(N,k), with B(N,k) the maximum number of dichotomies on N data points such that no subset of size k of those N points can be shattered by these dichotomies.

43
Q

State how mH(N) is bounded in terms of the VC dimension:

A

mH(N) <= N^dvc + 1

44
Q

Least squares estimator

A

What you get (wlin) when you set the gradient of the in-sample error to 0 and solve. You can then predict y for an arbitrary x.
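
A small numpy sketch (my own illustration on made-up data): solving gradient of Ein = 0 amounts to wlin = pseudo-inverse(X) times y, after which y can be predicted for an arbitrary x:

    import numpy as np

    rng = np.random.default_rng(0)
    X = np.hstack([np.ones((20, 1)), rng.normal(size=(20, 2))])      # 20 points: bias + 2 features
    y = X @ np.array([1.0, 2.0, -3.0]) + 0.1 * rng.normal(size=20)   # y = f(x) + noise term

    wlin = np.linalg.pinv(X) @ y          # least squares estimator
    x_new = np.array([1.0, 0.5, -0.2])    # arbitrary new point (bias coordinate included)
    print(x_new @ wlin)                   # predicted y for x_new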

45
Q

Noise term

A

A random variable in linear regression: y = f(x) + epsilon, where epsilon is the noise term.

46
Q

How do you determine the growth function of a hypothesis class H?

A

1) Take N = 1 and one data point x. Can you realise both dichotomies with hypotheses from H? Then mH(1) = 2^1 = 2. Otherwise 1 is already a break point.
2) Continue with N = 2, N = 3, etc. If you never encounter a break point, then mH(N) = 2^N for all N.

47
Q

Give the generalization bound:

A

Eout(g) <= Ein(g) + sqrt( (1/2N) * ln(2M/delta) ),
with probability at least 1 - delta.
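
A quick numeric illustration (example values of my own choosing): with M = 100 hypotheses, N = 1000 examples and delta = 0.05, the penalty term is roughly 0.064:

    import math

    M, N, delta = 100, 1000, 0.05
    print(math.sqrt((1 / (2 * N)) * math.log(2 * M / delta)))   # ~ 0.064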