Week 6 Flashcards

1
Q

Overfitting

A

Fitting the observed data (the data seen so far) well no longer indicates a small out-of-sample error.

2
Q

Deterministic noise

A

The part of the target function that even the best approximation to the target function within the hypothesis set cannot capture.

3
Q

Stochastic noise

A

Random noise that cannot be modeled.

4
Q

State two differences between deterministic and stochastic noise

A

1) If we generate the same data again, the deterministic noise would be the same, but the stochastic noise would be different.
2) Different models capture different parts of the target function, so deterministic noise depends on the learning model you use.

5
Q

The variance of the stochastic noise is captured by the variable…

A

sigma_squared

6
Q

What is the cause of overfitting?

A

Noise

7
Q

Name two cures for overfitting:

A

1) Regularization
2) Validation

8
Q

Regularization

A

Attempts to minimize E_out by working through the equation
E_out(h) = E_in(h) + overfit penalty
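One common concrete form is weight decay, which stands in for the overfit penalty by minimizing the augmented error E_aug(w) = E_in(w) + (lambda/N) * w^T w. A minimal sketch for linear regression (the function name and the arguments Z, y, lam are illustrative, not from the card):

```python
import numpy as np

def ridge_fit(Z, y, lam):
    """Weight decay: minimize E_in(w) + (lam/N) * w^T w.

    Setting the gradient to zero gives the closed form
    w = (Z^T Z + lam*I)^(-1) Z^T y.
    """
    d = Z.shape[1]
    return np.linalg.solve(Z.T @ Z + lam * np.eye(d), Z.T @ y)
```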

9
Q

Validation

A

Estimates the out-of-sample error directly

10
Q

validation set

A

A subset from the data that is not used in training.

11
Q

When is a set no longer a test set?

A

When it affects the learning process in any way.

12
Q

How is the validation set created?

A

The data set D is divided into a training set of size N - K and a validation set of size K. The algorithm learns a final hypothesis using the training set; the validation error is then computed on the validation set.
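A minimal sketch of that procedure, where fit (the learning algorithm) and err (the pointwise error measure) are hypothetical stand-ins:

```python
import numpy as np

def validation_error(X, y, fit, err, K):
    """Train on N-K points, estimate E_out on the K held-out points."""
    N = len(y)
    idx = np.random.permutation(N)
    val, train = idx[:K], idx[K:]            # sizes K and N-K
    g_minus = fit(X[train], y[train])        # hypothesis learned without the val set
    return np.mean([err(g_minus(x), t) for x, t in zip(X[val], y[val])])
```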

13
Q

What is the rule of thumb for determining K in validation?

A

K = N/5
Use 80% for training and 20% for validation.

14
Q

Cross validation estimate

A

The average of the errors that each hypothesis g_n makes on its own validation set (the part of the data it was not trained on).
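A sketch of the leave-one-out version (K = 1): each g_n is trained with the n-th example removed and validated on that single point (fit and err are hypothetical stand-ins, as above):

```python
import numpy as np

def cross_validation_error(X, y, fit, err):
    """E_cv = average of e_n, the error g_n makes on its left-out point."""
    N = len(y)
    e = np.empty(N)
    for n in range(N):
        mask = np.arange(N) != n
        g_n = fit(X[mask], y[mask])          # trained on the other N-1 points
        e[n] = err(g_n(X[n]), y[n])          # validated on point n
    return e.mean()
```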

15
Q

What does H_theta denote?

A

The polynomials of degree d (with a tilde above).

16
Q

What is theta(x)? (i.e., z)

A

The x-vector (e.g. (1, x)^T) extended with x^2 … x^d (with a tilde above).
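A sketch of that transform, mapping a scalar x to z = (1, x, x^2, ..., x^d):

```python
import numpy as np

def theta(x, d):
    """Polynomial transform: x -> z = (1, x, x^2, ..., x^d)."""
    return np.array([x ** k for k in range(d + 1)])

theta(2.0, 3)    # -> array([1., 2., 4., 8.])
```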

17
Q

When are we overfitting?

A

When the algorithm tries to learn from the noise instead of from the pattern.

18
Q

What is the cause of overfitting in an example without noise?

A

Deterministic noise: even with the best hypothesis we cannot perfectly approximate the target function.

19
Q

How can you express E_out(g^D) in 3 terms?

A

var + bias + sigma_squared

20
Q

What is the var term when computing E_out(g^D) in 3 parts?

A

var = E_{D,x}[ (g^D(x) - g_bar(x))^2 ], where g_bar is the average hypothesis

21
Q

What is the bias term when computing E_out(g^D) in 3 parts?

A

bias = E_x[ (g_bar(x) - f(x))^2 ]

22
Q

What is the sigma_squared term when computing E_out(g^D) in 3 parts?

A

sigma_squared = E_{epsilon,x}[ epsilon(x)^2 ]
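The three terms can be estimated by simulation: learn on many independently generated datasets and average. A sketch using the constant model H0 (h(x) = b), with an assumed target f(x) = sin(pi*x) and an assumed noise level:

```python
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: np.sin(np.pi * x)      # assumed target function
sigma, N, runs = 0.1, 5, 5000        # assumed noise level, dataset size, # datasets

# Fit H0 (h(x) = b) on each dataset; least squares gives b = mean of the y's.
bs = np.empty(runs)
for r in range(runs):
    x = rng.uniform(-1, 1, N)
    y = f(x) + sigma * rng.standard_normal(N)
    bs[r] = y.mean()

x_test = rng.uniform(-1, 1, 100_000)
g_bar = bs.mean()                           # average hypothesis (a constant for H0)
bias = np.mean((g_bar - f(x_test)) ** 2)    # E_x[(g_bar(x) - f(x))^2]
var = np.mean((bs - g_bar) ** 2)            # E_D[(g^D - g_bar)^2], x-free for H0
print(bias, var, bias + var + sigma ** 2)   # last value ~ expected E_out(g^D)
```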

23
Q

How do you compute the expected in-sample error in linear regression with noise?

A

sigma_squared * (1 - (d+1)/N )

24
Q

How do you compute the expected out-of-sample error in linear regression with noise?

A

sigma_squared * (1 + (d+1)/N)

25
Q

How do you compute the expected generalization error in linear regression with noise?

A

2*sigma_squared * ((d+1)/N)
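Plugging in assumed numbers (sigma_squared = 1, d = 2, N = 100) as a sanity check that the generalization error is the gap between the two:

```python
sigma_sq, d, N = 1.0, 2, 100                # assumed example values
E_in  = sigma_sq * (1 - (d + 1) / N)        # 0.97
E_out = sigma_sq * (1 + (d + 1) / N)        # 1.03
gen   = E_out - E_in                        # 0.06 == 2 * sigma_sq * (d + 1) / N
```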

26
Q

What is H0, in essence?

A

The set of all hypotheses of the form h(x) = b

27
Q

What is H1, in essence?

A

The set of all hypotheses of the form h(x) = ax + b

28
Q

Difference between linear classification and linear regression:

A

Classification = binary (or ternary, …) labels
Regression = real numbers

29
Q

Difference between logistic regression and linear regression:

A

Logistic regression = a real number between 0 and 1
Linear regression = any real number

30
Q

What does linear regression use to measure the distance between h(x) and f(x)?

A

Mean squared error (MSE)

31
Q

What is the formula for the mean squared error?

A

E_in = 1/N * sum_{i=1}^{N} (h(x_i) - y_i)^2
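The same formula spelled out in code (h, X and y are hypothetical):

```python
import numpy as np

def mse(h, X, y):
    """E_in = (1/N) * sum_i (h(x_i) - y_i)^2."""
    return np.mean([(h(x) - t) ** 2 for x, t in zip(X, y)])
```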

32
Q

What is h(x) in 1) linear regression, 2) the perceptron, and 3) logistic regression?

A

1) h(x) = s
2) h(x) = sign(s)
3) h(x) = theta(s)
where s = w^T x
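The three output functions applied to the same signal, as a sketch (theta taken as the logistic function):

```python
import numpy as np

def signal(w, x):
    return w @ x                               # s = w^T x

h_linear   = lambda s: s                       # 1) linear regression
h_perc     = lambda s: np.sign(s)              # 2) perceptron
h_logistic = lambda s: 1 / (1 + np.exp(-s))    # 3) logistic: theta(s)
```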

33
Q

Give the logistic regression algorithm (2 steps):

A

For every time step t:
1) Compute the gradient of E_in at w(t).
2) Update the weights with fixed learning rate eta:
w(t+1) = w(t) - eta * gradient of E_in
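A sketch of those two steps, assuming labels y_n in {-1, +1} and the standard logistic-regression gradient grad E_in(w) = -(1/N) * sum_n y_n x_n / (1 + exp(y_n w^T x_n)); data and step count are illustrative:

```python
import numpy as np

def logistic_gd(X, y, eta=0.1, steps=1000):
    """Batch gradient descent on E_in for logistic regression."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        # 1) compute the gradient of E_in at w(t)
        grad = -(y[:, None] * X / (1 + np.exp(y * (X @ w)))[:, None]).mean(axis=0)
        # 2) update with fixed learning rate: w(t+1) = w(t) - eta * grad
        w -= eta * grad
    return w
```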

34
Q

Stochastic gradient descent

A

Does not use all examples to compute the gradient of E_in; it uses the error on a single data sample (or a few) per update.
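The stochastic variant of the update above: each step uses the gradient of the error on one randomly picked example (same assumed gradient formula):

```python
import numpy as np

def logistic_sgd(X, y, eta=0.1, steps=10_000, seed=0):
    """SGD: one single-sample gradient per weight update."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        n = rng.integers(len(y))                               # pick one example
        grad = -y[n] * X[n] / (1 + np.exp(y[n] * (X[n] @ w)))  # single-sample gradient
        w -= eta * grad
    return w
```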

35
Q

How do you represent the XOR function with ANDs and ORs?

A

f(x) = (not h1 AND h2) OR (h1 AND not h2)

+1 if exactly one of h1, h2 equals +1
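Checking the formula on all four input combinations (a sketch over booleans, where True plays the role of +1):

```python
def xor(h1, h2):
    """f = (not h1 AND h2) OR (h1 AND not h2)."""
    return (not h1 and h2) or (h1 and not h2)

[xor(a, b) for a in (False, True) for b in (False, True)]
# -> [False, True, True, False]: true iff exactly one of h1, h2 is true
```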

36
Q

What does adding more nodes per hidden layer do to the approximation and generalization of an MLP?

A

approximation goes up,
generalization goes down.