Week 6 Flashcards
Overfitting
Fitting the data seen so far so well that a small in-sample error no longer indicates a small out-of-sample error.
Deterministic noise
The part of the target function that even the best hypothesis in the hypothesis set cannot capture.
Stochastic noise
Random noise that cannot be modeled.
State two differences between deterministic and stochastic noise
1) If we generate the same data again, the deterministic noise stays the same, but the stochastic noise changes.
2) Different models capture different parts of the target function, so deterministic noise depends on the learning model you use; stochastic noise does not.
The variance of the stochastic noise is captured by the variable…
sigma_squared
What is the cause of overfitting?
Noise
Name two cures for overfitting:
1) Regularization
2) Validation
Regularization
Attempts to minimize Eout by estimating and constraining the penalty term in
Eout(h) = Ein(h) + overfit penalty
Validation
Estimates the out-of-sample error directly
validation set
A subset from the data that is not used in training.
When is a set no longer a test set?
When it affects the learning process in any way.
How is the validation set created?
The data set D is divided into a training set of size N-K and a validation set of size K. A final hypothesis is learned by the algorithm using the training set; the validation error is then computed on the validation set.
What is the rule of thumb for determining K in validation?
K = N/5
Use 80% for training and 20% for validation.
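A minimal sketch of this K = N/5 rule of thumb (the `split_train_val` helper and the 100-point toy data set are made up for illustration):

```python
import random

def split_train_val(data, seed=0):
    """Shuffle and split a data set: the last fifth (K = N/5) becomes the validation set."""
    data = list(data)
    random.Random(seed).shuffle(data)
    K = len(data) // 5            # rule of thumb: K = N/5
    return data[:-K], data[-K:]   # training set of size N-K, validation set of size K

train, val = split_train_val(range(100))
# len(train) == 80, len(val) == 20
```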
Cross validation estimate
The average value of the error made by each g_n on its own validation set.
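As an illustration, a hypothetical leave-one-out sketch in which each g_n is a constant model fitted with the n-th point left out (the function name and data are made up):

```python
def loocv_constant(ys):
    """Leave-one-out cross-validation estimate for a constant-model fit."""
    N = len(ys)
    errors = []
    for n in range(N):
        train = ys[:n] + ys[n + 1:]
        g_n = sum(train) / (N - 1)          # g_n: fit without the n-th point
        errors.append((g_n - ys[n]) ** 2)   # e_n: error of g_n on its validation point
    return sum(errors) / N                  # E_cv = average of the e_n

print(loocv_constant([1.0, 2.0, 3.0]))  # -> 1.5
```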
What does H_theta denote?
The polynomials of degree d (d with a tilde above it)
What is theta(x)? (z)
The x-vector (e.g. (1, x).T) extended with x^2 … x^d (d with a tilde above it)
When are we overfitting?
When the algorithm tries to learn from the noise instead of from the pattern.
What causes overfitting in an example without noise?
Deterministic noise: even with the best hypothesis we cannot approximate the target function perfectly.
How can you express E.out(g.D) in 3 parts?
var + bias + sigma_squared
What is the var when splitting E.out(g.D) into 3 parts?
var = E.D,x [ ( g.D(x) - average g(x) )^2 ]
What is the bias when splitting E.out(g.D) into 3 parts?
bias = E.x [ ( average g(x) - f(x) )^2 ]
What is the sigma_squared when splitting E.out(g.D) into 3 parts?
sigma_squared = E.epsilon,x [ ( epsilon(x) )^2 ]
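A hypothetical numeric sketch of this three-part decomposition: repeatedly fit a constant hypothesis h(x) = b to two noiseless samples of f(x) = x^2 and estimate var and bias over many data sets (all names and values are made up; sigma_squared = 0 here because the data are noiseless):

```python
import random

random.seed(0)
f = lambda x: x * x
xs_test = [i / 50 for i in range(-50, 51)]   # evaluation grid on [-1, 1]

hypotheses = []
for _ in range(2000):                        # many data sets D of size 2
    x1, x2 = random.uniform(-1, 1), random.uniform(-1, 1)
    b = (f(x1) + f(x2)) / 2                  # least-squares constant fit on D
    hypotheses.append(b)

g_bar = sum(hypotheses) / len(hypotheses)    # average hypothesis over all D
var = sum((b - g_bar) ** 2 for b in hypotheses) / len(hypotheses)
bias = sum((g_bar - f(x)) ** 2 for x in xs_test) / len(xs_test)
# E.out(g.D) ~ bias + var + sigma_squared, with sigma_squared = 0 here
```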
How do you compute the expected in-sample error in linear regression with noise?
sigma_squared * (1 - (d+1)/N)
How do you compute the expected out-of-sample error in linear regression with noise?
sigma_squared * (1 + (d+1)/N)
How do you compute the expected generalization error in linear regression with noise?
2 * sigma_squared * ((d+1)/N)
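These three formulas can be checked against each other in a small sketch (sigma_squared = 1 and d = 2 are made-up illustrative values):

```python
sigma_sq, d = 1.0, 2   # hypothetical noise variance and input dimension

def expected_ein(N):   # expected in-sample error
    return sigma_sq * (1 - (d + 1) / N)

def expected_eout(N):  # expected out-of-sample error
    return sigma_sq * (1 + (d + 1) / N)

def expected_gen(N):   # expected generalization error
    return 2 * sigma_sq * (d + 1) / N

# The generalization error is exactly E.out - E.in, and it shrinks like 1/N:
assert abs(expected_gen(30) - (expected_eout(30) - expected_ein(30))) < 1e-12
```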
What is H0, essentially?
The set of all hypotheses of the form h(x) = b
What is H1, essentially?
The set of all hypotheses of the form h(x) = ax + b
Diff. linear classification and linear regression:
Classification = discrete labels (binary, or multi-class)
Regression = real numbers
Diff. Logistic regression and linear regression:
Logistic regression = a real number between 0 and 1 (a probability)
Linear regression = any real number.
What does linear regression use to measure the distance between h(x) and f(x)?
Mean Square Error (MSE)
What is the formula for the mean squared error?
E.in = 1/N * sum_{i=1}^{N} (h(x.i) - y.i)^2
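A minimal sketch of this formula (the hypothesis h and the data points are made up):

```python
def mse(h, xs, ys):
    """E.in = 1/N * sum over i of (h(x_i) - y_i)^2."""
    N = len(xs)
    return sum((h(x) - y) ** 2 for x, y in zip(xs, ys)) / N

h = lambda x: 2 * x                  # hypothetical hypothesis
print(mse(h, [1, 2, 3], [2, 4, 7]))  # one residual of 1 over N = 3 -> E.in = 1/3
```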
What is h(x) in 1) linear regression, 2) perceptron and 3) logistic regression?
1) h(x) = s
2) h(x) = sign(s)
3) h(x) = theta(s)
for s = w.T * x
Give the logistic regression algorithm (2 steps):
For every time step, do
1) compute the gradient
2) Update the weights with fixed learning rate eta:
w(t+1) = w(t) - eta * E.in gradient
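A hedged sketch of these two steps for logistic regression with cross-entropy error, assuming the standard gradient -(1/N) * sum of y_n * x_n / (1 + exp(y_n * w.x_n)); the data and eta below are made-up illustrative values:

```python
import math

def gradient_Ein(w, X, y):
    """Batch gradient of the cross-entropy in-sample error."""
    N, d = len(X), len(w)
    g = [0.0] * d
    for xn, yn in zip(X, y):
        s = sum(wi * xi for wi, xi in zip(w, xn))   # signal s = w.T * x
        coef = -yn / (1 + math.exp(yn * s))
        for j in range(d):
            g[j] += coef * xn[j] / N
    return g

def gd_step(w, X, y, eta=0.1):
    """One update: w(t+1) = w(t) - eta * gradient of E.in at w(t)."""
    g = gradient_Ein(w, X, y)
    return [wi - eta * gi for wi, gi in zip(w, g)]

w = gd_step([0.0, 0.0], X=[[1, 2], [1, -1]], y=[+1, -1])
```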
Stochastic gradient descent
Does not use all examples to compute the gradient of E.in; it uses the error on a single data point (or a small batch of them).
How do you represent the XOR function in ANDs and ORs?
f(X) = (not h1 AND h2) OR (h1 AND not h2)
+1 if exactly one of h1, h2 equals +1
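This decomposition can be checked with a small truth-table sketch over ±1-valued inputs (the function name is made up):

```python
def XOR(h1, h2):
    # (NOT h1 AND h2) OR (h1 AND NOT h2): +1 iff exactly one input is +1
    return +1 if (h1 == -1 and h2 == +1) or (h1 == +1 and h2 == -1) else -1

for a in (+1, -1):
    for b in (+1, -1):
        print(a, b, XOR(a, b))
```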
What does adding more nodes per hidden layer do to the approximation and generalization of the MLP?
Approximation improves,
generalization gets worse.