Week 3 Flashcards
What is X in the marble-and-bin model?
The number of red marbles in the sample.
What is nu (the Greek letter ν), expressed in X and N?
X/N
The number of red marbles in the sample / the size of the sample.
Give the Hoeffding Inequality:
P[|E.in - E.out| > epsilon] <= 2e^(-2(epsilon^2)N)
for all epsilon > 0
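A minimal numeric sketch (the function name `hoeffding_bound` is my own): the right-hand side of the bound shrinks exponentially in N; for epsilon = 0.05 and N = 1000 it is about 0.0135.

```python
import math

def hoeffding_bound(epsilon, N):
    # Right-hand side of the Hoeffding inequality: 2e^(-2 * epsilon^2 * N)
    return 2 * math.exp(-2 * epsilon**2 * N)

print(hoeffding_bound(0.05, 1000))  # 2e^(-5) ≈ 0.0135
```

Doubling N squares the factor e^(-2 epsilon^2 N), so larger samples make a large deviation between E.in and E.out exponentially less likely.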
Name the two kinds of supervised learning problems:
Classification: Y consists of a small number of elements (binary: 2 elements)
Regression: Y = R
Linear regression
A linear model whose output is the signal itself: h(x) = w.T * x.
Logistic regression
A linear model that outputs a probability between 0 and 1. It applies no hard threshold; instead it passes the signal through the soft threshold theta:
h(x) = theta( w.T * x )
Give the logistic function theta(s):
theta(s) = (e^s) / (1+e^s)
output between 0 and 1
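A quick sketch of theta(s) in code (pure Python, names my own):

```python
import math

def theta(s):
    # Logistic function: e^s / (1 + e^s); output is always between 0 and 1
    return math.exp(s) / (1 + math.exp(s))

print(theta(0))  # 0.5: the soft threshold is centered at s = 0
```

For large negative s the output approaches 0, for large positive s it approaches 1, which is what makes theta a "soft" version of the hard sign threshold.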
Linear classification
Uses a hard threshold on the signal.
h(x) = sign(w^T *x)
classification output…
is bounded
regression output…
is real
What is meant with the ‘soft threshold’?
the logistic function theta(s)
Why is the logistic function also called a sigmoid?
because its shape looks like a flattened out ‘s’
What is the target that a logistic function is trying to learn?
A probability that depends on the input x.
What is the target function in logistic regression?
f(x) = P[y=+1 | x]
Error measure
How likely it is that we would get this output y from the input x if the target distribution P(y|x) were indeed captured by our hypothesis h(x).
Give the formula for the in-sample error in linear regression:
E.in(w) = 1/N * sum_{n=1}^{N} (w.T * x.n - y.n)^2
method of maximum likelihood
Selects the hypothesis h(x) that maximises the probability of getting all y.n's in the data set from the corresponding x.n's.
Give the formula for the in-sample error measure for logistic regression:
E.in(w) = 1/N * sum_{n=1}^{N} ln(1 + e^(-y.n * w.T * x.n))
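As a sketch (pure Python, function name and toy data my own), the cross-entropy in-sample error can be computed directly from this formula; with w = 0 every signal is 0, so E.in = ln 2 regardless of the data:

```python
import math

def logistic_ein(w, X, y):
    # E.in(w) = 1/N * sum over n of ln(1 + e^(-y_n * w^T x_n))
    N = len(X)
    total = 0.0
    for xn, yn in zip(X, y):
        signal = sum(wi * xi for wi, xi in zip(w, xn))  # w^T x_n
        total += math.log(1 + math.exp(-yn * signal))
    return total / N

print(logistic_ein([0.0, 0.0], [[1, 1], [1, -1]], [1, -1]))  # ln(2) ≈ 0.693
```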
What is the target in linear regression?
A noisy target function, formalized as a distribution P(y|x) of the random variable y.
Linear regression: method and goal
We have an unknown distribution P(x,y) that generates each (xn,yn), and we want to find a hypothesis g that minimizes the error between g(x) and y with respect to that distribution.
Matrix representation of E.in(w) in linear regression:
E.in(w) = 1/N * ||X*w - y||^2, where X is the N x (d+1) matrix with the input vectors x.n as rows and y is the target vector with the target values y.n as components.
How do you get the gradient of Ein(w) to be 0?
Solve the following for a w:
X^T * X * w = X^T * y
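A sketch with NumPy (toy data my own): `np.linalg.solve` solves the normal equations X.T * X * w = X.T * y directly, which is numerically preferable to forming the inverse explicitly.

```python
import numpy as np

# Toy data on the line y = 1 + 2x, with x0 = 1 prepended to each input
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
y = np.array([1.0, 3.0, 5.0])

# Solve X^T X w = X^T y for w (the zero-gradient condition)
w = np.linalg.solve(X.T @ X, X.T @ y)
print(w)  # [1. 2.]
```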
What is ^y in the squared error?
The estimate according to the hypothesis.
What is y in the squared error?
The correct value we are aiming for (the target value).
Give the formula for the squared error:
e(^y, y) = (^y - y)^2
Linear regression algorithm in 3 steps
1) Construct matrix X and vector y from the data set, with each x0=1.
2) Compute the pseudo-inverse of the matrix X.
3) Return w.lin = (pseudo-inverse of X) * y
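The three steps sketched with NumPy (toy data my own); `np.linalg.pinv` computes the pseudo-inverse, which equals (X.T * X)^-1 * X.T whenever X.T * X is invertible:

```python
import numpy as np

# Step 1: matrix X (each row an input with x0 = 1) and target vector y
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 3.0]])
y = np.array([1.0, 2.0, 4.0])

# Step 2: pseudo-inverse of X
X_dagger = np.linalg.pinv(X)

# Step 3: w_lin = (pseudo-inverse of X) * y
w_lin = X_dagger @ y
print(w_lin)  # [1. 1.] — the data lie exactly on y = 1 + x
```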
OLS
Ordinary least squares
How is the derivative of a function f in the direction w.i written?
∂f(x) / ∂w.i (with the "reversed a" symbol ∂)
What is ∇f(x) (the upside-down triangle ∇)?
The vector of the derivatives of f(x) in the directions w0, …, wn.
What is the least-squares estimator?
The solution w.lin that you get by setting the gradient of E.in to 0 and solving for w.
What is w.lin?
w.lin = ((X.T * X)^-1) * X.T * y
What can you do with the least-squares estimator?
Predict y for an arbitrary x.
How do you predict y for an arbitrary x with the least-squares estimator?
^y = w.lin.T * x
What does the hat matrix do?
It maps the actual output data y to the outputs predicted by the hypothesis: ^y = H * y, with H = X * (X.T * X)^-1 * X.T.
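A sketch with NumPy (toy data my own): the hat matrix H = X * (X.T * X)^-1 * X.T "puts the hat on y", and the result matches predicting with the least-squares estimator:

```python
import numpy as np

X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
y = np.array([1.0, 2.0, 2.0])   # not exactly on a line, so ^y differs from y

# Hat matrix: H = X (X^T X)^{-1} X^T, so that ^y = H y
H = X @ np.linalg.inv(X.T @ X) @ X.T
y_hat = H @ y

# Same predictions as X @ w_lin with w_lin the least-squares estimator
w_lin = np.linalg.pinv(X) @ y
print(np.allclose(y_hat, X @ w_lin))  # True
```

H is idempotent (H @ H = H): applying the hat twice changes nothing, since the first application already projects y onto the column space of X.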
What is the main difference between the learning approach and the design approach?
The role that data plays: in the design approach, the problem is well-defined and f can be analytically derived without seeing data. In the learning approach, data is needed to pin down f.