Linear Models for Regression Flashcards
What is regression?
The problem in which we have a set of points sampled from a function and we want to approximate that function without knowing the original function.
What is the relationship between regression and classification?
Any regression problem can be recast as a classification one: each regression point lies on the decision boundary (we just need to string them together).
What are the 3 types of basis used for finding a function?
> Polynomial basis
> “Gaussian” basis
> Sigmoid basis
What is the polynomial basis?
ϕ_i(x) = x^i
y(x, w) = w_0 + w_1 x + w_2 x^2 + … + w_M x^M
Set ϕ_0(x) = 1 (the bias term), so that
y(x, w) = w_0 + ∑ w_i ϕ_i(x) = w^T ϕ(x)
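A minimal NumPy sketch of building the polynomial design matrix (the function name and the example inputs are illustrative, not from the notes):

```python
import numpy as np

def polynomial_design_matrix(x, M):
    """Rows are [1, x, x^2, ..., x^M]; column 0 is the bias phi_0(x) = 1."""
    return np.vander(x, M + 1, increasing=True)

x = np.array([0.0, 0.5, 1.0, 1.5])
Phi = polynomial_design_matrix(x, M=3)   # shape (4, 4)
```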
What is the Gaussian basis?
ϕ_i(x) = exp(-(x - μ_i)^2 / (2s^2))
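A small sketch of the Gaussian basis in NumPy (the centres μ_i and width s here are example choices):

```python
import numpy as np

def gaussian_basis(x, mu, s):
    """phi_i(x) = exp(-(x - mu_i)^2 / (2 s^2)), one column per centre mu_i."""
    return np.exp(-(x[:, None] - mu[None, :]) ** 2 / (2 * s ** 2))

x = np.linspace(0.0, 1.0, 5)
mu = np.array([0.25, 0.5, 0.75])    # example centres
Phi = gaussian_basis(x, mu, s=0.2)  # shape (5, 3); prepend a ones column for the bias
```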
What is the sigmoid basis?
ϕ_i(x) = σ((x - μ_i) / s)
σ(a) = 1 / (1 + e^(-a))
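And a matching sketch of the sigmoid basis (again with made-up centres and scale):

```python
import numpy as np

def sigmoid_basis(x, mu, s):
    """phi_i(x) = sigma((x - mu_i) / s) with sigma(a) = 1 / (1 + e^(-a))."""
    a = (x[:, None] - mu[None, :]) / s
    return 1.0 / (1.0 + np.exp(-a))

Phi = sigmoid_basis(np.linspace(0.0, 1.0, 5), np.array([0.3, 0.7]), s=0.1)  # shape (5, 2)
```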
What is the vector equation that we would use to calculate the exact solution to a regression problem and what is required for this?
w = Φ^(-1) t
This requires the design matrix Φ to be square (and invertible), so the number of data points must equal the number of basis functions (including the bias).
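A quick sketch of the exact solution for a square Φ, fitting a quadratic through exactly three made-up points:

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0])
t = np.array([1.0, 2.0, 5.0])           # three targets for three basis functions

Phi = np.vander(x, 3, increasing=True)  # square 3x3 design matrix [1, x, x^2]
w = np.linalg.solve(Phi, t)             # w = Phi^(-1) t; here [1, 0, 1] since t = 1 + x^2
```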
What does overdetermined mean, and what impact does this have?
This is when there are more equations than variables, so no exact solution is possible. Instead we define an error function and minimise it.
What is the equation for error?
E = 0.5 ∑ (ϕ_i^T w - t_i)^2
Derive the equations for the least-squares solution
E = 0.5 ∑ (ϕ_i^T w - t_i)^2
∇E = ∑ ϕ_i (ϕ_i^T w - t_i)
This is minimised where the gradient ∇E = 0. In matrix form:
Φ^T (Φw - t) = 0
Φ^T Φ w = Φ^T t
w = (Φ^T Φ)^(-1) Φ^T t
Define the pseudo-inverse Φ† = (Φ^T Φ)^(-1) Φ^T, so that
w = Φ† t
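As a sanity check, the normal-equations result can be compared against NumPy's built-in least-squares solver on some illustrative noisy data:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 20)
t = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(x.size)  # noisy example targets

Phi = np.vander(x, 4, increasing=True)               # overdetermined: 20 rows, 4 columns
w_normal = np.linalg.inv(Phi.T @ Phi) @ Phi.T @ t    # w = (Phi^T Phi)^(-1) Phi^T t
w_lstsq, *_ = np.linalg.lstsq(Phi, t, rcond=None)    # same answer from NumPy's solver

assert np.allclose(w_normal, w_lstsq)
```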
What form do the data set and equation take, and what are the steps for the sum-of-squares solution?
Data set:
- Pairs (x, t), i.e. (x_value, target)
Equation:
- Example: y = w_0 + w_1 f(x)
Basis:
- Polynomial
- Gaussian
- Sigmoid
Step 1: Calculate Φ using the bias and basis, with one row per data point
Φ = [bias, basis]
    [ …  ,  …   ]
Step 2: Compute Φ^T Φ
Step 3: Compute (Φ^T Φ)^(-1)
Step 4: Compute the pseudo-inverse Φ† = (Φ^T Φ)^(-1) Φ^T
Step 5: Compute the result using the targets: w = Φ† t
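The five steps map onto a few lines of NumPy; the data, centres, and width in this sketch are assumptions for illustration:

```python
import numpy as np

x = np.array([0.0, 0.25, 0.5, 0.75, 1.0])
t = np.array([0.1, 0.9, 1.0, 0.2, -0.8])            # example targets

mu, s = np.array([0.2, 0.5, 0.8]), 0.2
basis = np.exp(-(x[:, None] - mu[None, :]) ** 2 / (2 * s ** 2))

Phi = np.column_stack([np.ones_like(x), basis])     # Step 1: Phi = [bias, basis]
A = Phi.T @ Phi                                     # Step 2: Phi^T Phi
A_inv = np.linalg.inv(A)                            # Step 3: (Phi^T Phi)^(-1)
Phi_pinv = A_inv @ Phi.T                            # Step 4: pseudo-inverse
w = Phi_pinv @ t                                    # Step 5: w = pseudo-inverse @ targets
```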
How can sequential learning be applied to the sum of squares solution? Why do we want to do this? What is this process called?
With many points this is a computationally expensive process, so instead we consider one point at a time and take a gradient-descent step for each. This is the least-mean-squares (LMS) algorithm.
What is the equation for the least mean squares algorithm?
w^(τ+1) = w^(τ) - η ϕ_n (ϕ_n^T w^(τ) - t_n), where τ indexes the update step
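A minimal sketch of the LMS updates, sweeping through the points one at a time (the learning rate, number of passes, and data are arbitrary choices):

```python
import numpy as np

x = np.linspace(0.0, 1.0, 50)
t = 1.0 + 2.0 * x + 0.05 * np.random.default_rng(1).standard_normal(x.size)

Phi = np.column_stack([np.ones_like(x), x])   # bias + identity basis phi_1(x) = x
w = np.zeros(2)
eta = 0.1                                     # learning rate

for _ in range(200):                          # repeated passes over the data
    for phi_n, t_n in zip(Phi, t):
        w = w - eta * phi_n * (phi_n @ w - t_n)   # one LMS step per point
```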
Why is the sum-of-squares error function ideal?
Because it is convex, so it has a single global minimum.