Week 3 - Supervised learning: regression Flashcards
Regression
learning a mapping from inputs to outputs. Given a new input, predict the output.
Regression deals with continuous (real-valued) outputs.
Linear regression
- standard formula
y = mx + b
* We need a cost function! (e.g. SSE)
But how do we actually find the parameters m and b in the function y=mx+b?
You want to find the right parameters for the function f(x) = c0 + c1x + c2x^2 + c3x^3 + … + ckx^k, so that the error between f(x) and the data (e.g. the SSE) is as small as possible
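As a minimal sketch of the SSE cost function for a line y = m*x + b (the data points below are made-up illustrations, not from the course):

```python
# Sum of squared errors (SSE) for a candidate line y = m*x + b.
# The data points are hypothetical, chosen to lie exactly on y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

def sse(m, b):
    """Return the sum of squared errors of the line against the data."""
    return sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))

print(sse(2.0, 1.0))  # perfect fit -> 0.0
print(sse(1.0, 0.0))  # worse fit -> 30.0
```

The parameters (m, b) that minimize this function are the ones we are after.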
Overparameterizing a model leads
to …
Overparameterizing a model leads
to overfitting
optimization (or minimization) problem
Let f(p1, p2, …, pk) be a function that maps our parameter values to the sum of
squared errors.
Find the values p1, p2, …, pk that minimize f(p1, p2, …, pk).
Function f returns one value (sum of squared errors).
In other words, we need to minimize the difference between human data and the
data generated by our computational model
How would you go about finding p1, p2, …, pk?
Grid search (aka parameter sweep)
Grid search aka parameter sweep
Why is this an exhaustive search?
Grid Search is an exhaustive search, meaning it looks at all possible combinations of parameter values within a specified range.
Grid search aka parameter sweep
With precision j, we evaluate the function at j evenly spaced points across the parameter's range (i.e. in increments of 1/(j − 1) of the range).
Example:
Let’s say you have a parameter ‘p’ with a range [200, 400].
Choose precision ‘j’ as 101.
Evaluate the objective function at 101 points within the range, with ‘p’ values like [200, 202, …, 400].
For each combination, you calculate how well your model performs.
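A minimal grid-search sketch in plain Python (the data, parameter ranges, and precision below are illustrative assumptions):

```python
# Exhaustive grid search (parameter sweep) over m and b.
# Data and parameter ranges are hypothetical.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]  # generated from y = 2x + 1

def sse(m, b):
    """Sum of squared errors of the line y = m*x + b against the data."""
    return sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))

j = 11  # precision: 11 grid points per parameter -> 11 * 11 = 121 evaluations
ms = [0.0 + i * (4.0 - 0.0) / (j - 1) for i in range(j)]      # m in [0, 4]
bs = [-2.0 + i * (2.0 - (-2.0)) / (j - 1) for i in range(j)]  # b in [-2, 2]

# Evaluate every (m, b) combination and keep the one with the lowest SSE.
best = min((sse(m, b), m, b) for m in ms for b in bs)
print(best)  # (lowest SSE found, best m, best b)
```

With this grid the best m found is 2.0; the true b = 1 falls between grid points, so the answer is only as precise as the grid allows.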
Grid search aka parameter sweep
- Con (1)
Con of grid search:
Curse of dimensionality*
To do an exhaustive search over k parameters with precision j, we need to evaluate
the function … times.
To do an exhaustive search over k parameters with precision j, we need to evaluate
the function j^k times.
- This is considered computationally inefficient.
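A quick back-of-the-envelope illustration of why j^k evaluations become intractable:

```python
# Number of function evaluations for an exhaustive grid search:
# j points per parameter, k parameters -> j ** k evaluations.
j = 101  # precision from the example above
for k in (1, 2, 3, 5):
    print(k, j ** k)
# k = 2 -> 10,201 evaluations; k = 5 -> over 10 billion.
```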
simplex
A simplex is a shape defined by connecting
k + 1 vertices in k-dimensional space.
The Nelder–Mead (aka simplex) algorithm uses … (4 steps) to find a
minimum.
It uses reflection, expansion, contraction, and shrinking to find a minimum.
Nelder-Mead algorithm:
Simplex
What kind of simplex?
- see slide
In the context of Nelder-Mead, we’re dealing with a 2D simplex, which means a triangle with three vertices.
Example: In 2D space with parameters m and b, our simplex is a triangle with vertices B, G, and W.
B = Best
G = Good
W = Worst
Nelder-Mead algorithm:
Reflection
- see slide
Reflect the worst vertex (W) through the midpoint of the line formed by the best and good vertices (B and G) to get a new vertex (R).
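The reflection step can be written as R = M + (M − W), where M is the midpoint of the B–G edge. A minimal 2D sketch (the vertex coordinates are made up for illustration):

```python
# Reflect the worst vertex W through the midpoint M of B and G.
B = (1.0, 2.0)  # best vertex (hypothetical coordinates)
G = (3.0, 2.0)  # good vertex
W = (2.0, 4.0)  # worst vertex

M = ((B[0] + G[0]) / 2, (B[1] + G[1]) / 2)  # midpoint of the B-G edge
R = (2 * M[0] - W[0], 2 * M[1] - W[1])      # R = M + (M - W) = 2M - W
print(M, R)  # (2.0, 2.0) (2.0, 0.0)
```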
Nelder-Mead algorithm:
Expansion
(If f(B) < f(R) < f(G))
- see slide
Greater than = better!
Replace vertex W with vertex R
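Putting the four steps together, here is a simplified pure-Python sketch of the Nelder–Mead loop applied to fitting y = m*x + b by minimizing SSE. The data, starting simplex, and iteration count are illustrative assumptions; real implementations (e.g. scipy.optimize.minimize with method='Nelder-Mead') add more bookkeeping and convergence checks.

```python
# Simplified Nelder-Mead for 2 parameters p = (m, b), minimizing SSE.
# Textbook sketch under stated assumptions, not production code.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]  # generated from y = 2x + 1

def f(p):
    """Objective: sum of squared errors for parameters p = (m, b)."""
    m, b = p
    return sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))

def lerp(a, b, t):
    """Point a + t * (b - a), componentwise."""
    return tuple(ai + t * (bi - ai) for ai, bi in zip(a, b))

simplex = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]  # arbitrary starting triangle
for _ in range(200):
    simplex.sort(key=f)          # simplex[0] = B, [1] = G, [2] = W
    B, G, W = simplex
    M = lerp(B, G, 0.5)          # midpoint of the B-G edge
    R = lerp(W, M, 2.0)          # reflection: R = M + (M - W)
    if f(R) < f(B):
        E = lerp(W, M, 3.0)      # expansion: E = M + 2 * (M - W)
        simplex[2] = E if f(E) < f(R) else R
    elif f(R) < f(G):
        simplex[2] = R           # accept the reflected point
    else:
        C = lerp(W, M, 0.5)      # contraction: halfway from W to M
        if f(C) < f(W):
            simplex[2] = C
        else:                    # shrink the whole simplex toward B
            simplex[1] = lerp(B, G, 0.5)
            simplex[2] = lerp(B, W, 0.5)

best = min(simplex, key=f)
print(best, f(best))  # should approach (m, b) = (2, 1) with SSE near 0
```

Each iteration either replaces the worst vertex with a better one or shrinks the triangle, so the simplex crawls downhill toward the minimum.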