Week 3 - Supervised learning: regression Flashcards

1
Q

Regression

A

Learning a mapping from inputs to outputs so that, given a new input, we can predict its output.

Regression deals with continuous (real-valued) outputs.

2
Q

Linear regression
- standard formula

A

y = mx + b
• To fit the line, we need a cost function! (e.g. SSE, the sum of squared errors)
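
A minimal sketch of such a cost function in Python (the data arrays are made-up example values, not course data):

```python
import numpy as np

def sse(m, b, x, y):
    """Sum of squared errors of the line y = m*x + b on data (x, y)."""
    predictions = m * x + b
    return np.sum((y - predictions) ** 2)

# Made-up example data
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 8.1])

print(sse(2.0, 0.0, x, y))  # SSE for the line y = 2x
```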

3
Q

But how do we actually find the parameters m and b in the function y=mx+b?

A

You treat this as an optimization problem: for a general polynomial model f(x) = c₀ + c₁x + c₂x² + c₃x³ + … + cₖxᵏ, you want to find the parameter values c₀, …, cₖ so that the error between f(x) and the data (e.g. the sum of squared errors) is as small as possible.

4
Q

Overparameterizing a model leads
to …

A

Overparameterizing a model leads to overfitting: a model with too many free parameters fits the noise in the data rather than the underlying pattern.

5
Q

optimization (or minimization) problem

A

Let f(p1, p2, …, pk) be a function that maps our parameter values to the sum of squared errors.

Find the values p1, p2, …, pk that minimize f(p1, p2, …, pk).

Note that f returns a single value (the sum of squared errors).

In other words, we need to minimize the difference between the human data and the data generated by our computational model.
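
A sketch of what such a function f could look like in Python, using the line model y = mx + b as the computational model (the human_data values are made up for illustration):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
human_data = np.array([2.1, 3.9, 6.2, 8.1])  # made-up example observations

def f(params):
    """Map the parameter values (here p1 = m, p2 = b) to one number: the SSE."""
    m, b = params
    model_data = m * x + b
    return np.sum((human_data - model_data) ** 2)

print(f([2.0, 0.0]))  # a single scalar value to minimize
```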

6
Q

How would you go about finding p1, p2, …, pk?

A

Grid search (aka parameter sweep)

7
Q

Grid search aka parameter sweep
Why is this an exhaustive search?

A

Grid Search is an exhaustive search, meaning it looks at all possible combinations of parameter values within a specified range.

8
Q

Grid search aka parameter sweep

A

With precision j, we evaluate the function at j equally spaced points across the parameter range.

Example:

Let’s say you have a parameter ‘p’ with range [200, 400].

Choose precision ‘j’ as 101.

Evaluate the objective function at 101 equally spaced points within the range, i.e. ‘p’ values [200, 202, …, 400].

For each candidate value (or combination of values, when there are several parameters), you calculate how well your model performs.
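
A minimal sketch of this sweep in Python; the objective function here is a stand-in, not the course's actual model:

```python
import numpy as np

def objective(p):
    """Stand-in objective; in practice this would be the model's SSE."""
    return (p - 300.0) ** 2

# 101 equally spaced points over [200, 400]: 200, 202, ..., 400
grid = np.linspace(200, 400, 101)
scores = [objective(p) for p in grid]

best_p = grid[int(np.argmin(scores))]
print(best_p)  # the grid point with the lowest objective value -> 300.0
```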

9
Q

Grid search aka parameter sweep
- Con (1)

A

Con of grid search:

Curse of dimensionality: the number of function evaluations grows exponentially with the number of parameters.

10
Q

To do an exhaustive search over k parameters with precision j, we need to evaluate
the function … times.

A

To do an exhaustive search over k parameters with precision j, we need to evaluate
the function jᵏ times.

  • This is considered computationally inefficient. For example, k = 3 parameters at j = 100 points each already requires 100³ = 1,000,000 evaluations.
11
Q

simplex

A

A simplex is a shape defined by connecting k + 1 vertices in k-dimensional space (e.g. a triangle in 2D, a tetrahedron in 3D).

12
Q

Nelder–Mead (aka simplex) algorithm uses (4 steps) to find a
minimum.

A

uses reflection, expansion, contraction, and shrinking to find a minimum.

13
Q

Nelder-Mead algorithm:

Simplex
What kind of simplex?
- see slide

A

In the context of Nelder-Mead, we’re dealing with a 2D simplex, which means a triangle with three vertices.

Example: In 2D space with parameters m and b, our simplex is a triangle with vertices B, G, and W.

B = Best

G = Good

W = Worst

14
Q

Nelder-Mead algorithm:

Reflection

  • see slide
A

Reflect the worst vertex (W) through the midpoint of the line formed by the best and good vertices (B and G) to get a new vertex (R).
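
A sketch of one reflection step in Python (the vertex coordinates in (m, b) space are made-up example values):

```python
import numpy as np

# Made-up example vertices in (m, b) parameter space
B = np.array([2.0, 1.0])   # best
G = np.array([1.5, 0.5])   # good
W = np.array([0.5, 2.0])   # worst

M = (B + G) / 2            # midpoint of the B-G edge
R = M + (M - W)            # reflect W through M, i.e. R = 2*M - W
print(R)
```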

15
Q

Nelder-Mead algorithm:

Expansion

(If f(B) < f(R) < f(G))

  • see slide
A

(Remember: a lower f value = better, since we are minimizing.)

Replace vertex W with vertex R.

(Strictly, this case simply accepts the reflected point; expansion to a point E beyond R only happens when f(R) < f(B), see the rule cards below.)

16
Q

Nelder-Mead algorithm:

Contraction

  • see slide
A

If R is worse than the good point (G), i.e. f(G) < f(R), contract the worst point (W) towards R: take the point ¼ of the way along the line from W to R, and call it C1.

If the contracted point (C1) is better than the current worst, replace W with C1; otherwise, shrink the simplex toward B.

17
Q

Nelder-Mead algorithm:

Shrinking

  • see slide
A

If the contraction doesn’t improve the simplex, shrink it towards the best point (B): keep B and move the other vertices halfway towards it, as sketched below.
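
A sketch of the contraction and shrink steps in Python, continuing the made-up (m, b) example from the reflection sketch:

```python
import numpy as np

B = np.array([2.0, 1.0])   # best
G = np.array([1.5, 0.5])   # good
W = np.array([0.5, 2.0])   # worst

M = (B + G) / 2            # midpoint of B and G
R = M + (M - W)            # reflected point

C1 = W + (R - W) / 4       # contraction: 1/4 of the way from W towards R

# Shrink: keep B, move the other vertices halfway towards it
G_new = (G + B) / 2
W_new = (W + B) / 2
print(C1, G_new, W_new)
```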

18
Q

Nelder-Mead algorithm:

Rules in comparison of function values

If f(B) < f(R) < f(G)

A

replace W with R.

19
Q

Nelder-Mead algorithm:

Rules in comparison of function values

If f(R) < f(B)

A

expand further to a point E, beyond R along the same line.

If f(E) < f(R), replace W with E; otherwise, replace W with R.

20
Q

Nelder-Mead algorithm:

Rules in comparison of function values

If f(B) < f(G) < f(R)

A

contract W towards R. If the contracted point is better than W, replace W with it; otherwise, shrink the simplex toward B.

21
Q

Nelder-Mead algorithm:

Convergence

A

Continue iterating through these steps until the algorithm converges. Convergence is achieved when the shortest edge of the simplex falls below a certain size.
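
In practice you rarely code these steps by hand: SciPy ships a Nelder–Mead implementation. A minimal sketch, reusing the SSE objective idea from the earlier cards (data values are made up):

```python
import numpy as np
from scipy.optimize import minimize

x = np.array([1.0, 2.0, 3.0, 4.0])
human_data = np.array([2.1, 3.9, 6.2, 8.1])  # made-up example observations

def f(params):
    """Sum of squared errors of the line y = m*x + b."""
    m, b = params
    return np.sum((human_data - (m * x + b)) ** 2)

# Nelder-Mead needs only function values, no gradients
result = minimize(f, x0=[1.0, 0.0], method="Nelder-Mead")
print(result.x)  # fitted (m, b)
```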
