Lecture #1 - Mathematical Preliminaries Flashcards

1
Q

What is optimisation?

A

Optimisation is the task of minimising or maximising some function f(x) by altering the input vector x

2
Q

What is gradient descent?

A

A first-order iterative optimisation algorithm for finding a local minimum of a differentiable function.

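A minimal sketch of the update rule in Python, assuming an illustrative function f(x) = (x - 3)^2, a hand-coded derivative, and a fixed learning rate (none of these choices come from the lecture):

    # Minimise f(x) = (x - 3)^2 with plain gradient descent.
    def grad(x):
        return 2.0 * (x - 3.0)        # derivative f'(x) = 2(x - 3)

    x = 0.0                           # arbitrary starting point
    learning_rate = 0.1               # illustrative fixed step size
    for _ in range(100):
        x = x - learning_rate * grad(x)   # step against the gradient

    print(x)                          # ~3.0, the minimiser of f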
3
Q

In optimisation, what do we usually consider?

A

The minimum of a function

4
Q

How do you find the maximum of a function f(x)?

A

Negate (flip) the function and find the minimum: the maximum of f(x) is attained at the minimiser of -f(x).

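A tiny illustration of the flipping trick, using an example function chosen here only for illustration, g(x) = -(x - 2)^2:

    # To maximise g(x) = -(x - 2)^2, minimise its negation -g(x) = (x - 2)^2.
    def neg_g_grad(x):
        return 2.0 * (x - 2.0)       # derivative of -g(x)

    x = 0.0
    for _ in range(100):
        x = x - 0.1 * neg_g_grad(x)  # gradient descent on -g

    print(x)                         # ~2.0, which maximises g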
5
Q

What does gradient descent allow?

A

It allows us to find a critical point of a function.

6
Q

What does it mean when gradient descent oscillates, and when does this happen?

A

When gradient descent oscillates, it means that instead of converging smoothly to the minimum of the loss function, the parameter updates keep fluctuating back and forth. This typically happens when the step size (learning rate) is too large, so each update overshoots the minimum.

This oscillatory behaviour can lead to slow convergence or prevent the algorithm from converging altogether.

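A minimal illustration of the overshooting effect, assuming the quadratic f(x) = x^2 (an example, not from the lecture): with a large step size each update jumps past the minimum and the iterates flip sign and grow, while a small step size converges smoothly.

    # f(x) = x^2, f'(x) = 2x, so the update is x <- x - lr * 2x = (1 - 2*lr) * x.
    # For lr > 1, |1 - 2*lr| > 1 and the iterates oscillate with growing magnitude.
    def run(lr, steps=5, x=1.0):
        trace = [x]
        for _ in range(steps):
            x = x - lr * 2.0 * x
            trace.append(x)
        return trace

    print(run(lr=1.1))   # oscillates and diverges: 1, -1.2, 1.44, -1.728, ...
    print(run(lr=0.1))   # converges smoothly towards 0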
7
Q

Name the three critical points of minimisation.

A
  1. Minimum
  2. No convergence
  3. Saddle point: first and second derivative equal zero; it will not converge to a minimum.
8
Q

Briefly explain the concepts of local and global minima.

A
  1. Local minimum: a point (or region) of the function where its value is lower than at all nearby points; a function can have many local minima.
  2. Global minimum: the actual lowest value over the entire function (see the sketch below).
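
A small sketch, assuming the illustrative function f(x) = x^4 - 3x^2 + x, which has a local minimum near x ≈ 1.13 and a lower, global minimum near x ≈ -1.30: gradient descent ends up in a different minimum depending on where it starts.

    # Gradient descent on f(x) = x^4 - 3x^2 + x from two starting points.
    def grad(x):
        return 4.0 * x**3 - 6.0 * x + 1.0   # f'(x)

    def descend(x, lr=0.01, steps=1000):
        for _ in range(steps):
            x = x - lr * grad(x)
        return x

    print(descend(2.0))    # ends near the local minimum (~1.13)
    print(descend(-2.0))   # ends near the global minimum (~-1.30)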
9
Q

What is the gradient of a function?

A

The gradient of a function is a vector that contains the partial derivatives of the function with respect to each of its input variables. At a given point it points in the direction of steepest ascent (its negative points in the direction of steepest descent), and its magnitude gives the rate of change in that direction.

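A small numerical sketch, assuming the example function f(x, y) = x^2 + 3y (whose gradient is analytically (2x, 3)); central finite differences are used here only to illustrate the definition of the partial derivatives.

    # Approximate the gradient of f(x, y) = x^2 + 3y by central differences.
    def f(x, y):
        return x**2 + 3.0 * y

    def gradient(x, y, h=1e-6):
        df_dx = (f(x + h, y) - f(x - h, y)) / (2.0 * h)   # partial derivative in x
        df_dy = (f(x, y + h) - f(x, y - h)) / (2.0 * h)   # partial derivative in y
        return (df_dx, df_dy)

    print(gradient(1.0, 2.0))   # ~ (2.0, 3.0), matching (2x, 3) at (1, 2)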
10
Q

What is the directional derivative?

A

A directional derivative is a measure of how a function changes along a particular direction in its input space. It quantifies the rate of change of the function with respect to a specified direction vector.

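A short worked example, assuming the same illustrative function f(x, y) = x^2 + 3y and the standard formula D_v f = ∇f · (v / |v|) with a unit direction vector:

    import math

    # Directional derivative of f(x, y) = x^2 + 3y at (1, 2) along v = (1, 1).
    grad = (2.0 * 1.0, 3.0)            # analytic gradient (2x, 3) at (1, 2)
    v = (1.0, 1.0)
    norm = math.sqrt(v[0]**2 + v[1]**2)
    u = (v[0] / norm, v[1] / norm)     # unit vector in the chosen direction
    d_v = grad[0] * u[0] + grad[1] * u[1]
    print(d_v)                         # 5 / sqrt(2) ~ 3.54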
11
Q

What does the derivative tell us?

A

The derivative tells us how to change x in order to make a small improvement in y

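A one-step illustration, assuming f(x) = (x - 3)^2 and a small step size (both chosen here for illustration): moving x against the sign of the derivative decreases y = f(x).

    # f(x) = (x - 3)^2; the derivative f'(x) = 2(x - 3) tells us which way to move x.
    f = lambda x: (x - 3.0)**2
    x = 5.0
    f_prime = 2.0 * (x - 3.0)        # = 4.0, positive, so decrease x
    epsilon = 0.1                    # small step size
    x_new = x - epsilon * f_prime
    print(f(x), f(x_new))            # 4.0 -> 2.56: a small improvement in y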
12
Q

What is the difference between a Jacobian and a Hessian matrix?

A

The Jacobian matrix represents the first-order partial derivatives of a vector-valued function, while the Hessian matrix represents the second-order partial derivatives of a scalar-valued function.

The Jacobian provides information about the local linear behavior and sensitivity of a function, while the Hessian provides information about the local curvature and optimization properties of a function.

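A concrete sketch with hand-derived matrices, assuming the example functions below (chosen only for illustration): a vector-valued f whose Jacobian collects first partial derivatives, and a scalar-valued g whose Hessian collects second partial derivatives.

    import math

    # Vector-valued f(x, y) = (x^2 * y, 5x + sin y): Jacobian of first partials.
    def jacobian_f(x, y):
        return [[2.0 * x * y, x**2],           # d(x^2 y)/dx,      d(x^2 y)/dy
                [5.0,         math.cos(y)]]    # d(5x + sin y)/dx, d(5x + sin y)/dy

    # Scalar-valued g(x, y) = x^2 + y^3: Hessian of second partials.
    def hessian_g(x, y):
        return [[2.0, 0.0],            # d2g/dx2,  d2g/dxdy
                [0.0, 6.0 * y]]        # d2g/dydx, d2g/dy2

    print(jacobian_f(1.0, 2.0))   # [[4.0, 1.0], [5.0, cos(2)]]
    print(hessian_g(1.0, 2.0))    # [[2.0, 0.0], [0.0, 12.0]]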
13
Q

Discuss the critical points and the Hessian.

A

The function f(x), at a point where ∇f(x) = 0, has (see the sketch below):

  1. A local minimum if the Hessian is positive definite (all eigenvalues positive)
  2. A local maximum if the Hessian is negative definite (all eigenvalues negative)
  3. A saddle point if the Hessian is indefinite (there exist both positive and negative eigenvalues)
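
A small sketch of the eigenvalue test, assuming NumPy and the illustrative function f(x, y) = x^2 - y^2, whose gradient (2x, -2y) vanishes at the origin:

    import numpy as np

    # The Hessian of f(x, y) = x^2 - y^2 is constant.
    hessian = np.array([[2.0, 0.0],
                        [0.0, -2.0]])
    eigenvalues = np.linalg.eigvalsh(hessian)
    print(eigenvalues)                 # [-2.  2.]

    if np.all(eigenvalues > 0):
        print("local minimum")
    elif np.all(eigenvalues < 0):
        print("local maximum")
    else:
        print("saddle point")          # mixed signs, so this branch is printed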
14
Q

Define Newton’s method and describe its pros and cons.

A

Newton’s method, also known as the Newton-Raphson method, is an iterative optimisation algorithm used to find the roots of a differentiable function or to find the minimum or maximum of a function. It is particularly useful when the function is non-linear and has complex behaviour.

Newton’s method is powerful because it can converge quickly, achieving quadratic convergence near the solution when the initial guess is sufficiently close to the actual root or optimum. However, it has some limitations. The method may fail to converge if the initial guess is far from the solution or if the function behaves badly, for example with multiple roots or oscillations. Additionally, Newton’s method requires computing derivatives of the function (and, for optimisation, the Hessian and its inverse), which can be computationally expensive for complex or high-dimensional functions.

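A minimal sketch of Newton’s method for one-dimensional minimisation, assuming hand-coded first and second derivatives of the illustrative function f(x) = x^4 - 3x^2 + x; the 1-D update is x <- x - f'(x) / f''(x), and in higher dimensions the inverse Hessian takes the place of 1 / f''(x).

    # Newton's method: repeatedly jump to the stationary point of the local
    # quadratic approximation of f(x) = x^4 - 3x^2 + x.
    def f_prime(x):
        return 4.0 * x**3 - 6.0 * x + 1.0

    def f_double_prime(x):
        return 12.0 * x**2 - 6.0

    x = 2.0                                      # initial guess (illustrative)
    for _ in range(10):
        x = x - f_prime(x) / f_double_prime(x)   # Newton update

    print(x)   # ~1.13, the critical point nearest the starting guess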