Lecture #1 - Mathematical Preliminaries Flashcards
What is optimisation?
Optimisation is the task of minimising or maximising some function f(x) by altering the input vector x
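In symbols (standard notation, added for reference): minimisation looks for x* = arg min_x f(x), and maximisation for x* = arg max_x f(x).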
What is gradient descent?
A first-order iterative optimisation algorithm for finding a local minimum of a differentiable function.
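A minimal sketch of the update rule in Python, assuming the gradient is already known; the quadratic f below is just an illustrative choice:

```python
import numpy as np

def gradient_descent(grad_f, x0, learning_rate=0.1, n_steps=100):
    """Repeatedly step against the gradient to approach a local minimum."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_steps):
        x = x - learning_rate * grad_f(x)   # x <- x - eta * grad f(x)
    return x

# Illustrative example: f(x) = x1**2 + 3*x2**2, gradient (2*x1, 6*x2), minimum at (0, 0).
grad_f = lambda x: np.array([2 * x[0], 6 * x[1]])
print(gradient_descent(grad_f, x0=[5.0, -3.0]))   # approaches [0, 0]
```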
Which of the two do we usually consider?
The minimum of a function
How do you get the maximum of a function f(x)?
Negate (flip) the function and find the minimum of −f(x), since max_x f(x) = −min_x (−f(x)).
What does gradient descent allow?
It allows us to find a critical point of a function.
What does it mean when gradient descent oscillates, and when does it happen?
When gradient descent oscillates, the parameter updates keep fluctuating back and forth instead of converging smoothly to the minimum of the loss function; this typically happens when the learning rate (step size) is too large.
This oscillatory behaviour can lead to slow convergence or prevent the algorithm from converging altogether.
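A tiny illustration of oscillation, using the made-up example f(x) = x², whose gradient is 2x; a large step size makes the iterate overshoot and flip sign at every step:

```python
# For f(x) = x**2 the update is x <- x - eta * 2x = (1 - 2*eta) * x.
# With eta = 0.9 the factor is -0.8, so the iterate changes sign each step and oscillates
# (here it still shrinks; eta > 1 would make it diverge outright).
x = 1.0
for step in range(5):
    x = x - 0.9 * 2 * x
    print(step, x)   # 1.0 -> -0.8 -> 0.64 -> -0.512 -> ...
```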
Name the three possible outcomes of minimisation.
- Minimum (convergence to a local or global minimum)
- No convergence
- Saddle point: 1st and 2nd derivative = 0, so gradient descent will not converge to a minimum there (example below).
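A standard illustrative saddle-point example (not from the lecture): for f(x) = x³, both f'(0) = 0 and f''(0) = 0, yet x = 0 is neither a minimum nor a maximum.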
Briefly explain the concepts of local and global minima.
- Local minimum: a point (or region) within a function where its value is lower than at all nearby points; a function can have many local minima (example below).
- Global minimum: the actual lowest value attained over the entire function.
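A concrete illustrative example (not from the lecture): f(x) = (x² − 1)² + 0.3x has two local minima, near x ≈ 1 and x ≈ −1; the one near x ≈ −1 attains the lower value, so it is the global minimum.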
What is the gradient of a function?
The gradient of a function is a vector that contains the partial derivatives of the function with respect to each of its input variables. At a specific point it points in the direction of steepest ascent (and its negative in the direction of steepest descent), and its magnitude gives the rate of change in that direction.
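A small numerical sketch, using the made-up function f(x1, x2) = x1² + x1·x2, whose gradient is (2·x1 + x2, x1), with a finite-difference sanity check:

```python
import numpy as np

# Illustrative example: f(x1, x2) = x1**2 + x1*x2, so grad f = (2*x1 + x2, x1).
f = lambda x: x[0] ** 2 + x[0] * x[1]
grad_f = lambda x: np.array([2 * x[0] + x[1], x[0]])

x = np.array([1.0, 2.0])
print(grad_f(x))                             # [4. 1.] -- direction of steepest ascent at x

# Finite-difference check of the first component (central difference).
eps = 1e-6
e1 = np.array([eps, 0.0])
print((f(x + e1) - f(x - e1)) / (2 * eps))   # ~4.0
```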
What is the directional derivative?
A directional derivative is a measure of how a function changes along a particular direction in its input space. It quantifies the rate of change of the function with respect to a specified direction vector.
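In symbols, for a unit vector u the directional derivative is D_u f(x) = ∇f(x)·u, which is largest when u points along the gradient. A tiny numerical illustration (the gradient values are made up):

```python
import numpy as np

grad = np.array([4.0, 1.0])              # an example gradient of f at some point x
u = np.array([1.0, 1.0]) / np.sqrt(2.0)  # a unit-length direction
print(grad @ u)                          # D_u f(x) = grad . u ~ 3.54
```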
What does the derivative tell us?
The derivative tells us how to change x in order to make a small improvement in y
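To first order, f(x − ε f'(x)) ≈ f(x) − ε f'(x)², so stepping a small ε > 0 against the derivative decreases f; this is the small improvement the card refers to.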
What is the difference between a Jacobian and a Hessian matrix?
The Jacobian matrix represents the first-order partial derivatives of a vector-valued function, while the Hessian matrix represents the second-order partial derivatives of a scalar-valued function.
The Jacobian provides information about the local linear behavior and sensitivity of a function, while the Hessian provides information about the local curvature and optimization properties of a function.
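A shape-focused sketch using two made-up functions: a vector-valued g(x1, x2) = (x1·x2, x1 + x2²), whose first derivatives form a Jacobian, and the scalar-valued f(x1, x2) = x1² + x1·x2, whose second derivatives form a Hessian:

```python
import numpy as np

# g(x) = (x1*x2, x1 + x2**2): vector-valued, so its first derivatives form a Jacobian.
def jacobian_g(x):
    return np.array([[x[1],     x[0]],        # d(g1)/dx1, d(g1)/dx2
                     [1.0,  2 * x[1]]])       # d(g2)/dx1, d(g2)/dx2

# f(x) = x1**2 + x1*x2: scalar-valued, so its second derivatives form a Hessian.
def hessian_f(x):
    return np.array([[2.0, 1.0],              # d2f/dx1dx1, d2f/dx1dx2
                     [1.0, 0.0]])             # d2f/dx2dx1, d2f/dx2dx2

x = np.array([1.0, 2.0])
print(jacobian_g(x))   # 2x2 matrix of first-order partials of g at x
print(hessian_f(x))    # 2x2 symmetric matrix of second-order partials of f
```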
Discuss the critical points and the Hessian.
At a point where ∇f(x) = 0, the function f(x) has:
- A local minimum if the Hessian is positive definite (all eigenvalues positive)
- A local maximum if the Hessian is negative definite (all eigenvalues negative)
- A saddle point if the Hessian is indefinite (there exist positive and negative eigenvalues); see the numerical check below.
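A minimal numerical check of this classification, using the made-up saddle example f(x, y) = x² − y², whose gradient vanishes at the origin and whose Hessian is diag(2, −2):

```python
import numpy as np

H = np.diag([2.0, -2.0])            # Hessian of f(x, y) = x**2 - y**2 (constant here)
eigvals = np.linalg.eigvalsh(H)     # eigenvalues of the symmetric Hessian

if np.all(eigvals > 0):
    print("local minimum")
elif np.all(eigvals < 0):
    print("local maximum")
elif eigvals.min() < 0 < eigvals.max():
    print("saddle point")           # printed here: eigenvalues have mixed signs
else:
    print("test inconclusive (some eigenvalues are zero)")
```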
Define Newton’s method and describe its pros and cons.
Newton’s method, also known as the Newton-Raphson method, is an iterative optimisation algorithm used to find the roots of a differentiable function or to find the minimum or maximum of a function. It is particularly useful when the function is non-linear.
Newton’s method is powerful because it can achieve quadratic convergence near the solution when the initial guess is sufficiently close to the actual root or optimum. However, it has some limitations: the method may fail to converge if the initial guess is far from the solution, or if the function behaves badly, for example having multiple roots or causing the iterates to oscillate. Additionally, Newton’s method requires computing derivatives of the function (and, in the optimisation setting, the second derivatives that form the Hessian), which can be computationally expensive for complex functions.
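A minimal root-finding sketch of the Newton-Raphson update x ← x − f(x)/f'(x), using the illustrative choice f(x) = x² − 2 (root √2); applying the same update to f' instead of f gives the optimisation variant:

```python
def newton_root(f, f_prime, x0, n_steps=10):
    """Newton-Raphson: repeatedly jump to the root of the local linearisation."""
    x = x0
    for _ in range(n_steps):
        x = x - f(x) / f_prime(x)    # assumes f'(x) != 0 along the way
    return x

# Example: root of f(x) = x**2 - 2, i.e. sqrt(2) ~ 1.41421356...
print(newton_root(lambda x: x**2 - 2, lambda x: 2 * x, x0=1.0))
```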