Basics Of Optimization Flashcards

1
Q

Loss function (Objective function)

A

An optimization problem has an objective function that is defined in terms of a set of variables, referred to as optimization variables. The goal of the optimization problem is to compute the values of the variables at which the objective function is either maximized or minimized.
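As a minimal illustration (not part of the original card), a tiny gradient-descent loop can minimize a simple objective; the objective, starting point, and learning rate below are hypothetical choices.

```python
# Minimal sketch (hypothetical objective): minimize J(w1, w2) = (w1 - 3)^2 + (w2 + 1)^2.
# The optimization variables are w1 and w2.

def gradient(w1, w2):
    # Partial derivatives of J with respect to the optimization variables.
    return 2.0 * (w1 - 3.0), 2.0 * (w2 + 1.0)

w1, w2 = 0.0, 0.0   # initial guess
alpha = 0.1         # learning rate (hypothetical choice)
for _ in range(200):
    g1, g2 = gradient(w1, w2)
    w1, w2 = w1 - alpha * g1, w2 - alpha * g2

print(w1, w2)  # approaches the minimizer (3, -1)
```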

2
Q

Optimality Conditions in Unconstrained Optimization

A

A univariate function f(x) attains a minimum value at x = x0 with respect to its immediate locality if it satisfies both f'(x0) = 0 and f''(x0) > 0.
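A small worked example, added here for illustration: for f(x) = (x − 3)² + 1, both conditions can be checked directly.

```latex
f(x) = (x - 3)^2 + 1, \qquad f'(x) = 2(x - 3), \qquad f''(x) = 2.
% f'(3) = 0 and f''(3) = 2 > 0, so x_0 = 3 is a local (here also global) minimum.
```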

3
Q

Taylor expansion for evaluating minima

A

In the immediate locality of x0, f(x0 + Δ) ≈ f(x0) + Δ·f'(x0) + (Δ²/2)·f''(x0). At a critical point f'(x0) = 0, so f(x0 + Δ) − f(x0) ≈ (Δ²/2)·f''(x0), and the sign of f''(x0) determines whether x0 is a local minimum (positive) or a local maximum (negative).
4
Q

Jacobian

A

For a vector-to-vector function h(w) ∈ R^m of a d-dimensional vector w, the Jacobian is the m × d matrix whose (i, j)th entry is ∂hi(w)/∂wj; its ith row is the (transposed) gradient of the ith output component hi(w).
5
Q

Hessian matrix

A

The Hessian of a twice-differentiable multivariate function J(w) of a d-dimensional vector w is the d × d symmetric matrix H whose (i, j)th entry is the second-order partial derivative ∂²J(w)/∂wi∂wj.
6
Q

Multivariate Taylor expansion for evaluating minima

A

In the immediate locality of w0, J(w) ≈ J(w0) + (w − w0)^T ∇J(w0) + (1/2)(w − w0)^T H(w0)(w − w0), where H(w0) is the Hessian of J at w0. At a critical point the gradient term vanishes, so the definiteness of the Hessian determines whether w0 is a minimum.
7
Q

Second-order optimality conditions

A

At a critical point w0 where the gradient ∇J(w0) = 0, the point is a local minimum if the Hessian H(w0) is positive definite, a local maximum if H(w0) is negative definite, and a saddle point if H(w0) is indefinite (has both positive and negative eigenvalues).
8
Q

Convex set

A

A set S is convex if, for every pair of points w1, w2 ∈ S, the point λw1 + [1 − λ]w2 is also in S for all λ ∈ (0, 1).
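As an illustration (not part of the original card), the closed unit ball is convex: for w1, w2 in the ball and λ ∈ (0, 1),

```latex
\|\lambda w_1 + (1 - \lambda) w_2\| \;\le\; \lambda \|w_1\| + (1 - \lambda)\|w_2\| \;\le\; \lambda + (1 - \lambda) = 1,
```

so the convex combination also lies in the ball.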

9
Q

Convex set - Visual representation

A
10
Q

Convex function

A

A function F(·) defined on a convex domain is convex if, for every pair of points w1, w2 in the domain and all λ ∈ (0, 1): F(λw1 + [1 − λ]w2) ≤ λF(w1) + [1 − λ]F(w2). If the inequality is strict whenever w1 ≠ w2, the function is strictly convex.
11
Q

Linear functions and convexity

A

A linear function of the vector w is always convex
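A one-line check, added for illustration: for F(w) = a·w + b the convexity inequality holds with equality, so linear functions are convex (though not strictly convex).

```latex
F(\lambda w_1 + (1-\lambda) w_2)
  = a \cdot (\lambda w_1 + (1-\lambda) w_2) + b
  = \lambda\,(a \cdot w_1 + b) + (1-\lambda)\,(a \cdot w_2 + b)
  = \lambda F(w_1) + (1-\lambda) F(w_2).
```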

12
Q

Second-Derivative Characterization of Convexity

A

A twice-differentiable function F(w) is convex if and only if it has a positive semidefinite Hessian at every value of the parameter w in the domain of F(·)
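A small numerical check, added for illustration (the matrix below is a hypothetical example): for the quadratic F(w) = (1/2) w^T A w with symmetric A, the Hessian is A everywhere, so convexity can be read off the eigenvalues of A.

```python
import numpy as np

# Hypothetical example: F(w) = 0.5 * w^T A w has Hessian A at every w.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

eigenvalues = np.linalg.eigvalsh(A)   # eigenvalues of a symmetric matrix
print(eigenvalues)                    # [1. 3.] -> all >= 0, so A is positive semidefinite
print(np.all(eigenvalues >= 0))       # True -> F is convex by the second-derivative test
```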

13
Q

The following convexity definitions are equivalent for twice differentiable functions defined over R^d

A

(i) Direct definition: F(λw1 + [1 − λ]w2) ≤ λF(w1) + [1 − λ]F(w2) for all w1, w2 and λ ∈ (0, 1); (ii) first-order condition: F(w) ≥ F(w0) + [∇F(w0)]^T (w − w0) for all w and w0; (iii) second-order condition: the Hessian of F(w) is positive semidefinite for every w.
14
Q

The sum of a convex function and a strictly convex function

A

Is strictly convex
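A short justification, added for illustration, with F convex and G strictly convex (for example F(w) = |w| and G(w) = w²):

```latex
% For w_1 \ne w_2 and \lambda \in (0, 1):
F(\lambda w_1 + (1-\lambda) w_2) \le \lambda F(w_1) + (1-\lambda) F(w_2), \qquad
G(\lambda w_1 + (1-\lambda) w_2) <   \lambda G(w_1) + (1-\lambda) G(w_2).
% Adding the two inequalities (one of them strict) gives
(F + G)(\lambda w_1 + (1-\lambda) w_2) < \lambda (F+G)(w_1) + (1-\lambda)(F+G)(w_2),
% so F + G is strictly convex.
```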

15
Q

Finite-difference approximation

A

A finite-difference approximation of the gradient (perturbing each optimization variable slightly and measuring the change in the objective) is used to verify that an analytically computed gradient is correct.
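A minimal gradient check in this spirit (the objective and test point below are hypothetical): compare an analytic gradient against central finite differences.

```python
import numpy as np

def J(w):
    # Hypothetical objective: J(w) = 0.5 * ||w||^2 + sum(sin(w))
    return 0.5 * np.dot(w, w) + np.sum(np.sin(w))

def grad_J(w):
    # Analytic gradient to be verified: w + cos(w)
    return w + np.cos(w)

def finite_difference_grad(f, w, eps=1e-6):
    # Central-difference approximation, one coordinate at a time.
    g = np.zeros_like(w)
    for i in range(len(w)):
        e = np.zeros_like(w)
        e[i] = eps
        g[i] = (f(w + e) - f(w - e)) / (2.0 * eps)
    return g

w = np.array([0.5, -1.2, 2.0])
print(np.max(np.abs(grad_J(w) - finite_difference_grad(J, w))))  # tiny value -> gradient is correct
```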

16
Q

Decaying learning rate

A

The learning rate αt is reduced as the number of gradient-descent iterations t increases, for example with exponential decay αt = α0·exp(−k·t) or inverse decay αt = α0/(1 + k·t), where k controls the rate of decay. Large early steps make rapid initial progress, while smaller later steps avoid overshooting and oscillating around the optimum.
17
Q

The gradient at the optimal point of a line search

A

Is always orthogonal to the current search direction
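A short derivation, added for illustration: a line search from the point w along the direction v minimizes φ(α) = J(w + αv) over the step size α.

```latex
\varphi(\alpha) = J(w + \alpha v), \qquad
\varphi'(\alpha) = v^{\top} \nabla J(w + \alpha v).
% At the optimal step size \alpha^{*}, \varphi'(\alpha^{*}) = 0, i.e.,
v^{\top} \nabla J(w + \alpha^{*} v) = 0,
% so the gradient at the line-search optimum is orthogonal to the search direction v.
```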

18
Q

Second-order multivariate Taylor expansion of J(w) in the immediate locality of w0 along the direction v and small radius ε > 0

A

J(w0 + εv) ≈ J(w0) + ε v^T ∇J(w0) + (ε²/2) v^T H(w0) v, where H(w0) is the Hessian of J(w) evaluated at w = w0.
19
Q

Mini-batch stochastic gradient descent

A

Rather than computing the gradient over the full training set (gradient descent) or a single training point (stochastic gradient descent), the gradient is computed over a small random subset (mini-batch) of training points at each update. This gives a noisy but computationally cheap estimate of the full gradient and usually provides a good trade-off between the stability of gradient descent and the speed of stochastic gradient descent.
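A minimal sketch (hypothetical data and hyperparameters) of mini-batch stochastic gradient descent for a squared-loss linear model:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))              # hypothetical data matrix (n x d)
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 0.1 * rng.normal(size=1000)

w = np.zeros(5)
alpha, batch_size = 0.01, 32                # hypothetical learning rate and batch size
for epoch in range(20):
    order = rng.permutation(len(y))         # reshuffle the training points each epoch
    for start in range(0, len(y), batch_size):
        idx = order[start:start + batch_size]
        Xb, yb = X[idx], y[idx]
        grad = Xb.T @ (Xb @ w - yb) / len(idx)   # gradient of the mean squared loss on the mini-batch
        w -= alpha * grad

print(w)  # close to the true coefficients [1, -2, 0.5, 0, 3]
```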

20
Q

Min-max normalization

A

Min-max normalization is useful when the data needs to be scaled in the range (0, 1)
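A minimal numpy sketch (hypothetical data); each feature (column) is rescaled independently:

```python
import numpy as np

X = np.array([[1.0, 200.0],
              [2.0, 400.0],
              [4.0, 300.0]])   # hypothetical data: rows = points, columns = features

X_scaled = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
print(X_scaled)                # every column now lies between 0 and 1
```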

21
Q

Feature normalization

A

A common type of normalization is to divide each feature value by its standard deviation. When this type of feature scaling is combined with mean-centering, the data is said to have been standardized. The basic idea is that each feature is presumed to have been drawn from a standard normal distribution with zero mean and unit variance
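A short numpy sketch of standardization (the data matrix is hypothetical): each feature is mean-centered and divided by its standard deviation.

```python
import numpy as np

X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 60.0]])                    # hypothetical data matrix

X_std = (X - X.mean(axis=0)) / X.std(axis=0)   # mean-centering combined with scaling by the std

print(X_std.mean(axis=0))                      # ~[0, 0]: zero mean per feature
print(X_std.std(axis=0))                       # [1, 1]: unit variance per feature
```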

22
Q

Mean-centering

A

In many models, it can be useful to mean-center the data in order to remove certain types of bias effects. Many algorithms in traditional machine learning (such as principal component analysis) also work with the assumption of mean-centered data. In such cases, a vector of column-wise means is subtracted from each data point
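A minimal numpy sketch (hypothetical data) of mean-centering: the vector of column-wise means is subtracted from each data point.

```python
import numpy as np

D = np.array([[1.0, 4.0],
              [3.0, 8.0],
              [5.0, 6.0]])        # hypothetical data matrix

column_means = D.mean(axis=0)     # vector of column-wise means
D_centered = D - column_means     # subtract the mean vector from each data point (row)

print(column_means)               # [3. 6.]
print(D_centered.mean(axis=0))    # ~[0. 0.]: each column now has zero mean
```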

23
Q

Common quadratic loss function and its gradient

24
Q

Common linear loss function and its gradient

25
Q

Commonly used matrix calculus identities - numerator layout

26
Q

Commonly used matrix calculus identities - denominator layout

27
Q

Commonly used matrix calculus identities - denominator layout (objective functions and vector-to-vector)

28
Q

Unconstrained quadratic program

29
Q

Optimality condition and solution to the quadratic program

30
Q

1-dimensional and multidimensional quadratic functions minimum

A
An unconstrained quadratic program is a direct generalization of 1-dimensional quadratic functions like (1/2)ax² + bx + c. Note that a minimum exists at x = −b/a for 1-dimensional quadratic functions when a > 0, and a minimum exists for multidimensional quadratic functions when A is positive definite
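For reference, the corresponding multidimensional statement (a standard result, assuming a symmetric matrix A):

```latex
J(w) = \tfrac{1}{2}\, w^{\top} A w + b^{\top} w + c, \qquad
\nabla J(w) = A w + b.
% Setting the gradient to zero gives A w^{*} = -b, so w^{*} = -A^{-1} b when A is
% positive definite; this is the multidimensional analogue of x^{*} = -b/a when a > 0.
```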

31
Q

Squared norm objective function and gradient

32
Q

Univariate chain rule

33
Q

Chain rule where one of the functions is a vector-to-scalar function

34
Q

Least-square regression objective function

35
Q

Tikhonov Regularization

36
Q

Normal equation with Tikhonov regularization

37
Q

Jacobian, Gram matrix and Normal Equation relation

38
Q

Covariance Matrix of Mean-Centered Data

A
The unscaled version of the matrix, in which the factor of n is not used in the denominator, is referred to as the scatter matrix. In other words, the scatter matrix is simply D^T D. The scatter matrix is the Gram matrix of the column space of D, whereas the similarity matrix is the Gram matrix of the row space of D
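A small numpy check (hypothetical data): after mean-centering D, the scatter matrix is D^T D and the covariance matrix is the same quantity divided by n.

```python
import numpy as np

rng = np.random.default_rng(0)
D = rng.normal(size=(100, 3))              # hypothetical data matrix (n x d)

Dc = D - D.mean(axis=0)                    # mean-centered data
scatter = Dc.T @ Dc                        # scatter matrix: Gram matrix of the columns
covariance = scatter / len(Dc)             # covariance matrix (factor of n in the denominator)

# Agrees with numpy's covariance (bias=True uses n rather than n - 1 in the denominator).
print(np.allclose(covariance, np.cov(D, rowvar=False, bias=True)))   # True
```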

39
Q

Regularized weight vector

40
Q

Stochastic and mini-batch gradient descent with regularization

41
Q

L2-loss SVM

42
Q

Hinge-loss SVM (L1-loss)

43
Q

Point-wise loss derivatives for the L1 (hinge) and L2 loss

44
Q

Logistic regression loss

45
Q

Mini-batch stochastic gradient-descent for the logistic function

46
Q

Multi-class SVM loss function with regularization

47
Q

Loss function and stochastic gradient descent for the multi-class SVM