Basics Of Optimization Flashcards

1
Q

Loss function (Objective function)

A

An optimization problem has an objective function that is defined in terms of a set of variables, referred to as optimization variables. The goal of the optimization problem is to compute the values of the variables at which the objective function is either maximized or minimized.
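As a minimal illustration (not part of the original card), a tiny gradient-descent loop can minimize a simple objective; the objective, starting point, and learning rate below are hypothetical choices.

```python
# Minimal sketch (hypothetical objective): minimize J(w1, w2) = (w1 - 3)^2 + (w2 + 1)^2.
# The optimization variables are w1 and w2.

def gradient(w1, w2):
    # Partial derivatives of J with respect to the optimization variables.
    return 2.0 * (w1 - 3.0), 2.0 * (w2 + 1.0)

w1, w2 = 0.0, 0.0   # initial guess
alpha = 0.1         # learning rate (hypothetical choice)
for _ in range(200):
    g1, g2 = gradient(w1, w2)
    w1, w2 = w1 - alpha * g1, w2 - alpha * g2

print(w1, w2)  # approaches the minimizer (3, -1)
```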

2
Q

Optimality Conditions in Unconstrained Optimization

A

A univariate function f(x) attains a minimum value at x = x0 with respect to its immediate locality if it satisfies both f'(x0) = 0 and f''(x0) > 0.
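A small worked example, added here for illustration: for f(x) = (x − 3)² + 1, both conditions can be checked directly.

```latex
f(x) = (x - 3)^2 + 1, \qquad f'(x) = 2(x - 3), \qquad f''(x) = 2.
% f'(3) = 0 and f''(3) = 2 > 0, so x_0 = 3 is a local (here also global) minimum.
```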

3
Q

Taylor expansion for evaluating minima

A

In the immediate locality of x0, f(x0 + Δ) ≈ f(x0) + Δ·f'(x0) + (Δ²/2)·f''(x0). At a critical point f'(x0) = 0, so f(x0 + Δ) − f(x0) ≈ (Δ²/2)·f''(x0), and the sign of f''(x0) determines whether x0 is a local minimum (positive) or a local maximum (negative).
4
Q

Jacobian

A

For a vector-to-vector function h(w) ∈ R^m of a d-dimensional vector w, the Jacobian is the m × d matrix whose (i, j)th entry is ∂hi(w)/∂wj; its ith row is the (transposed) gradient of the ith output component hi(w).
5
Q

Hessian matrix

A

The Hessian of a twice-differentiable multivariate function J(w) of a d-dimensional vector w is the d × d symmetric matrix H whose (i, j)th entry is the second-order partial derivative ∂²J(w)/∂wi∂wj.
6
Q

Multivariate Taylor expansion for evaluating minima

A

In the immediate locality of w0, J(w) ≈ J(w0) + (w − w0)^T ∇J(w0) + (1/2)(w − w0)^T H(w0)(w − w0), where H(w0) is the Hessian of J at w0. At a critical point the gradient term vanishes, so the definiteness of the Hessian determines whether w0 is a minimum.
7
Q

Second-order optimality conditions

A

At a critical point w0 where the gradient ∇J(w0) = 0, the point is a local minimum if the Hessian H(w0) is positive definite, a local maximum if H(w0) is negative definite, and a saddle point if H(w0) is indefinite (has both positive and negative eigenvalues).
8
Q

Convex set

A

A set S is convex if, for every pair of points w1, w2 ∈ S, the point λw1 + [1 − λ]w2 is also in S for all λ ∈ (0, 1).
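As an illustration (not part of the original card), the closed unit ball is convex: for w1, w2 in the ball and λ ∈ (0, 1),

```latex
\|\lambda w_1 + (1 - \lambda) w_2\| \;\le\; \lambda \|w_1\| + (1 - \lambda)\|w_2\| \;\le\; \lambda + (1 - \lambda) = 1,
```

so the convex combination also lies in the ball.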

9
Q

Convex set - Visual representation

A
10
Q

Convex function

A

A function F(·) defined on a convex domain is convex if, for every pair of points w1, w2 in the domain and all λ ∈ (0, 1): F(λw1 + [1 − λ]w2) ≤ λF(w1) + [1 − λ]F(w2). If the inequality is strict whenever w1 ≠ w2, the function is strictly convex.
11
Q

Linear functions and convexity

A

A linear function of the vector w is always convex
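A one-line check, added for illustration: for F(w) = a·w + b the convexity inequality holds with equality, so linear functions are convex (though not strictly convex).

```latex
F(\lambda w_1 + (1-\lambda) w_2)
  = a \cdot (\lambda w_1 + (1-\lambda) w_2) + b
  = \lambda\,(a \cdot w_1 + b) + (1-\lambda)\,(a \cdot w_2 + b)
  = \lambda F(w_1) + (1-\lambda) F(w_2).
```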

12
Q

Second-Derivative Characterization of Convexity

A

A twice-differentiable function F(w) is convex if and only if it has a positive semidefinite Hessian at every value of the parameter w in the domain of F(·)
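A small numerical check, added for illustration (the matrix below is a hypothetical example): for the quadratic F(w) = (1/2) w^T A w with symmetric A, the Hessian is A everywhere, so convexity can be read off the eigenvalues of A.

```python
import numpy as np

# Hypothetical example: F(w) = 0.5 * w^T A w has Hessian A at every w.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

eigenvalues = np.linalg.eigvalsh(A)   # eigenvalues of a symmetric matrix
print(eigenvalues)                    # [1. 3.] -> all >= 0, so A is positive semidefinite
print(np.all(eigenvalues >= 0))       # True -> F is convex by the second-derivative test
```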

13
Q

The following convexity definitions are equivalent for twice differentiable functions defined over R^d

A

(i) Direct definition: F(λw1 + [1 − λ]w2) ≤ λF(w1) + [1 − λ]F(w2) for all w1, w2 and λ ∈ (0, 1); (ii) first-order condition: F(w) ≥ F(w0) + [∇F(w0)]^T (w − w0) for all w and w0; (iii) second-order condition: the Hessian of F(w) is positive semidefinite for every w.
14
Q

The sum of a convex function and a strictly convex function

A

Is strictly convex
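A short justification, added for illustration, with F convex and G strictly convex (for example F(w) = |w| and G(w) = w²):

```latex
% For w_1 \ne w_2 and \lambda \in (0, 1):
F(\lambda w_1 + (1-\lambda) w_2) \le \lambda F(w_1) + (1-\lambda) F(w_2), \qquad
G(\lambda w_1 + (1-\lambda) w_2) <   \lambda G(w_1) + (1-\lambda) G(w_2).
% Adding the two inequalities (one of them strict) gives
(F + G)(\lambda w_1 + (1-\lambda) w_2) < \lambda (F+G)(w_1) + (1-\lambda)(F+G)(w_2),
% so F + G is strictly convex.
```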

15
Q

Finite-difference approximation

A

A finite-difference approximation of the gradient (perturbing each optimization variable slightly and measuring the change in the objective) is used to verify that an analytically computed gradient is correct.
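A minimal gradient check in this spirit (the objective and test point below are hypothetical): compare an analytic gradient against central finite differences.

```python
import numpy as np

def J(w):
    # Hypothetical objective: J(w) = 0.5 * ||w||^2 + sum(sin(w))
    return 0.5 * np.dot(w, w) + np.sum(np.sin(w))

def grad_J(w):
    # Analytic gradient to be verified: w + cos(w)
    return w + np.cos(w)

def finite_difference_grad(f, w, eps=1e-6):
    # Central-difference approximation, one coordinate at a time.
    g = np.zeros_like(w)
    for i in range(len(w)):
        e = np.zeros_like(w)
        e[i] = eps
        g[i] = (f(w + e) - f(w - e)) / (2.0 * eps)
    return g

w = np.array([0.5, -1.2, 2.0])
print(np.max(np.abs(grad_J(w) - finite_difference_grad(J, w))))  # tiny value -> gradient is correct
```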

16
Q

Decaying learning rate

A

The learning rate αt is reduced as the number of gradient-descent iterations t increases, for example with exponential decay αt = α0·exp(−k·t) or inverse decay αt = α0/(1 + k·t), where k controls the rate of decay. Large early steps make rapid initial progress, while smaller later steps avoid overshooting and oscillating around the optimum.
17
Q

The gradient at the optimal point of a line search

A

Is always orthogonal to the current search direction
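A short derivation, added for illustration: a line search from the point w along the direction v minimizes φ(α) = J(w + αv) over the step size α.

```latex
\varphi(\alpha) = J(w + \alpha v), \qquad
\varphi'(\alpha) = v^{\top} \nabla J(w + \alpha v).
% At the optimal step size \alpha^{*}, \varphi'(\alpha^{*}) = 0, i.e.,
v^{\top} \nabla J(w + \alpha^{*} v) = 0,
% so the gradient at the line-search optimum is orthogonal to the search direction v.
```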

18
Q

Second-order multivariate Taylor expansion of J(w) in the immediate locality of w0 along the direction v and small radius ε > 0

A

J(w0 + εv) ≈ J(w0) + ε v^T ∇J(w0) + (ε²/2) v^T H(w0) v, where H(w0) is the Hessian of J(w) evaluated at w = w0.
19
Q

Mini-batch stochastic gradient descent

A

Rather than computing the gradient over the full training set (gradient descent) or a single training point (stochastic gradient descent), the gradient is computed over a small random subset (mini-batch) of training points at each update. This gives a noisy but computationally cheap estimate of the full gradient and usually provides a good trade-off between the stability of gradient descent and the speed of stochastic gradient descent.
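A minimal sketch (hypothetical data and hyperparameters) of mini-batch stochastic gradient descent for a squared-loss linear model:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))              # hypothetical data matrix (n x d)
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 0.1 * rng.normal(size=1000)

w = np.zeros(5)
alpha, batch_size = 0.01, 32                # hypothetical learning rate and batch size
for epoch in range(20):
    order = rng.permutation(len(y))         # reshuffle the training points each epoch
    for start in range(0, len(y), batch_size):
        idx = order[start:start + batch_size]
        Xb, yb = X[idx], y[idx]
        grad = Xb.T @ (Xb @ w - yb) / len(idx)   # gradient of the mean squared loss on the mini-batch
        w -= alpha * grad

print(w)  # close to the true coefficients [1, -2, 0.5, 0, 3]
```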

20
Q

Min-max normalization

A

Min-max normalization is useful when the data needs to be scaled in the range (0, 1)
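A minimal numpy sketch (hypothetical data); each feature (column) is rescaled independently:

```python
import numpy as np

X = np.array([[1.0, 200.0],
              [2.0, 400.0],
              [4.0, 300.0]])   # hypothetical data: rows = points, columns = features

X_scaled = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
print(X_scaled)                # every column now lies between 0 and 1
```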

21
Q

Feature normalization

A

A common type of normalization is to divide each feature value by its standard deviation. When this type of feature scaling is combined with mean-centering, the data is said to have been standardized. The basic idea is that each feature is presumed to have been drawn from a standard normal distribution with zero mean and unit variance
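A short numpy sketch of standardization (the data matrix is hypothetical): each feature is mean-centered and divided by its standard deviation.

```python
import numpy as np

X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 60.0]])                    # hypothetical data matrix

X_std = (X - X.mean(axis=0)) / X.std(axis=0)   # mean-centering combined with scaling by the std

print(X_std.mean(axis=0))                      # ~[0, 0]: zero mean per feature
print(X_std.std(axis=0))                       # [1, 1]: unit variance per feature
```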

22
Q

Mean-centering

A

In many models, it can be useful to mean-center the data in order to remove certain types of bias effects. Many algorithms in traditional machine learning (such as principal component analysis) also work with the assumption of mean-centered data. In such cases, a vector of column-wise means is subtracted from each data point
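A minimal numpy sketch (hypothetical data) of mean-centering: the vector of column-wise means is subtracted from each data point.

```python
import numpy as np

D = np.array([[1.0, 4.0],
              [3.0, 8.0],
              [5.0, 6.0]])        # hypothetical data matrix

column_means = D.mean(axis=0)     # vector of column-wise means
D_centered = D - column_means     # subtract the mean vector from each data point (row)

print(column_means)               # [3. 6.]
print(D_centered.mean(axis=0))    # ~[0. 0.]: each column now has zero mean
```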

23
Q

Common quadratic loss function and its gradient

24
Q

Common linear loss function and its gradient

25
Q

Commonly used matrix calculus identities - numerator layout

26
Q

Commonly used matrix calculus identities - denominator layout

27
Q

Commonly used matrix calculus identities - denominator layout (objective functions and vector-to-vector)

28
Q

Unconstrained quadratic program

29
Q

Optimality condition and solution to the quadratic program

30
Q

1-dimensional and multidimensional quadratic functions minimum

A
An unconstrained quadratic program is a direct generalization of 1-dimensional quadratic functions like (1/2)ax² + bx + c. Note that a minimum exists at x = −b/a for 1-dimensional quadratic functions when a > 0, and a minimum exists for multidimensional quadratic functions when A is positive definite
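For reference, the corresponding multidimensional statement (a standard result, assuming a symmetric matrix A):

```latex
J(w) = \tfrac{1}{2}\, w^{\top} A w + b^{\top} w + c, \qquad
\nabla J(w) = A w + b.
% Setting the gradient to zero gives A w^{*} = -b, so w^{*} = -A^{-1} b when A is
% positive definite; this is the multidimensional analogue of x^{*} = -b/a when a > 0.
```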

31
Q

Squared norm objective function and gradient

32
Q

Univariate chain rule

33
Q

Chain rule where one of the functions is a vector-to-scalar function

34
Q

Least-square regression objective function

35
Q

Tikhonov Regularization

36
Q

Normal equation with Tikhonov regularization

37
Q

Jacobian, Gram matrix and Normal Equation relation

38
Q

Covariance Matrix of Mean-Centered Data

A
The unscaled version of the matrix, in which the factor of n is not used in the denominator, is referred to as the scatter matrix. In other words, the scatter matrix is simply D^T D. The scatter matrix is the Gram matrix of the column space of D, whereas the similarity matrix is the Gram matrix of the row space of D
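A small numpy check (hypothetical data): after mean-centering D, the scatter matrix is D^T D and the covariance matrix is the same quantity divided by n.

```python
import numpy as np

rng = np.random.default_rng(0)
D = rng.normal(size=(100, 3))              # hypothetical data matrix (n x d)

Dc = D - D.mean(axis=0)                    # mean-centered data
scatter = Dc.T @ Dc                        # scatter matrix: Gram matrix of the columns
covariance = scatter / len(Dc)             # covariance matrix (factor of n in the denominator)

# Agrees with numpy's covariance (bias=True uses n rather than n - 1 in the denominator).
print(np.allclose(covariance, np.cov(D, rowvar=False, bias=True)))   # True
```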

39
Q

Regularized weight vector

40
Q

Stochastic and mini-batch gradient descent with regularization

41
Q

L2-loss SVM

42
Q

Hinge-loss SVM (L1-loss)

43
Q

Point-wise loss derivatives for the L1 (hinge) and L2 loss

44
Q

Logistic regression loss

45
Q

Mini-batch stochastic gradient-descent for the logistic function

46
Q

Multi-class SVM loss function with regularization

47
Q

Loss function and stochastic gradient descent for the multi-class SVM