Topic 9: Understanding GD: Overparameterisation Flashcards
What is the empirical loss function of linear regression
(mean squared error)
R̂(β) = ½‖y − Xβ‖₂²
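A minimal NumPy sketch of this loss on hypothetical data (the shapes, seed, and variable names are illustrative assumptions, not from the source):

```python
import numpy as np

# Hypothetical overparameterised setup: n = 5 examples, d = 20 parameters.
rng = np.random.default_rng(0)
X = rng.standard_normal((5, 20))   # data matrix, one training example per row
y = rng.standard_normal(5)         # targets
beta = rng.standard_normal(20)     # a linear predictor

# Empirical loss: R̂(β) = ½‖y − Xβ‖₂²
loss = 0.5 * np.sum((y - X @ beta) ** 2)
print(loss)
```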
What assumptions do we make about the empirical loss function of linear regression
The model is overparameterised,
i.e. n (the number of training examples) < d (the number of parameters)
The data matrix X is full rank
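A quick check of both assumptions on the same kind of toy data (a sketch; a random Gaussian X is full rank with probability 1):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 5, 20                            # overparameterised: n < d
X = rng.standard_normal((n, d))

# Full (row) rank means rank(X) = n, which makes XX⊺ invertible.
print(np.linalg.matrix_rank(X) == n)    # True (almost surely for Gaussian X)
```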
What is an Invertible Matrix
(non-singular)
Given A, there exists an A⁻¹
such that AA⁻¹ = A⁻¹A = I (the identity matrix)
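A one-line numerical check of the definition (the 2×2 matrix is an arbitrary non-singular example):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])                 # an arbitrary non-singular matrix
A_inv = np.linalg.inv(A)
print(np.allclose(A @ A_inv, np.eye(2)))   # AA⁻¹ = I
print(np.allclose(A_inv @ A, np.eye(2)))   # A⁻¹A = I
```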
In an m × n matrix, which is the row count and which is the column count
Rows by columns: m rows, n columns
What is meant by X has a trivial null space
Its null space has only one element, the zero vector: Xv = 0 implies v = 0
What is a pseudoinverse
X† = X⊺(XX⊺)⁻¹
It acts like an inverse in certain respects (here, XX† = I), even when X does not have a true inverse (X is not square, or is singular)
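A sketch of the formula in NumPy, assuming a full-row-rank X as above; note that X† is only a right inverse here (XX† = Iₙ, but X†X ≠ I_d when d > n):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 5, 20
X = rng.standard_normal((n, d))               # full row rank, almost surely

# X† = X⊺(XX⊺)⁻¹ exists because XX⊺ (n×n) is invertible when rank(X) = n.
X_dag = X.T @ np.linalg.inv(X @ X.T)

print(np.allclose(X @ X_dag, np.eye(n)))      # right inverse: XX† = Iₙ
print(np.allclose(X_dag, np.linalg.pinv(X)))  # agrees with NumPy's pseudoinverse
```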
What can we say about XX⊺
Since X is full rank, XX⊺ has a trivial null space:
it does not map any non-zero vector to zero
Hence it is an invertible matrix
When is the loss R̂(β) = ½‖y − Xβ‖₂² at a global minimum
When
β = X⊺(XX⊺)⁻¹y = X†y
since then Xβ = XX⊺(XX⊺)⁻¹y = y, so the loss is zero
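Verifying this claim numerically (same hypothetical data as above; the loss at β = X†y is zero up to floating-point error):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 5, 20
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)

beta = X.T @ np.linalg.inv(X @ X.T) @ y     # β = X⊺(XX⊺)⁻¹y = X†y
print(0.5 * np.sum((y - X @ beta) ** 2))    # ≈ 0: Xβ = y exactly interpolates
```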
How do we express the multiplicity of global minima
‖y − Xβ‖₂² = 0 (i.e. β is a global minimum)
whenever
β = X†y + ξ
where ξ is such that ξ⊺xᵢ = 0 for all i, i.e. ξ is orthogonal to every training point xᵢ (ξ lies in the null space of X)
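A sketch that builds such a ξ by projecting a random vector off the row space of X (using the fact that X†X is the orthogonal projector onto the row space); it also previews the least-norm property from the next card:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 5, 20
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)
X_dag = X.T @ np.linalg.inv(X @ X.T)

beta_min = X_dag @ y                       # X†y lies in the row space of X

# ξ = (I − X†X)v is the component of v orthogonal to every training point xᵢ:
v = rng.standard_normal(d)
xi = v - X_dag @ (X @ v)
print(np.allclose(X @ xi, 0))              # ξ⊺xᵢ = 0 for all i

beta_other = beta_min + xi
print(np.allclose(X @ beta_other, y))      # still zero loss: another global minimum

# Pythagoras: ξ ⊥ X†y, so ‖X†y + ξ‖² = ‖X†y‖² + ‖ξ‖² ≥ ‖X†y‖²
print(np.linalg.norm(beta_other) >= np.linalg.norm(beta_min))   # True
```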
What can we say about β = X†y
It is the solution with ξ = 0
β = X†y is the global minimum with the least norm,
i.e. the global minimum closest to the origin
What is the 2-norm
The Euclidean distance, or standard vector length: ‖v‖₂ = √(v₁² + … + v_d²)
What conditions must be met for gradient descent to converge to the global minimum with the least norm
The data matrix X must be full rank
The initialisation must be β₀ = 0
The step size η must be chosen suitably small, and enough steps must be taken
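A minimal gradient-descent sketch under exactly these conditions (β₀ = 0, a small fixed η chosen by hand, hypothetical Gaussian data); the iterates land on the least-norm minimiser X†y:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 5, 20
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)

beta = np.zeros(d)                          # β₀ = 0: start in the row space of X
eta = 0.01                                  # illustrative step size, small enough here
for _ in range(20000):
    beta -= eta * X.T @ (X @ beta - y)      # ∇R̂(β) = X⊺(Xβ − y)

beta_min_norm = X.T @ np.linalg.inv(X @ X.T) @ y    # the least-norm solution X†y
print(np.allclose(beta, beta_min_norm, atol=1e-6))  # True: GD converged to X†y
```

Each update adds a multiple of X⊺(·), so starting from 0 the iterates never leave the row space of X; the only global minimiser in that subspace is X†y.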
What is Implicit Bias
The inherent tendency of machine learning algorithms, particularly neural networks, to prefer certain solutions over others, even when these preferences are not explicitly programmed
What is Algorithmic Regularization
A regularization effect that emerges as a consequence of implicit bias:
the algorithm itself prevents overfitting or improves generalisation performance, without an explicit penalty term being added to the loss
What is β and what are its dimensions
β is the linear predictor (the weight vector)
Its dimensions are d × 1, i.e. β ∈ ℝᵈ