lecture 6: ridge regression and polynomial regression Flashcards
learning a vector-valued function is the same as learning a scalar function apart from
the learned parameter W, which is a matrix instead of a vector
using matrix notation, what is the sum of squared error cost function
J(W) = trace(EᵀE), where E = XW - Y is the error matrix
capital letters represent matrices
the formulas for W are the same as those for w, TRUE or FALSE
TRUE
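A minimal NumPy sketch of the matrix case, assuming X holds one sample per row and XᵀX is invertible. The data, the shapes and the names `W_true` and `cost` are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))        # 20 samples, 3 features (illustrative)
W_true = rng.normal(size=(3, 2))    # 3 features mapped to 2 outputs
Y = X @ W_true                      # noise-free targets, one column per output

# Same least-squares formula as for w, with Y and W now matrices.
W = np.linalg.solve(X.T @ X, X.T @ Y)

E = X @ W - Y                       # error matrix
cost = np.trace(E.T @ E)            # sum of squared errors over all outputs
```

Each column of W equals the w obtained by fitting the matching column of Y on its own.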
recap: what type of problem does it correspond to when yᵢ is continuous valued vs discrete valued
regression for continuous and classification for discrete
what are the 2 linear methods for classification
binary classification and multi-category classification
what are the 2 class labels used for binary classification?
yᵢ ∈ {-1,1}
for each predicted value of y, take its sign as the class label
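The sign rule can be sketched as below, assuming a toy data set with a bias column of ones and one feature. The numbers are hypothetical.

```python
import numpy as np

# Hypothetical toy data: column 0 is the bias term, column 1 is the feature.
X = np.array([[1.0, -2.0],
              [1.0, -1.0],
              [1.0,  1.0],
              [1.0,  2.0]])
y = np.array([-1.0, -1.0, 1.0, 1.0])    # labels in {-1, 1}

# Fit by least squares, then threshold the real-valued output with sign.
w = np.linalg.solve(X.T @ X, X.T @ y)
y_pred = np.sign(X @ w)                 # np.sign returns 0 for an exact 0 output
```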
what is the method of assignment used for multi-category classification
one-hot encoding
with the final y matrix obtained, how do we classify each item?
using argmax: for each row, the column with the largest number determines the class label (e.g. if the largest number is in column 1, the item is class 1)
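A small sketch of one-hot targets followed by argmax, assuming three classes and made-up 2-D points with a bias column. NumPy indexes columns from 0, so add 1 to get the "column 1 is class 1" convention used above.

```python
import numpy as np

labels = np.array([0, 1, 2])            # 0-based class indices (illustrative)
X = np.array([[1.0, 0.0, 0.0],          # bias column plus two features
              [1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0]])

Y = np.eye(3)[labels]                   # one-hot encoding: one row per item
W = np.linalg.solve(X.T @ X, X.T @ Y)   # least-squares fit of all columns

# Classify training items and two hypothetical new items by row-wise argmax.
pred_train = np.argmax(X @ W, axis=1)
X_new = np.array([[1.0, 0.9, 0.1],
                  [1.0, 0.1, 0.2]])
pred_new = np.argmax(X_new @ W, axis=1)
```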
why do we use ridge regression?
XᵀX is not guaranteed to be invertible; ridge regression makes the matrix inside the brackets invertible by adding 𝜆I, the identity matrix scaled by a small positive coefficient 𝜆
what is the penalty term added to the cost function, weighted by the coefficient 𝜆
𝜆wᵀw
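One way to see how this penalty produces the 𝜆I term, sketched for the scalar-output case and assuming 𝜆 > 0:

```latex
J(w) = (Xw - y)^{\top}(Xw - y) + \lambda\, w^{\top} w
\nabla_w J = 2X^{\top}(Xw - y) + 2\lambda w = 0
\;\Longrightarrow\; (X^{\top}X + \lambda I)\, w = X^{\top} y
```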
ridge regression in primal form
hint: similar to over determined system
w = (XᵀX + 𝜆I)⁻¹ Xᵀy
ridge regression in dual form
hint: similar to under determined system
w = Xᵀ(XXᵀ + 𝜆I)⁻¹ y
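The two forms give the same w whenever 𝜆 > 0. A quick numerical check, assuming random data with fewer samples than features (the shapes and `lam = 0.1` are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))             # 5 samples, 8 features
y = rng.normal(size=5)
lam = 0.1                               # ridge coefficient, must be > 0

# X.T @ X is 8x8 with rank at most 5, so it is singular without lam * I.
w_primal = np.linalg.solve(X.T @ X + lam * np.eye(8), X.T @ y)

# The dual form inverts a 5x5 matrix instead of an 8x8 one.
w_dual = X.T @ np.linalg.solve(X @ X.T + lam * np.eye(5), y)
```

The primal form inverts a features-by-features matrix and the dual form a samples-by-samples matrix, so the cheaper choice depends on which dimension is smaller.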
why do we use polynomial regression
to try and get a better fit for the data
generally, for high dimensional problems, polynomials of order larger than what are seldom used
3
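Polynomial regression is still linear in w: only the inputs are expanded into polynomial features. A minimal sketch for one input variable, assuming a made-up quadratic target:

```python
import numpy as np

x = np.linspace(-1.0, 1.0, 9)
y = 1.0 + 2.0 * x - 3.0 * x**2          # hypothetical quadratic target

# Polynomial feature matrix with columns [1, x, x^2].
P = np.vander(x, N=3, increasing=True)

# Ordinary least squares on the expanded features.
w = np.linalg.solve(P.T @ P, P.T @ y)
```

With several input variables the number of polynomial terms grows quickly with the order, which is one reason high orders are seldom used.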