Mathematics - Lecture 2 Flashcards
Revise the basic mathematical concepts required for the module
What does a matrix in ML represent?
A matrix represents a dataset: M observations (rows), each described by N features (columns)
What is the process of Least Squares Estimation?
- Convert the X and Y data into two column vectors, with X also including a column of ones (for the intercept).
- From these, calculate X^T, X^TX and X^TY
- Calculate (X^TX)^-1 i.e. the inverse of the matrix
- Solve for w using the normal equation: w = (X^TX)^-1 * X^TY
- Write out the linear equation y = (slope * x) + intercept, reading the slope and intercept from the corresponding entries of w
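The steps above can be sketched in NumPy; the data values here are hypothetical, chosen only to illustrate the calculation:

```python
import numpy as np

# Toy data (hypothetical values, for illustration only).
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.1, 4.9, 7.2, 8.8])

# Step 1: build the design matrix X with a column of ones for the intercept.
X = np.column_stack([np.ones_like(x), x])

# Steps 2-4: the normal equation w = (X^T X)^-1 X^T Y.
w = np.linalg.inv(X.T @ X) @ (X.T @ y)

# Step 5: read the intercept and slope out of w.
w0, w1 = w
print(f"y = {w1:.2f}*x + {w0:.2f}")
```

In practice `np.linalg.lstsq` is preferred over forming the inverse explicitly, but the version above mirrors the flashcard's steps directly.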
How do you perform the chain rule in Differentiation?
Write the function as an outer part applied to an inner part, f(g(x)).
Calculate the derivative of the outer part, evaluated at the inner part, then the derivative of the inner part with respect to x.
Multiply the two together to get the derivative of the whole equation: f'(g(x)) * g'(x)
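A quick SymPy sketch of the chain rule; the function (3x^2 + 1)^4 is an assumed example, not one from the lecture:

```python
import sympy as sp

x = sp.Symbol('x')

# Hypothetical example: f(x) = (3x^2 + 1)^4, outer part u^4, inner part u = 3x^2 + 1.
inner = 3*x**2 + 1
outer_derivative = 4 * inner**3       # d/du of u^4, evaluated at the inner part
inner_derivative = sp.diff(inner, x)  # d/dx of the inner part = 6x

# Chain rule: multiply the two derivatives together.
by_chain_rule = sp.expand(outer_derivative * inner_derivative)

# Cross-check against SymPy differentiating the whole expression directly.
direct = sp.expand(sp.diff(inner**4, x))
print(by_chain_rule == direct)  # True: both routes give the same polynomial
```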
How do you calculate the partial derivative of an equation?
Calculate the derivative with respect to the first variable e.g. x - while doing this, treat every other variable as a constant i.e. unchanged in the calculations
Calculate the derivative with respect to the second variable e.g. y - again treating the other variable as a constant.
What does a vector of partial derivatives equate to?
The gradient of the function, written as the vector [partial derivative 1, partial derivative 2] - it points in the direction of steepest increase
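The two cards above can be checked with SymPy; the function f(x, y) = x^2*y + 3y is an assumed example:

```python
import sympy as sp

x, y = sp.symbols('x y')

# Hypothetical example function: f(x, y) = x^2*y + 3y
f = x**2 * y + 3*y

# Partial derivative with respect to x: y is treated as a constant.
df_dx = sp.diff(f, x)  # 2*x*y
# Partial derivative with respect to y: x is treated as a constant.
df_dy = sp.diff(f, y)  # x**2 + 3

# The gradient is the vector of partial derivatives.
grad = [df_dx, df_dy]
print(grad)
```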
Why do we need to optimise a function?
Optimisation can be used to find the best variable value (w) that generates the minimum function value (f(w)), also known as the minimum error.
What is the loss function for MSE (Mean Squared Error)?
MSE = (1/n) * sum from i=1 to n of (y_i - y'_i)^2
y_i = Actual Value
y’_i = Predicted Value
n = number of samples
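The formula translates directly into NumPy; the actual/predicted values below are made up for illustration:

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean Squared Error: the average of the squared differences
    between actual and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean((y_true - y_pred) ** 2)

# Hypothetical actual vs. predicted values.
print(mse([3.0, 5.0, 7.0], [2.5, 5.0, 8.0]))
```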
When is MSE used?
It’s used within Regression problems, such as predicting house prices or stock prices
What is Gradient Descent Optimisation?
Gradient Descent is an iterative optimisation algorithm used to find a local minimum of a given function (its counterpart, gradient ascent, finds a local maximum). It is one of the most widely used optimisation methods in Machine Learning.
How do you compute Gradient Descent?
- Define the loss function for the problem at hand e.g. a regression problem would use something like MSE
- Calculate the partial derivatives of the loss with respect to each parameter (a & b)
- Initialise the values of a, b and the learning rate (alpha)
- Calculate the new values of a and b: multiply each partial derivative by the learning rate and subtract the result from the respective parameter
- Repeat the update step, re-evaluating the partial derivatives at the new values of a and b, until convergence has been reached.
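The steps above can be sketched for a linear model y = a*x + b with an MSE loss; the data, learning rate and iteration count are assumptions for the sake of a runnable example:

```python
import numpy as np

# Hypothetical data, roughly following y = 2x + 1.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.1])
n = len(x)

# Initialise a, b and the learning rate alpha.
a, b = 0.0, 0.0
alpha = 0.05

# Partial derivatives, derived by hand from MSE = (1/n) * sum((y - (a*x + b))^2):
#   dMSE/da = (-2/n) * sum(x * (y - (a*x + b)))
#   dMSE/db = (-2/n) * sum(y - (a*x + b))
for _ in range(2000):  # repeat the update until (approximate) convergence
    residual = y - (a * x + b)
    grad_a = (-2.0 / n) * np.sum(x * residual)
    grad_b = (-2.0 / n) * np.sum(residual)
    a -= alpha * grad_a
    b -= alpha * grad_b

print(f"a = {a:.2f}, b = {b:.2f}")
```

A fixed iteration count stands in for a proper convergence test (e.g. stopping when the gradient norm falls below a tolerance).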
What is the difference between the two equations for Mean Squared Error and Sum of Squared Errors?
MSE is the same as SSE, except the sum is multiplied by 1/n on the front i.e. MSE is SSE averaged over the number of samples.
What are the requirements for gradient descent to be applied?
The function that will be optimised has to be differentiable at every point x i.e. no discontinuities or sharp corners
The function has to be convex - the line segment connecting any two points on the function’s graph lies on or above the curve i.e. does not cross it
What does the term ‘Discrete variable’ mean in regards to probability?
It is a variable whose possible values are numerical outcomes of a random process.
It has a countable number of possible values (states), and its probability distribution is a list of the probabilities associated with each of those values.
What is the derivative of log_a(x)?
d/dx log_a(x) = 1/(x * ln(a))
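The identity can be verified symbolically with SymPy, using a generic positive base a:

```python
import sympy as sp

x = sp.Symbol('x', positive=True)
a = sp.Symbol('a', positive=True)

# d/dx log_a(x) should equal 1 / (x * ln(a)).
derivative = sp.diff(sp.log(x, a), x)  # sp.log(x, a) is log base a of x
expected = 1 / (x * sp.log(a))
print(sp.simplify(derivative - expected))  # simplifies to 0
```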
What does the conditional probability phrase Pr(X = x|Y = y) translate to?
The probability of the random variable X having a specific value x, given that another random variable Y has a specific value of y.
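For discrete variables this can be estimated from counts: Pr(X = x | Y = y) = count(X = x and Y = y) / count(Y = y). A minimal sketch, with a made-up list of joint (x, y) observations:

```python
# Hypothetical joint observations of two discrete random variables X and Y.
samples = [(0, 0), (0, 1), (1, 1), (1, 1), (0, 0), (1, 0)]

def conditional_probability(samples, x, y):
    """Estimate Pr(X = x | Y = y) as count(X=x and Y=y) / count(Y=y)."""
    joint = sum(1 for xi, yi in samples if xi == x and yi == y)
    marginal = sum(1 for _, yi in samples if yi == y)
    return joint / marginal

# Y = 1 occurs in 3 samples; X = 1 in 2 of those, so Pr(X=1|Y=1) = 2/3.
print(conditional_probability(samples, 1, 1))
```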