Mathematics - Lecture 2 Flashcards

Question 1

Q

What does a matrix in ML represent?

Answer

A

A matrix represents a dataset: each of the M rows is an observation and each of the N columns is a feature.

Question 2

Q

What is the process of Least Squares Estimation?

Answer

A

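Arrange the x data as a matrix X with one row per observation, holding the x values in the first column and a column of just ones in the second, and arrange the y data as a column vector Y.
Using these, calculate X^T, X^TX and X^TY
Calculate (X^TX)^-1 i.e. the inverse of the matrix X^TX
Solve for w in the equation: w = (X^TX)^-1 * X^TY
Write the linear equation as: y = (top value of w * x) + bottom value of w, where the top value is the slope and the bottom value is the intercept (which may be negative)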
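
The steps above can be sketched in plain Python for the single-feature case. This is a minimal illustration, not a general solver: the function name `least_squares_line` and the sample data are hypothetical, and it assumes each row of X is [x_i, 1] so that w = [slope, intercept].

```python
def least_squares_line(xs, ys):
    """Fit y = slope * x + intercept using w = (X^T X)^-1 X^T Y.

    Assumes each row of X is [x_i, 1], so w = [slope, intercept].
    """
    n = len(xs)
    # Entries of the 2x2 matrix X^T X = [[sxx, sx], [sx, n]]
    sxx = sum(x * x for x in xs)
    sx = sum(xs)
    # Entries of the 2x1 vector X^T Y = [sxy, sy]
    sxy = sum(x * y for x, y in zip(xs, ys))
    sy = sum(ys)
    # A 2x2 matrix [[a, b], [c, d]] has inverse 1/(ad - bc) * [[d, -b], [-c, a]]
    det = sxx * n - sx * sx
    if det == 0:
        raise ValueError("X^T X is singular; the x values must not all be equal")
    # Multiply the inverse by X^T Y to get w
    slope = (n * sxy - sx * sy) / det
    intercept = (-sx * sxy + sxx * sy) / det
    return slope, intercept


# Hypothetical data generated from y = 2x + 1
slope, intercept = least_squares_line([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])
print(f"y = {slope} * x + {intercept}")
```

With more than one feature the same formula applies, but the inverse is normally computed with a linear algebra library rather than by hand.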

Question 3

Q

How do you apply the chain rule in differentiation?

Answer

A

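Treat the function as an outer function applied to an inner function i.e. y = f(g(x)).
Calculate the derivative of the outer function with respect to the inner function.
Then, calculate the derivative of the inner function with respect to x.
Then, multiply the two together to get the derivative of the whole function: dy/dx = f'(g(x)) * g'(x)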
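
As a short worked example of these steps (the function below is chosen purely for illustration), take y = (3x + 1)^2 with inner function u = 3x + 1:

```latex
y = u^2, \quad u = 3x + 1
\frac{dy}{du} = 2u, \qquad \frac{du}{dx} = 3
\frac{dy}{dx} = \frac{dy}{du} \cdot \frac{du}{dx} = 2u \cdot 3 = 6(3x + 1)
```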

Question 4

Q

How do you calculate the partial derivative of an equation?

Answer

A

Calculate the derivative with respect to the first variable e.g. x - When doing this, the other variable is treated as a constant i.e. it stays unchanged in the calculation.
Calculate the derivative with respect to the second variable e.g. y - When doing this, the other variable is treated as a constant i.e. it stays unchanged in the calculation.
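
A short worked example, using a function chosen purely for illustration, f(x, y) = x^2 y + 3y:

```latex
f(x, y) = x^2 y + 3y
\frac{\partial f}{\partial x} = 2xy \quad \text{(y held constant, so the term 3y vanishes)}
\frac{\partial f}{\partial y} = x^2 + 3 \quad \text{(x held constant)}
```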

Question 5

Q

What does a vector of partial derivatives equate to?

Answer

A

The gradient of a function, written as [partial derivative 1, partial derivative 2]
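
One way to see the gradient as a vector of partial derivatives is to approximate each partial derivative numerically. The sketch below uses central differences; the helper `numerical_gradient`, the step size `h` and the example function `f` are all assumptions made for illustration.

```python
def numerical_gradient(f, point, h=1e-6):
    """Approximate the gradient of f at `point` with central differences."""
    grad = []
    for i in range(len(point)):
        forward = list(point)
        backward = list(point)
        # Nudge only variable i; every other variable is held constant
        forward[i] += h
        backward[i] -= h
        grad.append((f(*forward) - f(*backward)) / (2 * h))
    return grad


def f(x, y):
    # Illustrative function with analytic gradient [2xy, x^2 + 3]
    return x ** 2 * y + 3 * y


grad = numerical_gradient(f, [1.0, 2.0])
print(grad)  # close to the analytic gradient [4, 4] at (1, 2)
```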

Question 6

Q

Why do we need to optimise a function?

Answer

A

Optimisation can be used to find the best variable value (w) that generates the minimum function value (f(w)), also known as the minimum error.

Question 7

Q

What is the loss function for MSE (Mean Squared Error)?

Answer

A

MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - y’_i)^2
y_i = Actual Value
y’_i = Predicted Value
n = number of samples
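
The formula translates directly into a few lines of Python. This is a minimal sketch; the function name `mean_squared_error` and the sample values are hypothetical.

```python
def mean_squared_error(actual, predicted):
    """Return the mean of the squared differences between actual and predicted values."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(actual)
    # Sum of squared errors (SSE), then divide by the number of samples
    sse = sum((y - y_pred) ** 2 for y, y_pred in zip(actual, predicted))
    return sse / n


# Squared errors are 1, 0 and 4, so the MSE is 5 / 3
mse = mean_squared_error([3.0, 5.0, 7.0], [2.0, 5.0, 9.0])
print(mse)
```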

Question 8

Q

When is MSE used?

Answer

A

MSE is used in regression problems, such as predicting house prices or stock prices.

Question 9

Q

What is Gradient Descent Optimisation?

Answer

A

Gradient Descent is an iterative optimisation algorithm used to find a local minimum of a given function (stepping in the opposite direction, known as gradient ascent, finds a local maximum). It is one of the most widely used optimisation methods in Machine Learning.

Question 10

Q

How do you compute Gradient Descent?

Answer

A

Define the loss function for the problem at hand e.g. a regression problem would use something like MSE
Calculate the partial derivatives of the loss function with respect to each variable (a & b)
Initialise the values of a, b and the learning rate (alpha)
Calculate the new values of a and b: evaluate each partial derivative at the current values, multiply it by the learning rate, and subtract the result from the respective variable
Repeat step 4, re-evaluating the partial derivatives at the updated values each time, until convergence has been reached.
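
The procedure can be sketched for the line y = a * x + b with MSE as the loss. Everything specific here is an assumption for illustration: the function name `gradient_descent`, the starting values of zero, the learning rate of 0.05, the stopping tolerance and the sample data.

```python
def gradient_descent(xs, ys, alpha=0.05, tol=1e-9, max_iterations=10000):
    """Fit y = a * x + b by minimising MSE with gradient descent."""
    a, b = 0.0, 0.0  # initial values (assumed)
    n = len(xs)
    for _ in range(max_iterations):
        # Prediction error for every sample at the current a and b
        errors = [(a * x + b) - y for x, y in zip(xs, ys)]
        # Partial derivatives of MSE with respect to a and b
        grad_a = (2 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_b = (2 / n) * sum(errors)
        # Treat very small gradients as convergence
        if abs(grad_a) < tol and abs(grad_b) < tol:
            break
        # Step against the gradient, scaled by the learning rate
        a -= alpha * grad_a
        b -= alpha * grad_b
    return a, b


# Hypothetical data generated from y = 2x + 1
a, b = gradient_descent([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])
print(a, b)  # approaches a = 2, b = 1
```

A learning rate that is too large makes the updates diverge, while one that is too small makes convergence slow, so alpha usually needs tuning for the data at hand.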

Question 11

Q

What is the difference between the two equations for Mean Squared Error and Sum of Squared Errors?

Answer

A

MSE is SSE divided by the number of samples n i.e. the MSE equation is the SSE equation with 1/n on the front.

Question 12

Q

What are the requirements for gradient descent to be applied?

Answer

A

The function that will be optimised has to be differentiable at each point of x i.e. no discontinuities or sharp corners
The function has to be convex, so that any local minimum found is also the global minimum - the line segment connecting any two points on the function’s curve lies on or above the curve i.e. does not cross below it
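
The line-segment condition for convexity can be stated as an inequality that must hold for any two points x_1, x_2 and any t between 0 and 1 (the symbols x_1, x_2 and t are introduced here for illustration):

```latex
f\big(t x_1 + (1 - t) x_2\big) \le t f(x_1) + (1 - t) f(x_2), \qquad 0 \le t \le 1
```

For example, f(x) = x^2 satisfies this everywhere, whereas f(x) = \sin(x) does not.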

Question 13

Q

What does the term ‘Discrete variable’ mean with regard to probability?

Answer

A

It is a variable whose possible values are numerical outcomes of a random process.
It takes a countable number of values or states, and its probability distribution is a list of the probabilities associated with each of its possible values.

Question 14

Q

What is the derivative of log_a(x)?

Answer

A

d/dx log_a(x) = 1/(x * ln(a))
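
This result follows from the change-of-base rule, since ln(a) is a constant:

```latex
\log_a(x) = \frac{\ln(x)}{\ln(a)}
\frac{d}{dx} \log_a(x) = \frac{1}{\ln(a)} \cdot \frac{d}{dx} \ln(x) = \frac{1}{x \ln(a)}
```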

Question 15

Q

What does the conditional probability phrase Pr(X = x|Y = y) translate to?

Answer

A

The probability of the random variable X having a specific value x, given that another random variable Y has a specific value of y.

Question 16

Q

What is the equation for Bayes’ Theorem?

Answer

A

P(w_j|x) = (p(x|w_j) * p(w_j))/p(x)
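
A small numerical sketch of the theorem follows. It assumes the classes w_j are mutually exclusive and cover every case, so that p(x) can be computed as the sum of p(x|w_j) * p(w_j) over all j. The function name `posterior` and the probabilities are made up for illustration.

```python
def posterior(likelihoods, priors):
    """Return P(w_j|x) for every class j, given p(x|w_j) and p(w_j)."""
    # Numerator of Bayes' theorem for each class: p(x|w_j) * p(w_j)
    joint = [lik * prior for lik, prior in zip(likelihoods, priors)]
    # Evidence p(x), assuming the classes are exclusive and exhaustive
    evidence = sum(joint)
    return [j / evidence for j in joint]


# Two hypothetical classes: p(x|w_1) = 0.8, p(x|w_2) = 0.1, p(w_1) = 0.3, p(w_2) = 0.7
post = posterior([0.8, 0.1], [0.3, 0.7])
print(post)  # the posteriors sum to 1
```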

Question 17

Q

What are the differences between Frequentist and Bayesian?

Answer

A

Frequentist: Treats probability as the long-term frequency of an event occurring; data driven
Bayesian: Assumes a prior model (knowledge) and updates the model using data; model driven

Revise the basic mathematical concepts required for the module (17 cards)