Mathematics - Lecture 2 Flashcards

Revise the basic mathematical concepts required for the module

1
Q

What does a matrix in ML represent?

A

A dataset: M observations (typically one per row), each described by N features (typically one per column)

2
Q

What is the process of Least Squares Estimation?

A
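  1. Arrange the data as a matrix X and a column vector Y. Each row of X holds one x value followed by a 1, so X has a second column of just ones (this lets the model learn an intercept).
  2. Calculate, using these, X^T, X^TX and X^TY
  3. Calculate (X^TX)^-1 i.e. the inverse of the matrix X^TX
  4. Solve for w in the equation: w = (X^TX)^-1 * X^TY
  5. Write the linear equation: y = (top value of w * x) + (bottom value of w), where the top value is the slope and the bottom value is the intercept (which may be negative)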
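
The steps above can be sketched in plain Python for a single input variable. This is a minimal illustration rather than a reference implementation: the function name `fit_line` and the sample data are assumptions made for the example, and the 2x2 inverse is written out by hand instead of using a linear algebra library.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept using w = (X^T X)^-1 X^T Y."""
    n = len(xs)
    # Step 1: each row of X is [x_i, 1]; Y is the column of targets.
    # Step 2: the entries of X^T X (2x2) and X^T Y (2x1).
    sxx = sum(x * x for x in xs)
    sx = sum(xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sy = sum(ys)
    # X^T X = [[sxx, sx], [sx, n]] and X^T Y = [sxy, sy]
    # Step 3: invert the 2x2 matrix X^T X.
    det = sxx * n - sx * sx
    if det == 0:
        raise ValueError("X^T X is singular; need at least two distinct x values")
    inv = [[n / det, -sx / det],
           [-sx / det, sxx / det]]
    # Step 4: w = (X^T X)^-1 X^T Y, giving [slope, intercept].
    slope = inv[0][0] * sxy + inv[0][1] * sy
    intercept = inv[1][0] * sxy + inv[1][1] * sy
    return slope, intercept


# Step 5: write out the fitted line (the sample data lies on y = 2x + 1).
slope, intercept = fit_line([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
print(f"y = {slope:.2f} * x + {intercept:.2f}")
```
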
3
Q

How do you perform the chain rule in Differentiation?

A

For a composite function y = f(g(x)):
Calculate the derivative of the outer function f with respect to the inner function g.
Then, calculate the derivative of the inner function g with respect to x.
Then, multiply the two together to get the derivative of the whole function: dy/dx = f'(g(x)) * g'(x)
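
As a worked illustration, take h(x) = (3x + 1)^2, an example chosen here and not taken from the lecture. The outer function is u^2 and the inner function is 3x + 1. The sketch below multiplies the two derivatives and compares the result with a numerical estimate; all function names are assumptions made for the example.

```python
def inner(x):
    return 3 * x + 1          # g(x)

def inner_derivative(x):
    return 3                  # g'(x)

def outer_derivative(u):
    return 2 * u              # f'(u) for f(u) = u**2

def chain_rule_derivative(x):
    # h'(x) = f'(g(x)) * g'(x)
    return outer_derivative(inner(x)) * inner_derivative(x)

def numerical_derivative(func, x, h=1e-6):
    # Central difference estimate, used only as a sanity check.
    return (func(x + h) - func(x - h)) / (2 * h)

x0 = 2.0
analytic = chain_rule_derivative(x0)                        # 2 * 7 * 3
numeric = numerical_derivative(lambda x: inner(x) ** 2, x0)
print(analytic, numeric)
```
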

4
Q

How do you calculate the partial derivative of an equation?

A

Calculate the derivative with respect to the first variable, e.g. x. When doing this, treat the other variable as a constant, i.e. unchanged in the calculations.
Calculate the derivative with respect to the second variable, e.g. y. Again, treat the other variable as a constant.
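
For example, with f(x, y) = x^2 * y + 3y (a function picked for illustration, not from the lecture), holding y constant gives df/dx = 2xy, and holding x constant gives df/dy = x^2 + 3. A small sketch that checks both against numerical estimates, with all names assumed for the example:

```python
def f(x, y):
    return x ** 2 * y + 3 * y

def df_dx(x, y):
    # y is treated as a constant, so 3*y differentiates to 0.
    return 2 * x * y

def df_dy(x, y):
    # x is treated as a constant, so x**2 acts as a plain coefficient.
    return x ** 2 + 3

x0, y0, h = 2.0, 5.0, 1e-6
partials = [df_dx(x0, y0), df_dy(x0, y0)]

# Central differences: nudge one variable while the other stays fixed.
numeric_dx = (f(x0 + h, y0) - f(x0 - h, y0)) / (2 * h)
numeric_dy = (f(x0, y0 + h) - f(x0, y0 - h)) / (2 * h)
print(partials, numeric_dx, numeric_dy)
```
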

5
Q

What does a vector of partial derivatives equate to?

A

The gradient of a function, written as [partial derivative 1, partial derivative 2]

6
Q

Why do we need to optimise a function?

A

Optimisation can be used to find the best variable value (w) that generates the minimum function value (f(w)), also known as the minimum error.

7
Q

What is the loss function for MSE (Mean Squared Error)?

A

MSE = (1/n) * sum_{i=1}^{n} (y_i - y’_i)^2
y_i = Actual Value
y’_i = Predicted Value
n = number of samples
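
A minimal sketch of the formula in Python. The function name `mean_squared_error` and the sample values are assumptions made for the example.

```python
def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(actual)
    return sum((y - y_pred) ** 2 for y, y_pred in zip(actual, predicted)) / n


# Squared errors are 1, 0 and 4, so the mean is 5/3.
mse = mean_squared_error([3.0, 5.0, 7.0], [2.0, 5.0, 9.0])
print(mse)
```
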

8
Q

When is MSE used?

A

It’s used in regression problems, such as predicting house prices or stock prices.

9
Q

What is Gradient Descent Optimisation?

A

Gradient Descent is an iterative optimisation algorithm used to find a local minimum of a given function (the mirror-image method for finding a maximum is gradient ascent). It is one of the most widely used optimisation methods in Machine Learning.

10
Q

How do you compute Gradient Descent?

A
  1. Define the loss function for the problem at hand, e.g. a regression problem would use something like MSE
  2. Calculate the partial derivatives of the loss with respect to each variable (a & b)
  3. Initialise the values of a, b and the learning rate (alpha)
  4. Calculate the new values of a and b: evaluate each partial derivative at the current values, multiply it by the learning rate, and subtract the result from the respective variable
  5. Repeat steps 2 and 4 (re-evaluating the partial derivatives at the updated a and b, without re-initialising) until convergence has been reached.
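
The procedure can be sketched for a line y = a * x + b with MSE as the loss. This is an illustrative sketch under stated assumptions: the stopping rule (`tol`, `max_iters`), the default learning rate and the sample data are choices made for the example, not values from the lecture.

```python
def gradient_descent(xs, ys, alpha=0.05, max_iters=10000, tol=1e-10):
    """Fit y = a * x + b by minimising MSE with gradient descent."""
    n = len(xs)
    a, b = 0.0, 0.0                       # Step 3: initial values
    for _ in range(max_iters):
        residuals = [y - (a * x + b) for x, y in zip(xs, ys)]
        # Step 2: partial derivatives of MSE with respect to a and b,
        # evaluated at the current a and b.
        grad_a = (-2 / n) * sum(x * r for x, r in zip(xs, residuals))
        grad_b = (-2 / n) * sum(residuals)
        # Step 4: subtract learning rate times gradient from each variable.
        new_a = a - alpha * grad_a
        new_b = b - alpha * grad_b
        converged = abs(new_a - a) < tol and abs(new_b - b) < tol
        a, b = new_a, new_b
        if converged:                     # Step 5: stop once updates are tiny
            break
    return a, b


# Sample data lies on y = 2x + 1, so a should approach 2 and b should approach 1.
a, b = gradient_descent([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
print(round(a, 4), round(b, 4))
```

A learning rate that is too large makes the updates overshoot and diverge, while one that is too small makes convergence slow, so alpha usually needs tuning for the data at hand.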
11
Q

What is the difference between the two equations for Mean Squared Error and Sum of Squared Errors?

A

MSE is SSE divided by the number of samples: the two equations are identical, except that MSE has a factor of 1/n in front of the sum.

12
Q

What are the requirements for gradient descent to be applied?

A

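The function being optimised has to be differentiable at every point x, i.e. no discontinuities, sharp corners or vertical tangents
The function has to be convex: the line segment connecting any two points on the curve lies on or above the curve, i.e. does not cross below it. Convexity is what guarantees that any minimum found is the global one.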

13
Q

What does the term ‘Discrete variable’ mean with regard to probability?

A

It is a variable whose possible values are numerical outcomes of a random process.
It takes a countable number of values or states, and its probability distribution is a list of the probabilities associated with each of its possible values.

14
Q

What is the derivative of log_a(x)?

A

d/dx log_a(x) = 1/(x * ln(a))
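
A quick numerical check of this rule, using base a = 2 and x = 8 as arbitrary example values (the function name is an assumption made for the sketch):

```python
import math

def log_a_derivative(x, a):
    # The rule: d/dx log_a(x) = 1 / (x * ln(a))
    return 1 / (x * math.log(a))

x0, base, h = 8.0, 2.0, 1e-6
analytic = log_a_derivative(x0, base)
# Central difference estimate of the slope of log_2(x) at x = 8.
numeric = (math.log(x0 + h, base) - math.log(x0 - h, base)) / (2 * h)
print(analytic, numeric)
```
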

15
Q

What does the conditional probability phrase Pr(X = x|Y = y) translate to?

A

The probability of the random variable X having a specific value x, given that another random variable Y has a specific value of y.

16
Q

What is the equation for Bayes’ Theorem?

A

P(w_j|x) = (p(x|w_j) * p(w_j))/p(x)
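
A small sketch of the theorem for two classes w_1 and w_2. The prior and likelihood values are made-up numbers for illustration, and p(x) is obtained by summing p(x|w_j) * p(w_j) over the classes, which assumes the classes are mutually exclusive and cover every case.

```python
def posteriors(priors, likelihoods):
    """Return P(w_j|x) for each class j, given p(w_j) and p(x|w_j)."""
    # Evidence p(x): total probability of x across all classes.
    evidence = sum(l * p for l, p in zip(likelihoods, priors))
    if evidence == 0:
        raise ValueError("p(x) is zero; the posterior is undefined")
    return [l * p / evidence for l, p in zip(likelihoods, priors)]


priors = [0.3, 0.7]          # p(w_1), p(w_2): hypothetical values
likelihoods = [0.8, 0.1]     # p(x|w_1), p(x|w_2): hypothetical values
post = posteriors(priors, likelihoods)
print(post)
```

Although w_1 has the smaller prior here, its much higher likelihood makes it the more probable class once x has been observed.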

17
Q

What are the differences between Frequentist and Bayesian?

A

Frequentist: Treats probability as the long-term frequency of an event occurring; data driven
Bayesian: Assumes a prior model (knowledge) and updates that model using data; model driven