Linear and Logistic Regression Flashcards

1
Q

For OLS linear regression on p random variables, what are the input, outcome, action and hypothesis spaces?

A

Input: ℝ^p
Outcome: ℝ
Action: ℝ
Hypothesis: {x ↦ w · x + b : w ∈ ℝ^p, b ∈ ℝ}
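As a minimal sketch (all names illustrative, not from the deck), one element of this hypothesis space in Python:

```python
import numpy as np

def ols_hypothesis(w: np.ndarray, b: float):
    """One element of the OLS hypothesis space: x in R^p -> w.x + b in R."""
    def f(x: np.ndarray) -> float:
        return float(w @ x + b)
    return f

f = ols_hypothesis(np.array([1.0, -2.0, 0.5]), b=3.0)  # p = 3
print(f(np.array([2.0, 1.0, 4.0])))  # 2 - 2 + 2 + 3 = 5.0
```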

2
Q

What is the OLS loss function?

A

Squared-error (SE) loss: ℓ(ŷ, y) = (ŷ - y)². The empirical risk is then the mean squared error over the training set.
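A sketch of the pointwise loss and the corresponding empirical risk (illustrative names):

```python
import numpy as np

def squared_error(y_hat: float, y: float) -> float:
    """Pointwise squared-error loss."""
    return (y_hat - y) ** 2

def empirical_risk(y_hat: np.ndarray, y: np.ndarray) -> float:
    """Mean squared error over a dataset."""
    return float(np.mean((y_hat - y) ** 2))
```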

3
Q

Explain the purpose of the gradient descent algorithm

A

To minimise the risk R numerically when no closed-form minimiser is convenient. It is an iterative algorithm: initialise the weights and biases, compute the gradient of R with respect to them, update the weights and biases by a step in the direction of the negative gradient, and repeat until a stopping condition is met.
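A minimal sketch for the OLS risk, using a fixed iteration count as the stopping condition (all names illustrative):

```python
import numpy as np

def gradient_descent(X, y, step_size=0.1, n_steps=1000):
    """Batch gradient descent on the mean-squared-error risk of w.x + b."""
    n, p = X.shape
    w, b = np.zeros(p), 0.0                    # initialise weights and bias
    for _ in range(n_steps):                   # fixed step count = stopping condition
        residual = X @ w + b - y               # predictions minus outcomes
        grad_w = 2.0 / n * (X.T @ residual)    # dR/dw
        grad_b = 2.0 / n * residual.sum()      # dR/db
        w -= step_size * grad_w                # step against the gradient
        b -= step_size * grad_b
    return w, b
```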

4
Q

Define convexity

A

A function is convex if every chord (the line segment joining two points on its graph) lies on or above the graph.
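Formally, f is convex if for all x, y in its (convex) domain and all λ ∈ [0, 1]:

```latex
f(\lambda x + (1 - \lambda) y) \le \lambda f(x) + (1 - \lambda) f(y)
```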

5
Q

For a convex function, what does gradient descent always result in?

A

Convergence to the global minimum (provided the step size is small enough).

6
Q

What are the two parameters we must specify before applying gradient descent and what are the potential consequences of setting these too large or too small?

A

Step size and stopping criterion.

If the step size is too large, the iterates may diverge; if too small, approaching the minimum may take impractically long. If the stopping criterion is too loose, we stop far from the minimum; if too strict, we waste iterations after convergence has effectively occurred.
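A tiny illustration of the step-size effect (assumed example, not from the deck): on f(w) = w² the update is w ← w - η · 2w = (1 - 2η)w, so any η with |1 - 2η| > 1 diverges.

```python
def steps(eta: float, w: float = 1.0, k: int = 5):
    """Iterate the gradient step w <- w - eta * 2w on f(w) = w^2."""
    history = []
    for _ in range(k):
        w -= eta * 2 * w
        history.append(round(w, 4))
    return history

print(steps(eta=1.1))  # diverges:  [-1.2, 1.44, -1.728, 2.0736, -2.4883]
print(steps(eta=0.1))  # converges: [0.8, 0.64, 0.512, 0.4096, 0.3277]
```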

7
Q

When can/can’t we use gradient descent?

A

We can use it when the loss is differentiable and the hypothesis space is finite-dimensional (parameterised by finitely many real parameters); if either fails, we can't.

8
Q

Explain mini-batch and stochastic gradient descent and their pros/cons.

A

Mini-batch: the gradient is estimated on a random subset of the training data at each step. Stochastic: mini-batch with batch size 1.

Pros: each update is far cheaper than a full-batch gradient, making large datasets feasible. Cons: the gradient estimates are noisy, so the iterates fluctuate around the minimum and may need a decaying step size to settle.
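A minimal sketch of the mini-batch loop for the OLS risk; batch_size=1 recovers stochastic gradient descent (all names illustrative):

```python
import numpy as np

def minibatch_sgd(X, y, batch_size=32, step_size=0.1, n_epochs=100, seed=0):
    """Mini-batch gradient descent on the MSE risk; batch_size=1 gives SGD."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    w, b = np.zeros(p), 0.0
    for _ in range(n_epochs):
        order = rng.permutation(n)                  # reshuffle every epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]   # random subset of the data
            residual = X[idx] @ w + b - y[idx]
            w -= step_size * 2.0 / len(idx) * (X[idx].T @ residual)
            b -= step_size * 2.0 / len(idx) * residual.sum()
    return w, b
```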

9
Q

Explain why feature scaling is important in gradient descent.

A

If features have very different scales, the corresponding weights and their gradients also have very different scales, but a single step size must serve all of them: a step small enough to be stable in one direction is far too small in another, so gradient descent zig-zags and converges slowly.
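One common remedy is to standardise each feature before running gradient descent; a sketch (illustrative, assumes no constant columns):

```python
import numpy as np

def standardise(X: np.ndarray) -> np.ndarray:
    """Rescale each feature to zero mean and unit variance."""
    return (X - X.mean(axis=0)) / X.std(axis=0)
```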

10
Q

For logistic regression on p random variables, what are the input, outcome, action and hypothesis spaces?

A

Input: ℝ^p
Outcome: {0, 1}
Action: (0, 1)
Hypothesis: {x ↦ σ(w · x + b) : w ∈ ℝ^p, b ∈ ℝ}, where σ is the sigmoid function
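A sketch of one element of this hypothesis space (illustrative names):

```python
import numpy as np

def sigmoid(z: float) -> float:
    """sigma(z) = 1 / (1 + e^(-z)): maps R into the action space (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def logistic_hypothesis(w: np.ndarray, b: float):
    """One element of the logistic-regression hypothesis space."""
    return lambda x: sigmoid(float(w @ x + b))
```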

11
Q

What is the commonly used loss function for logistic regression?

A

Log loss: ℓ(p, y) = -[y log p + (1 - y) log(1 - p)], for predicted probability p ∈ (0, 1) and outcome y ∈ {0, 1}.
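As a sketch (illustrative names):

```python
import numpy as np

def log_loss(p: float, y: int) -> float:
    """Log loss for predicted probability p in (0, 1) and outcome y in {0, 1}."""
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))
```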

12
Q

What is the likelihood function?

A

The probability of seeing our data given our parameters, viewed as a function of the parameters.
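For logistic regression on n i.i.d. examples, with p_i = σ(w · x_i + b), the likelihood is:

```latex
L(w, b) = \prod_{i=1}^{n} p_i^{\,y_i} (1 - p_i)^{1 - y_i}
```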

13
Q

Explain how the maximisation of the likelihood function is equivalent to minimising the log loss function

A

Maximising the likelihood is equivalent to maximising its logarithm, since log is strictly increasing. The log turns the product of Bernoulli probabilities into a sum of log terms, and negating that sum turns maximisation into minimisation of exactly the total log loss.
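The derivation, using the likelihood from the previous card:

```latex
\arg\max_{w,b} L(w, b)
  = \arg\max_{w,b} \log L(w, b)
  = \arg\max_{w,b} \sum_{i=1}^{n} \big[ y_i \log p_i + (1 - y_i) \log(1 - p_i) \big]
  = \arg\min_{w,b} \sum_{i=1}^{n} \underbrace{-\big[ y_i \log p_i + (1 - y_i) \log(1 - p_i) \big]}_{\text{log loss}}
```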
