Linear and Logistic Regression Flashcards

1
Q

For OLS linear regression on p random variables, what are the input, outcome, action and hypothesis spaces?

A

Input: ℝ^p
Outcome: ℝ
Action: ℝ
Hypothesis: {x ↦ w · x + b : w ∈ ℝ^p, b ∈ ℝ}
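As a minimal sketch (all names illustrative, not from the deck), one element of this hypothesis space in Python:

```python
import numpy as np

def ols_hypothesis(w: np.ndarray, b: float):
    """One element of the OLS hypothesis space: x in R^p -> w.x + b in R."""
    def f(x: np.ndarray) -> float:
        return float(w @ x + b)
    return f

f = ols_hypothesis(np.array([1.0, -2.0, 0.5]), b=3.0)  # p = 3
print(f(np.array([2.0, 1.0, 4.0])))  # 2 - 2 + 2 + 3 = 5.0
```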

2
Q

What is the OLS loss function?

A

Squared-error (SE) loss: ℓ(ŷ, y) = (ŷ - y)². The empirical risk is then the mean squared error over the training set.
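A sketch of the pointwise loss and the corresponding empirical risk (illustrative names):

```python
import numpy as np

def squared_error(y_hat: float, y: float) -> float:
    """Pointwise squared-error loss."""
    return (y_hat - y) ** 2

def empirical_risk(y_hat: np.ndarray, y: np.ndarray) -> float:
    """Mean squared error over a dataset."""
    return float(np.mean((y_hat - y) ** 2))
```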

3
Q

Explain the purpose of the gradient descent algorithm

A

To minimise the risk R numerically when no closed-form minimiser is convenient. It is an iterative algorithm: initialise the weights and biases, compute the gradient of R with respect to them, update the weights and biases by a step in the direction of the negative gradient, and repeat until a stopping condition is met.
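A minimal sketch for the OLS risk, using a fixed iteration count as the stopping condition (all names illustrative):

```python
import numpy as np

def gradient_descent(X, y, step_size=0.1, n_steps=1000):
    """Batch gradient descent on the mean-squared-error risk of w.x + b."""
    n, p = X.shape
    w, b = np.zeros(p), 0.0                    # initialise weights and bias
    for _ in range(n_steps):                   # fixed step count = stopping condition
        residual = X @ w + b - y               # predictions minus outcomes
        grad_w = 2.0 / n * (X.T @ residual)    # dR/dw
        grad_b = 2.0 / n * residual.sum()      # dR/db
        w -= step_size * grad_w                # step against the gradient
        b -= step_size * grad_b
    return w, b
```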

4
Q

Define convexity

A

A function is convex if every chord (the line segment joining two points on its graph) lies on or above the graph.
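Formally, f is convex if for all x, y in its (convex) domain and all λ ∈ [0, 1]:

```latex
f(\lambda x + (1 - \lambda) y) \le \lambda f(x) + (1 - \lambda) f(y)
```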

5
Q

For a convex function, what does gradient descent always result in?

A

Convergence to the global minimum (provided the step size is small enough).

6
Q

What are the two parameters we must specify before applying gradient descent and what are the potential consequences of setting these too large or too small?

A

Step size and stopping criterion.

If the step size is too large, the iterates may diverge; if too small, approaching the minimum may take impractically long. If the stopping criterion is too loose, we stop far from the minimum; if too strict, we waste iterations after convergence has effectively occurred.
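A tiny illustration of the step-size effect (assumed example, not from the deck): on f(w) = w² the update is w ← w - η · 2w = (1 - 2η)w, so any η with |1 - 2η| > 1 diverges.

```python
def steps(eta: float, w: float = 1.0, k: int = 5):
    """Iterate the gradient step w <- w - eta * 2w on f(w) = w^2."""
    history = []
    for _ in range(k):
        w -= eta * 2 * w
        history.append(round(w, 4))
    return history

print(steps(eta=1.1))  # diverges:  [-1.2, 1.44, -1.728, 2.0736, -2.4883]
print(steps(eta=0.1))  # converges: [0.8, 0.64, 0.512, 0.4096, 0.3277]
```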

7
Q

When can/can’t we use gradient descent?

A

We can use it when the loss is differentiable and the hypothesis space is finite-dimensional (parameterised by finitely many real parameters); if either fails, we can't.

8
Q

Explain mini-batch and stochastic gradient descent and their pros/cons.

A

Mini-batch: the gradient is estimated on a random subset of the training data at each step. Stochastic: mini-batch with batch size 1.

Pros: each update is far cheaper than a full-batch gradient, making large datasets feasible. Cons: the gradient estimates are noisy, so the iterates fluctuate around the minimum and may need a decaying step size to settle.
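A minimal sketch of the mini-batch loop for the OLS risk; batch_size=1 recovers stochastic gradient descent (all names illustrative):

```python
import numpy as np

def minibatch_sgd(X, y, batch_size=32, step_size=0.1, n_epochs=100, seed=0):
    """Mini-batch gradient descent on the MSE risk; batch_size=1 gives SGD."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    w, b = np.zeros(p), 0.0
    for _ in range(n_epochs):
        order = rng.permutation(n)                  # reshuffle every epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]   # random subset of the data
            residual = X[idx] @ w + b - y[idx]
            w -= step_size * 2.0 / len(idx) * (X[idx].T @ residual)
            b -= step_size * 2.0 / len(idx) * residual.sum()
    return w, b
```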

9
Q

Explain why feature scaling is important in gradient descent.

A

If features have very different scales, the corresponding weights and their gradients also have very different scales, but a single step size must serve all of them: a step small enough to be stable in one direction is far too small in another, so gradient descent zig-zags and converges slowly.
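One common remedy is to standardise each feature before running gradient descent; a sketch (illustrative, assumes no constant columns):

```python
import numpy as np

def standardise(X: np.ndarray) -> np.ndarray:
    """Rescale each feature to zero mean and unit variance."""
    return (X - X.mean(axis=0)) / X.std(axis=0)
```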

10
Q

For logistic regression on p random variables, what are the input, outcome, action and hypothesis spaces?

A

Input: ℝ^p
Outcome: {0, 1}
Action: (0, 1)
Hypothesis: {x ↦ σ(w · x + b) : w ∈ ℝ^p, b ∈ ℝ}, where σ is the sigmoid function
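A sketch of one element of this hypothesis space (illustrative names):

```python
import numpy as np

def sigmoid(z: float) -> float:
    """sigma(z) = 1 / (1 + e^(-z)): maps R into the action space (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def logistic_hypothesis(w: np.ndarray, b: float):
    """One element of the logistic-regression hypothesis space."""
    return lambda x: sigmoid(float(w @ x + b))
```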

11
Q

What is the commonly used loss function for logistic regression?

A

Log loss: ℓ(p, y) = -[y log p + (1 - y) log(1 - p)], for predicted probability p ∈ (0, 1) and outcome y ∈ {0, 1}.
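As a sketch (illustrative names):

```python
import numpy as np

def log_loss(p: float, y: int) -> float:
    """Log loss for predicted probability p in (0, 1) and outcome y in {0, 1}."""
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))
```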

12
Q

What is the likelihood function?

A

The probability of seeing our data given our parameters, viewed as a function of the parameters.
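For logistic regression on n i.i.d. examples, with p_i = σ(w · x_i + b), the likelihood is:

```latex
L(w, b) = \prod_{i=1}^{n} p_i^{\,y_i} (1 - p_i)^{1 - y_i}
```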

13
Q

Explain how the maximisation of the likelihood function is equivalent to minimising the log loss function

A

Maximising the likelihood is equivalent to maximising its logarithm, since log is strictly increasing. The log turns the product of Bernoulli probabilities into a sum of log terms, and negating that sum turns maximisation into minimisation of exactly the total log loss.
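The derivation, using the likelihood from the previous card:

```latex
\arg\max_{w,b} L(w, b)
  = \arg\max_{w,b} \log L(w, b)
  = \arg\max_{w,b} \sum_{i=1}^{n} \big[ y_i \log p_i + (1 - y_i) \log(1 - p_i) \big]
  = \arg\min_{w,b} \sum_{i=1}^{n} \underbrace{-\big[ y_i \log p_i + (1 - y_i) \log(1 - p_i) \big]}_{\text{log loss}}
```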
