Section 2 Logistic Regression Flashcards

1
Q

What is w0 in linear regression

A

Parameter w0 is the intercept, allowing for any fixed offset in the data. It is often called the bias.

2
Q

How are the regression coefficients found in linear regression?

A

By maximising the log-likelihood. Under the normal error assumption, maximisation of the log-likelihood is equivalent to minimisation of the squared loss function: both lead to the same estimates.
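
For a single predictor, the squared-loss minimiser has a closed form; a small sketch with made-up numbers (not from the notes):

```python
# Toy data (made-up): response roughly y = 1 + 2x plus noise
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.1, 2.9, 5.2, 6.8]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Minimising the squared loss gives the familiar closed-form estimates
w1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / \
     sum((x - x_bar) ** 2 for x in xs)
w0 = y_bar - w1 * x_bar  # intercept (bias)
```

Under normal errors these are also the maximum-likelihood estimates.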

3
Q

What is the motivation for logistic regression?

A

Logistic regression models a binary categorical target variable given a collection of predictors. The motivation for logistic regression is to map the real line R to the interval (0, 1) in order to model Pr(Y = 1 given the predictor X). Logistic regression also provides an interpretable model in which we can understand how the traits of individual observations contribute to their classification.

4
Q

In linear regression what is the error assumption?

A

We assume the errors in our linear regression are normally distributed with zero mean and constant variance sigma^2.

5
Q

Explain classification

A

Predicting the value of Y from inputs X can be called classification, since we assign each observation to a category, or class.

6
Q

What are linear regression models used for

A

Linear regression models the relationship between a numerical (real-valued) response and multiple predictors using a linear model. It cannot be used for a binary/categorical response.

7
Q

What is logistic regression?

A

Logistic regression is used to model a binary categorical variable given a collection of predictors. It maps the real numbers to the interval (0, 1). The logistic regression model defines the probability p as a function of the parameters and predictors via the logistic function.
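
A minimal sketch of the logistic (sigmoid) function; the weights and input below are made-up illustrative values:

```python
import math

def sigmoid(z):
    """Logistic function: maps any real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# p = Pr(Y = 1 | x) for one predictor, with illustrative weights w0, w1
w0, w1 = -1.0, 2.0
x = 0.8
p = sigmoid(w0 + w1 * x)  # always strictly between 0 and 1
```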

8
Q

Interpret w0

A

The intercept w0 is the value of the logit corresponding to xi = 0. The intercept w0 is sometimes called the bias.

9
Q

Interpret W weights

A

The coefficient wj measures the effect of variable Xj on the logit: a one-unit increase in Xj changes the log-odds by wj, holding the other variables fixed.

10
Q

R function to fit a logistic regression

A

glm() is used to fit logistic regression models (specifying family = binomial). It fits generalised linear models, so it allows the variables forming the regression to be of different types.

11
Q

What does R return using glm function?

A

Estimates of the weight parameters, standard errors of these estimates, z-values and p-values.

12
Q

Define a hypothesis test to determine if a variable has a significant effect on the target variable

A

To determine whether a variable Xj has a significant effect on the target variable Y, we may wish to perform the hypothesis test:
H0: wj = 0 vs HA: wj ≠ 0
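
A hedged sketch of the Wald z-test behind this output; the estimate and standard error are made-up numbers:

```python
import math

def wald_test(w_hat, se):
    """Wald test of H0: wj = 0 vs HA: wj != 0.
    Returns the z-value and the two-sided p-value from the standard normal."""
    z = w_hat / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, p

z, p = wald_test(0.9, 0.3)  # illustrative estimate and standard error
# small p => reject H0: the variable has a significant effect
```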

13
Q

Name three estimation options for parameter estimates

A

Maximum likelihood estimation, minimisation of the least-squares loss function, or minimisation of the mean squared error loss function.

14
Q

What is the response variable modelled as under logistic regression?

A

The response variable is modelled by a Bernoulli distribution given the values of the input variables.

15
Q

How is wj generally estimated?

A

By maximising the log-likelihood (equivalently, by minimising a loss function). No closed-form solution for wj is available, so optimisation is performed numerically using software. In the GLM literature, maximisation of the logistic regression log-likelihood is performed using the Newton-Raphson algorithm.

16
Q

Why would mean squared error loss function not make sense for logistic regression data

A

Mean squared error measures squared distance to a numerical target; for a binary target that takes only the values 0 and 1, squared distance is not a meaningful measure of fit, so a cross-entropy loss is used instead.

17
Q

Explain information theory

A

Information theory revolves around quantifying the information of an event with respect to its likelihood of happening: rare events are more informative ("surprising") than common ones.

18
Q

Define entropy

A

Entropy quantifies the information over an entire probability distribution P. It allows us to measure the variability of a categorical variable by giving a measure of how spread out the probability values are.

19
Q

Define cross entropy

A

Suppose P denotes a probability distribution of interest, while Q is a probability distribution used to estimate P.
Cross-entropy measures the expected "surprisal" of an observer with probabilities Q after seeing data actually generated according to probabilities P (replacing one set of probabilities by another from a different distribution quantifies the difference between the two distributions).
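
Both quantities can be sketched in a few lines; the distributions P and Q below are made-up examples:

```python
import math

def entropy(p):
    """H(P) = -sum p_i log p_i, using natural logs; zero probabilities contribute nothing."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def cross_entropy(p, q):
    """H(P, Q) = -sum p_i log q_i: expected surprisal under Q for data generated by P."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)

P = [0.5, 0.5]   # true distribution (illustrative)
Q = [0.9, 0.1]   # estimating distribution (illustrative)
# H(P, Q) >= H(P), with equality only when Q matches P
```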

20
Q

What is the gradient of a loss function

A

The gradient ∇l(w; D) of the loss function is the vector of all its partial derivatives with respect to the weights.

21
Q

Explain gradient descent concept

A

Using the information from the gradient, a first-derivative-based algorithm can be devised to efficiently locate a local minimum by moving in the direction of the negative gradient, as it is always "downhill". This iterative optimization is called gradient descent.
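
A minimal gradient-descent sketch for a one-predictor logistic regression; the data, learning rate and iteration count are all assumptions for illustration:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Toy (overlapping, not separated) data
xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
ys = [0, 0, 1, 0, 1, 1]

w0, w1 = 0.0, 0.0   # initial weights
eta = 0.1           # learning rate

for _ in range(2000):
    # Gradient of the cross-entropy loss (negative log-likelihood)
    g0 = sum(sigmoid(w0 + w1 * x) - y for x, y in zip(xs, ys))
    g1 = sum((sigmoid(w0 + w1 * x) - y) * x for x, y in zip(xs, ys))
    # Move in the direction of the negative gradient: always "downhill"
    w0 -= eta * g0
    w1 -= eta * g1
```

Convergence can be declared once the gradient components are numerically zero.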

22
Q

What is eta in gradient descent optimisation algorithm

A

The learning rate η determines the size of the step and is usually set to be small: if the steps are too big, you risk overshooting the optimal point; if too small, optimization will take a long time. The steps are not necessarily the same every time, but they are proportional to η.

23
Q

When does gradient descent algorithm converge

A

The gradient descent algorithm converges when all the elements of the gradient are (numerically) zero.

24
Q

Explain concept of complete separation

A

A complete separation in logistic regression, sometimes also referred to as perfect prediction, happens when a predictor variable separates the outcome variable completely.

Logistic regression tries to fit a sigmoid curve to the data.
The coefficient w measures the slope of the curve. As the value of w increases to ∞, we get an ever better fit to completely separated data (the sigmoid approaches a vertical step between the two sections of the data).
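
A small sketch (with made-up, perfectly separated data) of why the likelihood keeps improving as w grows:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def neg_log_lik(w, xs, ys):
    """Negative log-likelihood of a no-intercept logistic model p = sigmoid(w * x)."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(w * x)
        total -= math.log(p) if y == 1 else math.log(1.0 - p)
    return total

# Completely separated data: every negative x has y = 0, every positive x has y = 1
xs = [-2.0, -1.0, 1.0, 2.0]
ys = [0, 0, 1, 1]

losses = [neg_log_lik(w, xs, ys) for w in (1.0, 5.0, 10.0)]
# The loss keeps shrinking as w grows, so no finite estimate maximises the likelihood
```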

25
Q

How can we recognise complete separation of data

A

The software will output an arbitrarily large parameter estimate with a very large standard error and throw a warning,
OR
in the gradient descent algorithm the sigmoid function evaluates to exactly 1 or 0 and the algorithm gets stuck.

26
Q

How can one address the problem of complete separation of data

A

Regularised logistic regression can be used to address the problem. It adds an extra penalty term to the loss, which stops the coefficients from diverging and keeps gradient descent from getting stuck.
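
A sketch of one common choice, an L2 (ridge) penalty; the penalty form, λ value and data here are assumptions for illustration, not from the notes:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def penalised_loss(w, xs, ys, lam=0.1):
    """Cross-entropy loss plus an L2 penalty lam * w**2 (no-intercept model)."""
    loss = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(w * x)
        loss -= math.log(p) if y == 1 else math.log(1.0 - p)
    return loss + lam * w ** 2

# Even for completely separated data, very large w is now penalised,
# so the penalised loss has a finite minimiser and optimisation can converge
xs = [-2.0, -1.0, 1.0, 2.0]
ys = [0, 0, 1, 1]
```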

27
Q

What are the consequences of complete separation of data

A

Note that complete separation is a problem for inference on the coefficients, while it might not constitute a problem if the main purpose is classification.
Always check for it: if there is complete separation, you cannot make valid inference on the coefficients.

28
Q

What is logistic function saturation

A

This warning simply states that some of the estimated probabilities are numerically equal to 0 or 1.
This happens when the logistic function "saturates", that is, when it numerically attains its boundary values (asymptotes) of 0 or 1. It means the argument of the function is so large in magnitude that the output is numerically indistinguishable from 0 or 1.
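
A quick numerical illustration (the specific threshold reflects double-precision floating point, not anything in the notes):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

big = sigmoid(40.0)     # saturated: numerically indistinguishable from 1
small = sigmoid(-40.0)  # a tiny positive number, numerically near 0
```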

29
Q

What could cause logistic function saturation? and what are the consequences?

A

It could be due to large coefficient magnitudes, large observed xij magnitudes, or a combination of both.
If it is due to large xij values, the software will throw a warning but inference on the estimated coefficients may still be valid.

30
Q

What is multinomial logistic regression

A

Multinomial logistic regression is used for multi-class classification.
Suppose that Y can take K attributes/classes/categories
The model is formulated using the softmax function.
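
A minimal softmax sketch; the scores are made-up illustrative values:

```python
import math

def softmax(zs):
    """Map K real scores to K probabilities that sum to 1."""
    m = max(zs)                          # shift by the max for numerical stability
    exps = [math.exp(z - m) for z in zs]
    s = sum(exps)
    return [e / s for e in exps]

# One score per class for K = 3 classes (illustrative numbers)
probs = softmax([2.0, 1.0, 0.1])
# probs sums to 1 and the largest score receives the largest probability
```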

31
Q

How do the weights work in multinomial logistic regression?

A

You will have a bias w0k and a weight vector wk for each class k.

32
Q

How are parameters optimized for multinomial regression?

A

Like for logistic regression, optimization and estimation of parameters can be implemented using gradient descent.

33
Q

What issues can arise with multinomial regression?

A

In multinomial logistic regression, complete separation and saturation can still happen, but separation can be more difficult to see and is very rare: generally the classes are intertwined and no two components are completely separated.

34
Q

Error thrown for complete separation in R

A

Warning messages:
1: glm.fit: algorithm did not converge
2: glm.fit: fitted probabilities numerically 0 or 1 occurred
