L5: Logistic Regression Flashcards
In the context of machine learning, “induction” refers to the process of learning patterns, rules, or models from data. It involves generalizing from specific examples to make predictions or decisions about new, unseen data.
TRUE/FALSE
TRUE
What is the problem of induction? How does it relate to training and testing ML models? Select all correct options
A) it can lead to overfitting
B) very general, simple rules might be more robust across time and change
C) complex and detailed models may become outdated as conditions and interdependencies change
All options are correct:
A) it can lead to overfitting
B) very general, simple rules might be more robust across time and change
C) complex and detailed models may become outdated as conditions and interdependencies change
Why is linear regression a form of supervised learning?
You are giving the model “the ground truth”
The term “ground truth” is used in the context of training supervised ML models. During the training phase, the model learns from the input features and their corresponding ground truth labels.
TRUE/FALSE
TRUE
In predictive models, we care less about interpreting the model _____. We just want to know how well the model ______
Fill in the blanks
In predictive models, we care less about interpreting the model COEFFICIENTS. We just want to know how well the model PERFORMS IN PREDICTION
Which of the following statements are true about stratified random sampling?
A) It helps ensure similar representative distributions in train/test sets
B) When subpopulations within an overall population vary, it can be advantageous to sample each subpopulation independently - i.e., the sample incl. representation from each subgroup
C) it helps mitigate the “luck” component when splitting data into training and test sets, e.g., so that you don’t draw a particularly unlucky or lucky train set that captures relations that just happened to be in that part of the sample
D) It helps solve dataset imbalance
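All options are true
Note on D: stratification does not rebalance the classes. It keeps the class proportions of an imbalanced dataset the same in the train and test sets, so the minority class is represented in both.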
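A minimal Python sketch of a stratified train/test split, assuming a binary label and a 40% test share. The function name, the toy data and the seed are illustrative assumptions, not part of the course material. Each class is shuffled and split separately, so both sets keep the original class proportions:

```python
import random
from collections import defaultdict

def stratified_split(labels, test_fraction=0.4, seed=42):
    """Return (train_idx, test_idx), sampling each class separately."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)

    train_idx, test_idx = [], []
    for idx in by_class.values():
        rng.shuffle(idx)                          # random draw within the class
        n_test = round(len(idx) * test_fraction)  # same share from every class
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)

labels = [1] * 20 + [0] * 80  # toy data: 20% positives
train_idx, test_idx = stratified_split(labels)
train_share = sum(labels[i] for i in train_idx) / len(train_idx)
test_share = sum(labels[i] for i in test_idx) / len(test_idx)
print(train_share, test_share)
```

Both shares come out equal to the share of positives in the full toy dataset, which is the point of stratifying.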
When using K-fold cross-validation, which of the following statements are not true?
A) it splits the training data into K folds, where K-1 folds are used for training and the remaining fold for validation
B) it is a technique for evaluating predictive model performance
C) the model is trained and evaluated k times, using a different fold as the validation set each time
D) the highest performing fold comprises the model’s generalisation performance
Wrong: D
Performance metrics from each fold are averaged to estimate the model’s generalisation performance
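A minimal Python sketch of the averaging step, assuming a toy “model” that simply predicts the mean of its training data. All function names and the data are illustrative assumptions. In practice the rows would usually be shuffled before being split into folds:

```python
def k_fold_indices(n, k):
    """Split indices 0..n-1 into k contiguous folds of near-equal size."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def mse_of_mean_model(y, train_idx, val_idx):
    """'Train' by taking the training mean, then score it on the validation fold."""
    mean = sum(y[i] for i in train_idx) / len(train_idx)
    return sum((y[i] - mean) ** 2 for i in val_idx) / len(val_idx)

def cross_validate(y, k):
    """Return the per-fold MSEs and their average (the CV estimate)."""
    folds = k_fold_indices(len(y), k)
    scores = []
    for j, val_idx in enumerate(folds):
        # every fold except fold j is used for training
        train_idx = [i for f, fold in enumerate(folds) if f != j for i in fold]
        scores.append(mse_of_mean_model(y, train_idx, val_idx))
    return scores, sum(scores) / len(scores)

y = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]
fold_scores, cv_estimate = cross_validate(y, k=3)
print(fold_scores, cv_estimate)
```

The reported figure is the average over the folds, not the score of the best fold.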
What is the trade-off when choosing number of folds?
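The higher the K, the more thoroughly the model is trained and evaluated (each run trains on a larger share of the data, and there are more runs to average over), which gives a more reliable performance estimate. The cost is a higher computational power requirement and longer run time.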
What is the purpose of setting “seed”?
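A seed fixes the starting point of the random number generator, so each seed corresponds to one particular sequence of random draws. Setting a specific seed in R (with set.seed()) ensures that results, e.g., the train/test split, are the same every time you run the code on the same data.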
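The same idea sketched in Python, as an illustration only (the card itself refers to R, and the seed values here are arbitrary assumptions). Resetting the generator to the same seed reproduces the same draws, while a different seed gives a different sequence:

```python
import random

random.seed(123)
first_draw = [random.random() for _ in range(3)]

random.seed(123)   # reset to the same seed
second_draw = [random.random() for _ in range(3)]

random.seed(456)   # a different seed
third_draw = [random.random() for _ in range(3)]

print(first_draw == second_draw)
print(first_draw == third_draw)
```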
In our project, we use the double splitting procedure. Explain what this is.
Hint: has something to do with which part of the dataset K-fold cross validation is performed on
We run k-fold cross-validation on the training dataset only, so all the training/validation splitting happens inside the training data.
Separately, we keep a spare 40% of the data as the test set, which is used only to assess final model performance.
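A minimal Python sketch of the double split, assuming the 40% test share from the card and, as illustrative assumptions, 5 folds, a fixed seed and a hypothetical function name. The test rows are set aside first, and the folds are built from the training rows alone:

```python
import random

def double_split(n, test_fraction=0.4, k=5, seed=1):
    """Hold out a test set, then build k folds from the training part only."""
    rng = random.Random(seed)
    idx = list(range(n))
    rng.shuffle(idx)

    n_test = round(n * test_fraction)
    test_idx, train_idx = idx[:n_test], idx[n_test:]

    # every k-th training row goes to the same fold
    folds = [train_idx[i::k] for i in range(k)]
    return train_idx, test_idx, folds

train_idx, test_idx, folds = double_split(n=100)
print(len(train_idx), len(test_idx), [len(f) for f in folds])
```

No test row ever appears in a fold, so the test set stays unseen until the final assessment.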
Which of the following statements are true about the LOOCV cross validation method?
A) stands for leave-one-out cross-validation
B) if n datapoints, the model is trained on n-1 datapoints
C) Model is built using only data from training set
D) Model is then used to predict the response value of the one observation left out from training set
E) the procedure is repeated n times (n= total number of observations in dataset)
F) in each of the n repetitions, a different observation is left out of the train set
G) the mean squared error is then calculated as the avg. of the MSEs of all the test runs
H) very useful for large datasets
All options are correct except H:
H) very useful for large datasets
Instead, LOOCV is useful for small datasets. For the hotel demand dataset, with 100k+ observations, this cross-validation technique would require 100k+ repetitions, with one datapoint held out every time. This would take too long.
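A minimal Python sketch of the LOOCV loop, assuming a toy “model” that predicts the mean of the training observations (the function name and data are illustrative assumptions). The loop runs n times, which is why the method becomes slow on large datasets:

```python
def loocv_mse(y):
    """Leave-one-out CV for a mean-only model: n fits, one held-out point each."""
    n = len(y)
    squared_errors = []
    for i in range(n):
        train = y[:i] + y[i + 1:]             # all observations except the i-th
        prediction = sum(train) / len(train)  # 'model' built on n-1 datapoints
        squared_errors.append((y[i] - prediction) ** 2)
    return sum(squared_errors) / n            # average over the n test runs

y = [1.0, 2.0, 3.0, 4.0]
print(loocv_mse(y))
```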
Logistic regression is similar to linear regression. But what contexts render the former more useful?
Logistic regression is similar to linear regression, but used when trying to predict a binary outcome, i.e., for a categorical response variable (1/0 or yes/no)
Standard linear model doesn’t work for probabilities, which are bounded by 0 and 1
TRUE/FALSE
TRUE.
Indeed, it is in these cases that logistic regression is more useful
Explain what “odds” represent
Odds = P/(1-P)
Odds are a ratio that quantifies the relationship between the likelihood of an event happening and the likelihood of it not happening
What is the logit mathematically?
The logarithm of the odds:
Log(odds) = Ln(P/(1-P))
In logistic regression, the logit is given by:
Ln(P/(1-P)) = β0 + β1 x1 + β2 x2 + … + βq xq = Log(odds)
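Solving the logit equation for P gives the probability form of the model. A short derivation, writing z for the linear predictor (z is shorthand introduced here, not notation from the lecture notes):

```latex
\begin{aligned}
\ln\frac{P}{1-P} &= z, \qquad z = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_q x_q \\
\frac{P}{1-P} &= e^{z} \\
P &= (1-P)\,e^{z} \\
P\,(1 + e^{z}) &= e^{z} \\
P &= \frac{e^{z}}{1+e^{z}} = \frac{1}{1+e^{-z}}
\end{aligned}
```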
Sigmoid curve represents the graphical relationship between _____ and _____
A) probability of success and logit
B) probability of success and sigmoid function
C) probability of success and odds
A)
Sigmoid curve represents the graphical relationship between probability of success and logit
What shape is the sigmoid curve?
S-shaped. As the probability of success approaches 0, the logit becomes increasingly negative; as it approaches 1, the logit becomes increasingly positive; at a probability of 0.5, the logit is exactly 0
See p. 66 in the lecture notes for a visualisation
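A small Python sketch of the two directions of the curve, using only the standard library (the function names are illustrative). The sigmoid maps a logit to a probability, and the logit function maps it back:

```python
import math

def sigmoid(z):
    """Map a logit (any real number) to a probability strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def logit(p):
    """Map a probability (0 < p < 1) back to the log odds."""
    return math.log(p / (1.0 - p))

# Probabilities rise along an S-shape as the logit moves from negative to positive
for z in (-4, -2, 0, 2, 4):
    print(z, round(sigmoid(z), 3))
```

Plotting sigmoid(z) against z over a wider range would trace out the S-shaped curve.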
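What does the “e” mean in the logistic regression function?
p = 1/(1 + e^(-(β0 + β1 x1 + β2 x2 + … + βq xq)))
“e” is Euler’s number (approximately 2.718), the base of the natural logarithm. In the logistic function it converts the linear combination of inputs into a probability between 0 and 1, which helps us understand how the probability of an event grows or decays as a function of the model’s input features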
What does it mean to say the effect of an independent variable is linear/non-linear?
Select all correct options:
A) it relates to the change in the dependent variable from a change in an independent variable
B) when linear, a one-unit change in IV leads to a constant change in the DV
C) when linear, the coefficients of the independent variables remain constant
D) when non-linear, the change in the DV from a one-unit change is not constant
E) non-linear effects are usually represented graphically by a straight line
A) it relates to the change in the dependent variable from a change in an independent variable
B) when linear, a one-unit change in IV leads to a constant change in the DV
C) when linear, the coefficients of the independent variables remain constant
D) when non-linear, the change in the DV from a one-unit change is not constant
FALSE: E) non-linear effects are usually represented graphically by a straight line
Instead: non-linear effects can take various forms, incl. quadratic, exponential, logarithmic, etc.
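Logistic regression itself illustrates both cases. In the sketch below (Python, with hypothetical coefficients B0 and B1 chosen only for illustration), a one-unit change in x always shifts the log odds by the same amount, while the resulting change in probability depends on where x starts:

```python
import math

B0, B1 = -3.0, 1.0  # hypothetical coefficients, for illustration only

def log_odds(x):
    return B0 + B1 * x

def prob(x):
    return 1.0 / (1.0 + math.exp(-log_odds(x)))

# Change caused by moving x up by one unit, from several starting points
log_odds_steps = [log_odds(x + 1) - log_odds(x) for x in range(6)]
prob_steps = [prob(x + 1) - prob(x) for x in range(6)]

print(log_odds_steps)                      # constant: linear effect
print([round(s, 3) for s in prob_steps])   # varies: non-linear effect
```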
Which of the following options constitute reasons for logistic regression being superior in binary classification relative to linear reg.?
A) it models the probability of an event occurring and transforms this probability using the logistic function to ensure predictions fall between 0 and 1
B) it is specifically designed for binary outcomes
C) it provides meaningful and interpretable results in terms of odds ratios and probabilities
D) linear regression is just as good as logistic regression for binary classification
A) it models the probability of an event occurring and transforms this probability using the logistic function to ensure predictions fall between 0 and 1
B) it is specifically designed for binary outcomes
C) it provides meaningful and interpretable results in terms of odds ratios and probabilities
D is wrong: a linear regression can predict values below 0 or above 1, which are not valid probabilities
What is a logit and why do we use it?
A) it is a critical component in log. reg.
B) it enables the modelling of the relationship between predictors and the prob. of a binary outcome
C) it transforms probabilities into a linear space, facilitating the estimation of coefficients
D) the coefficients represent the change in the log odds of the outcome for a one-unit change in the given predictor
All options are true
A) it is a critical component in log. reg.
B) it enables the modelling of the relationship between predictors and the prob. of a binary outcome
C) it transforms probabilities into a linear space, facilitating the estimation of coefficients
D) the coefficients represent the change in the log odds of the outcome for a one-unit change in the given predictor
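Option D can be checked numerically. In this Python sketch (the coefficients b0 and b1 are hypothetical), a one-unit increase in the predictor adds b1 to the log odds, which is the same as multiplying the odds by e^b1, whatever the starting value:

```python
import math

b0, b1 = -1.0, 0.7  # hypothetical intercept and slope

def odds(x):
    """Odds implied by the model: exp of the log odds b0 + b1 * x."""
    return math.exp(b0 + b1 * x)

odds_ratio_low = odds(3) / odds(2)     # one-unit step at a low x
odds_ratio_high = odds(11) / odds(10)  # one-unit step at a high x

print(odds_ratio_low, odds_ratio_high, math.exp(b1))
```

All three printed values agree up to floating-point rounding, which is why e^b1 is reported as the odds ratio for that predictor.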
How are odds, probabilities, and log odds related? If an event has a 20% probability of occurring, what are the odds of it occurring?
In log. reg., log odds are modelled linearly with predictor variables. The logit function facilitates the transformation between probabilities, odds, and log odds in a mathematically convenient way.
Transform probability (p) to odds:
odds = p/(1-p) = 0.2/(1-0.2) = 0.25, i.e., odds of 1 to 4
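The full chain for the 20% example, sketched in Python with illustrative function names: probability to odds, odds to log odds, and back again to the probability:

```python
import math

def prob_to_odds(p):
    return p / (1.0 - p)

def odds_to_log_odds(odds):
    return math.log(odds)

def log_odds_to_prob(log_odds):
    return 1.0 / (1.0 + math.exp(-log_odds))

p = 0.2
odds = prob_to_odds(p)               # probability of the event vs. not the event
log_odds = odds_to_log_odds(odds)    # negative, because p is below 0.5
p_back = log_odds_to_prob(log_odds)  # recovers the original probability

print(odds, log_odds, p_back)
```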