Data Science using Python and R - 13 Flashcards
What are Generalized Linear Models (GLMs)?
A family of linear models that includes regression for continuous, count, and binary response variables.
Each type of response variable is related to the linear predictor through its own link function.
What is the linear predictor in a multiple regression model?
The sum β0 + β1x1 + β2x2 + ⋯ + βpxp.
It can be abbreviated as Xβ.
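As a minimal sketch of what the abbreviation Xβ stands for, the snippet below computes the linear predictor for one observation. The coefficient and predictor values are hypothetical and chosen only for illustration.

```python
# Hypothetical coefficients and predictor values, chosen only for illustration.
beta = [1.0, 0.5, -2.0]   # beta_0 (intercept), beta_1, beta_2
x = [3.0, 0.25]           # x_1, x_2 for a single observation


def linear_predictor(beta, x):
    """Return beta_0 + beta_1*x_1 + ... + beta_p*x_p for one observation."""
    # Prepend a 1 so the intercept is treated like any other coefficient,
    # which is what the matrix shorthand X*beta does row by row.
    row = [1.0] + list(x)
    return sum(b * v for b, v in zip(beta, row))


eta = linear_predictor(beta, x)
print(eta)  # 1.0 + 0.5*3.0 - 2.0*0.25 = 2.0
```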
What is the link function in GLMs?
A function that connects the linear predictor to the mean μ of the response variable, denoted as g(μ).
Different link functions correspond to different types of regression models.
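The three link functions covered in this set (identity, logit, and log) can be sketched as plain Python functions. The function names and the example mean of 0.5 are assumptions made for illustration, not part of the source.

```python
import math


def identity_link(mu):
    """g(mu) = mu, used for linear regression."""
    return mu


def logit_link(mu):
    """g(mu) = ln(mu / (1 - mu)), used for logistic regression (0 < mu < 1)."""
    return math.log(mu / (1.0 - mu))


def log_link(mu):
    """g(mu) = ln(mu), used for Poisson regression (mu > 0)."""
    return math.log(mu)


mu = 0.5  # hypothetical mean that is valid for all three links
links = {
    "identity": identity_link(mu),
    "logit": logit_link(mu),
    "log": log_link(mu),
}
print(links)
```

Each function returns the value that the model sets equal to the linear predictor Xβ.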
What is the identity link function used for?
It is used in linear regression when the response variable has a Normal distribution.
In this case, Xβ = g(μ) = μ.
What does logistic regression predict?
It predicts a binary response variable, such as whether a customer has a store credit card.
The response variable takes values 1 (Yes) or 0 (No).
What is the link function for a binary response variable in logistic regression?
g(μ) = ln(μ/(1-μ)).
This ensures the mean value μ will always be between 0 and 1.
What is the formula to isolate μ in logistic regression?
μ = e^(Xβ) / (1 + e^(Xβ)).
This model estimates the probability that y = 1.
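A short sketch of the formula above, evaluated at a few hypothetical values of Xβ, shows that the result always stays strictly between 0 and 1. The function name is an assumption made for illustration.

```python
import math


def predicted_probability(xb):
    """mu = e^(Xb) / (1 + e^(Xb)): the estimated probability that y = 1."""
    return math.exp(xb) / (1.0 + math.exp(xb))


# Hypothetical linear-predictor values: strongly negative, zero, strongly positive.
for xb in (-5.0, 0.0, 5.0):
    print(xb, round(predicted_probability(xb), 4))
```

A linear predictor of 0 corresponds to a probability of exactly 0.5; negative values push the probability toward 0 and positive values push it toward 1.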
How do you interpret the coefficient of a binary predictor variable in logistic regression?
It describes the estimated change in the log-odds of the response variable when the predictor changes from 0 to 1; exponentiating it gives the odds ratio.
For example, a coefficient of 1.254 gives e^1.254 ≈ 3.5, so the odds of having a store credit card are about 3.5 times higher for a customer with a web account.
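The 3.5 figure comes from exponentiating the coefficient. A quick check, using only the coefficient quoted in the card (the variable names are illustrative):

```python
import math

coef_web_account = 1.254                  # coefficient quoted in the card
odds_ratio = math.exp(coef_web_account)   # multiplicative change in the odds
print(round(odds_ratio, 2))  # 3.5
```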
What does the coefficient for Days between Purchases indicate in logistic regression?
For every additional day between purchases, the odds that the customer has a store credit card decrease by about 0.4%.
The effect compounds over time: multiplying the coefficient by 30 before exponentiating shows that for every 30 days without a purchase, the odds decrease by about 11%.
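The card does not quote the coefficient itself, so the sketch below works backward from the stated 0.4% per-day decrease, treating it as a per-day multiplier of 0.996 (an assumption), and compounds it over 30 days. Raising the multiplier to the 30th power is equivalent to multiplying the coefficient by 30 before exponentiating.

```python
import math

per_day_multiplier = 1 - 0.004            # assumed from the 0.4% decrease in the card
implied_coef = math.log(per_day_multiplier)  # implied per-day coefficient (not from the source)

# Two equivalent ways to get the 30-day multiplier.
multiplier_30 = per_day_multiplier ** 30
multiplier_30_via_coef = math.exp(30 * implied_coef)

pct_decrease_30 = (1 - multiplier_30) * 100
print(round(pct_decrease_30))  # 11
```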
What command is used to perform logistic regression in Python?
sm.Logit(y, X).fit().
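y is the response variable and X holds the predictor variables. statsmodels does not add an intercept automatically, so include a constant column in X, for example with sm.add_constant(X).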
What is Poisson regression used for?
It is used to predict a count of events, such as the number of customer service contacts.
The response variable is a count with a minimum value of zero.
What is the link function for a count response variable in Poisson regression?
g(μ) = ln(μ).
This connects the linear predictor to the mean of the count response.
How is the Poisson regression model expressed in parametric form?
ŷ = e^(β0 + β1x1 + β2x2 + … + βpxp).
This can also be written in descriptive form.
How do you interpret the coefficient in Poisson regression?
When used as the exponent of e, it describes the estimated multiplicative change in the response variable when the predictor increases by one.
For example, a coefficient of 0.4305 gives e^0.4305 ≈ 1.538, so the predicted number of customer service calls is 53.8% higher for a churning customer than for a non-churning one.
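The 53.8% figure can be checked directly from the coefficient quoted in the card (variable names are illustrative):

```python
import math

coef_churn = 0.4305                    # coefficient quoted in the card
multiplier = math.exp(coef_churn)      # multiplicative change in the predicted count
pct_change = (multiplier - 1) * 100
print(round(pct_change, 1))  # 53.8
```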
What command is used to perform Poisson regression in Python?
sm.GLM(y, X, family=sm.families.Poisson()).fit().
y is the response variable, and X includes predictor variables.
What are the three cases of regression response variables discussed in this chapter?
- Binary response variable
- Count response variable
- Continuous response variable
What category of regression models includes all three cases of response variables?
Generalized Linear Models (GLM)
What do we call the sum β0 + β1x1 + β2x2 + ⋯ + βpxp?
Linear predictor
How do we write the linear predictor in its abbreviated form?
Xβ (some texts also write it as η)
The link function connects what two things?
The linear predictor and the mean μ of the response variable
How do we write the link function in its abbreviated form?
g(μ)
What is the link function for linear regression?
Identity link function
What kind of regression should we use when trying to predict a binary response variable?
Logistic regression
What is the link function for logistic regression?
Logit link function
Are the predicted values from logistic regression probabilities or binary values?
Probabilities
What is the descriptive form of the logistic regression model?
log(p/(1-p)) = β0 + β1X1 + β2X2 + … + βnXn
What kind of regression should we use when trying to predict a count response variable?
Poisson regression
What is the link function for Poisson regression?
Log link function
What is the descriptive form of the Poisson regression model?
log(μ) = β0 + β1X1 + β2X2 + … + βnXn
What command is used to fit a Poisson regression model in Python?
sm.GLM() command, with family=sm.families.Poisson()
What command is used to view the results of a fitted model in Python?
summary() command
How do you specify that Poisson regression should be applied to the data in R?
family = poisson
What is the output of the command poisreg01.summary()?
Summary of the Poisson regression model
What is the first step to create a logistic regression model in R?
Use the glm() command
What is the formula input for a Poisson regression model in R?
CustServ.Calls ~ Churn
What dataset should be used to create a Poisson regression model for customer service calls?
churn dataset
What should you do after creating a regression model?
Obtain the summary of the model
Fill in the blank: The command used to create a Poisson regression model in R is _______.
glm()
True or False: The glm() command can be used for both logistic and Poisson regression.
True
What variable represents the number of customer service calls in the Poisson regression model?
CustServ.Calls
What is a key output of the summary() command?
Details about the fitted model
What is the purpose of the exercises listed in the text?
To apply statistical modeling techniques using Python or R