Data Science using Python and R - 13 Flashcards

1
Q

What are Generalized Linear Models (GLMs)?

A

A family of linear models that covers regression for continuous, count (discrete numeric), and binary response variables.

A GLM relates the linear predictor to the mean of the response variable through a link function chosen to suit the type of response.

2
Q

What is the linear predictor in a multiple regression model?

A

The sum β0 + β1x1 + β2x2 + ⋯ + βpxp.

It can be abbreviated as Xβ.
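
As a minimal sketch, the linear predictor can be computed directly from a list of coefficients and one row of predictor values. The function name `linear_predictor` and the numbers below are illustrative assumptions, not values from the text.

```python
def linear_predictor(beta, x):
    """Return beta[0] + beta[1]*x[0] + ... + beta[p]*x[p-1]."""
    if len(beta) != len(x) + 1:
        raise ValueError("beta needs an intercept plus one coefficient per predictor")
    # beta[0] is the intercept; the remaining coefficients pair up with x.
    return beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))

# Hypothetical coefficients and predictor values, chosen only for illustration.
beta = [1.0, 2.0, -0.5]
x = [3.0, 4.0]
eta = linear_predictor(beta, x)  # 1.0 + 2.0*3.0 + (-0.5)*4.0
print(eta)
```

Prepending a 1 to each row of predictor values turns the same sum into the matrix product Xβ.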

3
Q

What is the link function in GLMs?

A

A function that connects the linear predictor to the mean μ of the response variable, denoted as g(μ).

Different link functions correspond to different types of regression models.

4
Q

What is the identity link function used for?

A

It is used in linear regression when the response variable has a Normal distribution.

In this case, Xβ = g(μ) = μ.

5
Q

What does logistic regression predict?

A

It predicts a binary response variable, such as whether a customer has a store credit card.

The response variable takes values 1 (Yes) or 0 (No).

6
Q

What is the link function for a binary response variable in logistic regression?

A

g(μ) = ln(μ/(1-μ)).

This ensures the mean value μ will always be between 0 and 1.

7
Q

What is the formula to isolate μ in logistic regression?

A

μ = e^(Xβ) / (1 + e^(Xβ)).

This model estimates the probability that y = 1.
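
The following sketch checks that the formula for μ is the inverse of the logit link, and that it maps any linear-predictor value into the interval (0, 1). The function names and the sample values of Xβ are assumptions made for illustration.

```python
import math

def logit(mu):
    """Link function g(mu) = ln(mu / (1 - mu)), defined for 0 < mu < 1."""
    return math.log(mu / (1.0 - mu))

def inverse_logit(eta):
    """Solve g(mu) = eta for mu: mu = e^eta / (1 + e^eta)."""
    return math.exp(eta) / (1.0 + math.exp(eta))

# Hypothetical linear-predictor values, chosen only for illustration.
for eta in (-3.0, 0.0, 2.5):
    mu = inverse_logit(eta)
    # Applying the link to mu should recover the original eta.
    print(eta, round(mu, 4), round(logit(mu), 4))
```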

8
Q

How do you interpret the coefficient of a binary predictor variable in logistic regression?

A

The coefficient is the estimated change in the log-odds of the response when the predictor goes from 0 to 1. Exponentiating it gives the odds ratio.

For example, a coefficient of 1.254 gives e^1.254 ≈ 3.5, so the odds of having a store credit card are about 3.5 times higher for a customer with a web account than for one without.
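
A short sketch of the arithmetic: exponentiating the coefficient gives a multiplier on the odds, which is not the same as a multiplier on the probability. The 20% baseline probability below is a hypothetical figure, not one from the source.

```python
import math

coef_web_account = 1.254                 # coefficient quoted on the card
odds_ratio = math.exp(coef_web_account)  # multiplicative change in the odds

# Hypothetical baseline: assume 20% of customers without a web account
# hold a store credit card. This figure is not from the source.
p_without = 0.20
odds_without = p_without / (1.0 - p_without)
odds_with = odds_without * odds_ratio
p_with = odds_with / (1.0 + odds_with)

print(round(odds_ratio, 2))
print(round(p_with, 3), round(p_with / p_without, 2))
```

Under this assumed baseline the probability rises by a smaller factor than the odds do, which is why the 3.5 is best read as a ratio of odds.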

9
Q

What does the coefficient for Days between Purchases indicate in logistic regression?

A

For every additional day between purchases, the odds that the customer has a store credit card fall by about 0.4%.

Multiplying the coefficient by 30 before exponentiating shows that for every additional 30 days between purchases, the odds fall by about 11%.
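
The sketch below reproduces the 30-day calculation. The card reports only the rounded 0.4% effect, so the per-day coefficient of -0.004 is an assumed stand-in rather than the fitted value, and `percent_change_in_odds` is a hypothetical helper.

```python
import math

# Assumed per-day coefficient: the card only reports the rounded 0.4% effect,
# so -0.004 is a hypothetical stand-in, not the fitted value.
coef_per_day = -0.004

def percent_change_in_odds(coef, units):
    """Percent change in the odds when the predictor rises by `units`."""
    return (math.exp(coef * units) - 1.0) * 100.0

one_day = percent_change_in_odds(coef_per_day, 1)
thirty_days = percent_change_in_odds(coef_per_day, 30)
print(round(one_day, 1), round(thirty_days, 1))
```

The scaling is applied to the coefficient, not to the percentage: 30 times 0.4% would give 12%, whereas exponentiating 30 times the coefficient gives roughly 11%.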

10
Q

What command is used to perform logistic regression in Python?

A

sm.Logit(y, X).fit().

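y is the response variable and X holds the predictor variables. sm.Logit does not add an intercept automatically, so add a constant column to X first, for example with sm.add_constant(X).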

11
Q

What is Poisson regression used for?

A

It is used to predict a count of events, such as the number of customer service contacts.

The response variable is a count with a minimum value of zero.

12
Q

What is the link function for a count response variable in Poisson regression?

A

g(μ) = ln(μ).

This connects the linear predictor to the mean of the count response.

13
Q

How is the Poisson regression model expressed in parametric form?

A

μ = e^(β0 + β1x1 + β2x2 + … + βpxp), where μ is the mean (predicted) count.

This can also be written in descriptive form.

14
Q

How do you interpret the coefficient in Poisson regression?

A

When used as the exponent of e, it describes the estimated multiplicative change in the response variable when the predictor increases by one.

For example, a coefficient of 0.4305 gives e^0.4305 ≈ 1.538, so the predicted number of calls is 53.8% higher for a churning customer than for a non-churning one.
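
The sketch below shows why the effect is multiplicative: the ratio of predicted counts depends only on the coefficient, not on the intercept. The intercept value `b0` and the function name `predicted_calls` are hypothetical.

```python
import math

b_churn = 0.4305   # coefficient quoted on the card
b0 = 0.37          # hypothetical intercept; it cancels out of the ratio

def predicted_calls(churn):
    """Predicted mean count from the log link: mu = e^(b0 + b_churn * churn)."""
    return math.exp(b0 + b_churn * churn)

ratio = predicted_calls(1) / predicted_calls(0)   # equals e^b_churn
pct_increase = (ratio - 1.0) * 100.0
print(round(ratio, 3), round(pct_increase, 1))
```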

15
Q

What command is used to perform Poisson regression in Python?

A

sm.GLM(y, X, family=sm.families.Poisson()).fit().

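y is the response variable, and X holds the predictor variables. As with sm.Logit, add a constant column to X first (for example with sm.add_constant(X)) if the model should include an intercept.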

16
Q

What are the three cases of regression response variables discussed in this chapter?

A
  1. Binary response variable
  2. Count response variable
  3. Continuous response variable
17
Q

What category of regression models includes all three cases of response variables?

A

Generalized Linear Models (GLM)

18
Q

What do we call the sum β0 + β1x1 + β2x2 + ⋯ + βpxp?

A

Linear predictor

19
Q

How do we write the linear predictor in its abbreviated form?

A

Xβ

20
Q

The link function connects what two things?

A

The linear predictor and the mean μ of the response variable

21
Q

How do we write the link function in its abbreviated form?

A

g(μ)

22
Q

What is the link function for linear regression?

A

Identity link function

23
Q

What kind of regression should we use when trying to predict a binary response variable?

A

Logistic regression

24
Q

What is the link function for logistic regression?

A

Logit link function

25
Q

Are the predicted values from logistic regression probabilities or binary values?

A

Probabilities

26
Q

What is the descriptive form of the logistic regression model?

A

log(p/(1-p)) = β0 + β1X1 + β2X2 + … + βnXn

27
Q

What kind of regression should we use when trying to predict a count response variable?

A

Poisson regression

28
Q

What is the link function for Poisson regression?

A

Log link function

29
Q

What is the descriptive form of the Poisson regression model?

A

log(μ) = β0 + β1X1 + β2X2 + … + βnXn

30
Q

What command is used to fit a Poisson regression model in Python?

A

The sm.GLM() command, with family=sm.families.Poisson()

31
Q

What command is used to view the results of a fitted model in Python?

A

summary() command

32
Q

How do you specify that Poisson regression should be applied to the data in R?

A

family = poisson

33
Q

What is the output of the command poisreg01.summary()?

A

Summary of the Poisson regression model

34
Q

What is the first step to create a logistic regression model in R?

A

Use the glm() command, with family = binomial

35
Q

What is the formula input for a Poisson regression model in R?

A

CustServ.Calls ~ Churn

36
Q

What dataset should be used to create a Poisson regression model for customer service calls?

A

churn dataset

37
Q

What should you do after creating a regression model?

A

Obtain the summary of the model

38
Q

Fill in the blank: The command used to create a Poisson regression model in R is _______.

A

glm(), with family = poisson

39
Q

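True or False: The glm() command can be used for both logistic and Poisson regression.

A

True. The family argument selects the model: family = binomial for logistic regression and family = poisson for Poisson regression.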

40
Q

What variable represents the number of customer service calls in the Poisson regression model?

A

CustServ.Calls

41
Q

What is a key output of the summary() command?

A

Details about the fitted model

42
Q

What is the purpose of the exercises listed in the text?

A

To apply statistical modeling techniques using Python or R