Logistic Regression (6) Flashcards
Categorical Independent Variables
• what happens when one of the independent variables of a regression analysis is
categorical?
• eg, you’re working on a land use project evaluating various soil properties, one of
which is “parent rock type”
• or, a regression model for estimating house value that includes whether the house
has an attached garage, a detached garage, or no garage?
• in these cases we introduce a set of dummy variables
• when needed, there will be k – 1 dummy variables, where k is the total number of
categories available
• always remove 1 category – otherwise the dataset will have perfect
multicollinearity and therefore violate the assumptions of the regression analysis
• dummy variables are always binary, 0 or 1, and represent presence or absence of
that phenomenon – if observation 1 is present in category A, then it cannot occur in
any other category
• interpretation of the model follows the same logic as interpreting a simple
regression model:
ŷ(price) = 131,034 + 38,408(attached garage) + 23,153(detached garage)
• a house with no garage would be $131,034
• a house with an attached garage would be $38,408 more than one without a
garage, or $169,442
• a house with a detached garage would be $23,153 more than one without a garage,
or $154,187
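The dummy-variable arithmetic above can be sketched as a small function; the coefficients are the ones quoted in the example, with "no garage" as the omitted reference category.

```python
# k = 3 garage categories (none, attached, detached) -> k - 1 = 2 dummies;
# "no garage" is the omitted reference category absorbed by the intercept.

def predict_price(attached, detached):
    """Predicted house value from the dummy-variable model above."""
    intercept = 131_034   # reference category: no garage
    b_attached = 38_408   # premium for an attached garage
    b_detached = 23_153   # premium for a detached garage
    return intercept + b_attached * attached + b_detached * detached

print(predict_price(0, 0))  # no garage: 131034
print(predict_price(1, 0))  # attached garage: 169442
print(predict_price(0, 1))  # detached garage: 154187
```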
Categorical dependent variable
• what happens when the dependent variable of a regression analysis is categorical?
• eg, you’re working on a species distribution project and want to design a predictive
model of whether a species is present or not based on a suite of environmental
conditions
• or, a regression model to determine whether commuters will take the train based
on how long their travel time by car is
Logistic curves are better at integrating categorical dependent variables
Logistic regression model
• a linear model does not work for a categorical dependent variable
• this requires a new form of regression analysis known as logistic, or logit, regression
• logistic regression does 4 things:
• take the probability of an event/phenomenon occurring for different values of each
independent variable
• take the ratio of those probabilities, which is a continuous, non-negative value
• take the logarithm of that ratio, known as the logit, which is then fitted to the
independent variable(s) using linear regression
• the predicted logit is then converted back into predicted probabilities using an
exponential function
• the result is an estimate of the probability of the dependent variable occurring (ie, the
likelihood of the presence of some phenomenon)
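The probability → odds → logit → probability round trip above can be sketched with an example probability of 0.75:

```python
import math

# The four steps: probability -> odds -> logit -> back to probability.
p = 0.75                          # example probability of the event
odds = p / (1 - p)                # odds ratio: continuous, non-negative (here 3.0)
logit = math.log(odds)            # the logit, which linear regression fits
p_back = math.exp(logit) / (1 + math.exp(logit))  # inverse (logistic) function

print(odds, logit, p_back)
```

The last line recovers the original probability, which is how a fitted logit is converted back into a predicted probability.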
• there is no exact equivalent for r² in logistic regression – instead there are several typical values that software will provide for you
• the Cox & Snell r² is interpreted similarly to linear regression r², but its maximum value is 0.75 and can change to < 0.5 as n changes
• Nagelkerke r² is an attempt to correct the Cox & Snell r² so that it has a maximum value of 1, like the ordinary r² – although this is a little more interpretable it is still dependent on n and may give ambiguous results
• a third option – not given by SPSS – is known as the likelihood ratio and is most analogous to the ordinary r², representing the difference between the predicted and observed deviance
• these are all known as pseudo-r²s
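A sketch of how these pseudo-r² values are computed from a model's log-likelihoods; the null (intercept-only) and full-model log-likelihoods here would come from fitted models, and the example numbers below are hypothetical.

```python
import math

def pseudo_r2(ll_null, ll_model, n):
    """Common pseudo-r^2 values from null and full-model log-likelihoods."""
    cox_snell = 1 - math.exp(2 * (ll_null - ll_model) / n)
    max_cs = 1 - math.exp(2 * ll_null / n)     # Cox & Snell ceiling, always < 1
    nagelkerke = cox_snell / max_cs            # rescaled so the maximum is 1
    likelihood_ratio = 1 - ll_model / ll_null  # likelihood-ratio (McFadden) r^2
    return cox_snell, cagelkerke if False else nagelkerke, likelihood_ratio

# Hypothetical log-likelihoods for illustration:
cs, nk, lr = pseudo_r2(ll_null=-100.0, ll_model=-80.0, n=200)
print(cs, nk, lr)
```

Note that Nagelkerke is always at least as large as Cox & Snell, since it divides by a ceiling smaller than 1.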
• what if the dependent variable is categorical but not binary? – eg, predict the dominant vegetation type (grassland, tundra, deciduous forest, etc.) based on a number of independent climate variables
• this is known as multinomial logistic regression
• in this approach, you need to define a reference category, and the regression model
will provide you with probability of occurrence vs that reference category
• eg, a unit increase in temperature is associated with an increase in grassland vs
tundra by 0.07
• multinomial logistic regression is based on maximum likelihood estimation, which requires large datasets
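The reference-category idea can be sketched as follows: each non-reference category gets its own linear predictor (its logit versus the reference), while the reference category's predictor is fixed at 0. The vegetation categories echo the example above, but all coefficients are made up for illustration.

```python
import math

def multinomial_probs(temp):
    """Category probabilities vs a reference category (hypothetical coefficients)."""
    eta = {
        "tundra": 0.0,                        # reference category
        "grassland": -1.0 + 0.07 * temp,      # logit of grassland vs tundra
        "deciduous forest": -2.0 + 0.05 * temp,
    }
    denom = sum(math.exp(v) for v in eta.values())
    return {k: math.exp(v) / denom for k, v in eta.items()}

print(multinomial_probs(20.0))
```

The probabilities always sum to 1, and the grassland coefficient of 0.07 plays the role of the "grassland vs tundra" effect described above.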
• what if the dependent variable is composed of more than 2 ordered categories? – eg,
predict the level of education attained by a person based on a number of socioeconomic
indicators
• this is known as ordinal logistic regression
• in this method, each ranked category has a certain probability of occurrence and the spacing between categories is treated as equal – the effects of the independent variables accumulate until the observation crosses the threshold between adjacent categories
• like the other forms of logistic regression, this is based on maximum likelihood estimation, which requires large sample sizes
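The threshold idea behind ordinal logistic regression can be sketched with a cumulative-logit (proportional odds) model: each ranked category has a threshold, and a single common slope moves an observation across those thresholds. The thresholds and slope below are hypothetical.

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def ordinal_probs(x, thresholds=(-1.0, 0.5, 2.0), slope=0.8):
    """P(Y = j) for 4 ordered categories via cumulative logits.

    P(Y <= j) = sigmoid(threshold_j - slope * x); individual category
    probabilities are differences of adjacent cumulative probabilities.
    """
    cum = [sigmoid(t - slope * x) for t in thresholds] + [1.0]
    return [cum[0]] + [cum[j] - cum[j - 1] for j in range(1, len(cum))]

print(ordinal_probs(1.0))
```

Because the thresholds are ordered and the slope is shared, the cumulative probabilities are monotone and the category probabilities are non-negative and sum to 1.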
• the coefficients of simple and multiple regression models are estimated using
“ordinary least squares”; in logistic regression, this process is known as
“iteratively reweighted least squares”
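A minimal sketch of iteratively reweighted least squares for a one-predictor logistic model, using a tiny made-up dataset. Each iteration solves a weighted least-squares (Newton) update with weights p(1 − p), which is the mechanism the note above names.

```python
import math

# Tiny hypothetical dataset: binary outcome vs one predictor.
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
ys = [0,   0,   0,   1,   0,   1,   1,   1]

b0, b1 = 0.0, 0.0          # start from a flat model
for _ in range(25):
    ps = [1 / (1 + math.exp(-(b0 + b1 * x))) for x in xs]
    ws = [p * (1 - p) for p in ps]               # IRLS weights
    # Gradient of the log-likelihood:
    g0 = sum(y - p for y, p in zip(ys, ps))
    g1 = sum(x * (y - p) for x, y, p in zip(xs, ys, ps))
    # Weighted 2x2 normal equations (the "reweighted least squares" step):
    h00 = sum(ws)
    h01 = sum(w * x for w, x in zip(ws, xs))
    h11 = sum(w * x * x for w, x in zip(ws, xs))
    det = h00 * h11 - h01 * h01
    b0 += (h11 * g0 - h01 * g1) / det
    b1 += (h00 * g1 - h01 * g0) / det

print(b0, b1)   # fitted intercept and slope
```

With this data the slope comes out positive, so the predicted probability of the event rises with x, as expected from the 0/1 pattern.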