Last part Flashcards
(47 cards)
logistic regression
used mainly when DV is NOT A # – used to predict categories
basic types of logistic regression:
logistic or probabilistic: binary outcomes
multinomial: discrete outcomes, not in relation to each other (categorical)
ordered: discrete outcomes on a scale in relation to each other
- poor, fair, excellent
not logistic: count-based outcomes, even though they are whole numbers
discrete variable
A variable that can take on specific, separate values — usually whole numbers — and nothing in between.
ex: # of children, # of protests
used in: OLS, logistic, ordered logistic
continuous variable
can take on any #
ex: voter turnout %
logistic or probabilistic regression
2 options (yes/no)
OLS regression does not work with this bc the distribution of the DV is entirely 0s and 1s, so a straight line can predict impossible values outside that range
logistic v. linear regression
logistic = binary DV
- OUTPUT: probability or log-odds (a probability between 0 and 1)
- use logistic when the outcome is yes/no OR categories
- change in coefficients: change in log-odds / odds ratio
linear = continuous DV
- OUTPUT: raw value of the DV
- use linear when the outcome is a #
- change in coefficients: change in the DV
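A rough sketch of this contrast in Python (assuming statsmodels; the variables turnout_pct and voted and all numbers are made up for illustration, not from the cards):

```python
# Hypothetical data: one predictor x, a continuous DV (turnout_pct),
# and a binary DV (voted); names and numbers are invented for this sketch.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
x = rng.normal(size=200)
X = sm.add_constant(x)  # adds the intercept column

# Linear (OLS): DV is a raw number; coefficients = change in the DV
turnout_pct = 50 + 5 * x + rng.normal(scale=3, size=200)
ols = sm.OLS(turnout_pct, X).fit()

# Logistic: DV is 0/1; coefficients = change in log-odds
p = 1 / (1 + np.exp(-(0.5 + 1.2 * x)))  # negative exponent keeps p between 0 and 1
voted = rng.binomial(1, p)
logit = sm.Logit(voted, X).fit()

print(ols.params)            # read directly as changes in turnout_pct
print(np.exp(logit.params))  # exponentiate to read as odds ratios (compare to 1)
```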
binary outcomes
any DV that takes on two outcomes
Distribution takes only the values 0 and 1, so the equation for OLS regression doesn't work
The regression table (standard errors, test statistics) is read the same way as in a linear regression, but the coefficients are in log-odds
ORDINARY LEAST SQUARES
Most common linear regression, used when DV is continuous
uses normal basic equation of a regression
OLS finds the best-fitting line through the data by minimizing the squared distance between the actual values and the predicted values.
That’s why it’s called least squares — it minimizes the sum of squared errors!
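In symbols, with \hat{y}_i = \beta_0 + \beta_1 x_i the predicted value:
\min_{\beta_0, \beta_1} \sum_i (y_i - \hat{y}_i)^2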
use of logarithms
logarithm = inverse of an exponent
help us assess curves like we see in binary data
defined via exponents: log base b of y = x means b raised to the power x equals y
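In symbols (standard definition):
\log_b(y) = x \iff b^x = y
ex: \log_{10}(1000) = 3 because 10^3 = 1000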
natural logarithm
NATURAL LOG is log base "e" (an irrational constant), defined so that
e^{\ln x}=x
It’s log base e, where e ≈ 2.71828.
It’s called “natural” because it arises naturally in calculus, growth processes, and probability models.
**Natural logarithms are used to linearize exponential relationships, model log-odds, handle skewed data, and measure growth rates; they're essential in regression and probability modeling.**
why is the exponent negative? why is it sometimes written as a positive
negative exponent constrains the predicted value between 0 and 1; the positive-exponent version is the same function rewritten algebraically (see the formulas below)
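The logistic function makes this concrete (standard form, added here for reference):
p = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x)}}
as \beta_0 + \beta_1 x \to +\infty, e^{-(\cdot)} \to 0, so p \to 1; as \beta_0 + \beta_1 x \to -\infty, e^{-(\cdot)} \to \infty, so p \to 0
the "positive" version is the same function rewritten: p = \frac{e^{\beta_0 + \beta_1 x}}{1 + e^{\beta_0 + \beta_1 x}} (multiply top and bottom by e^{\beta_0 + \beta_1 x})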
why isn't the y-intercept constrained to 0 and 1
bc coefficients in logistic models are on the log-odds scale, not the probability scale
IN A REGRESSION TABLE: INTERCEPT
Log-odds of segregation when all Xs = 0 (not directly interpretable)
IN A REGRESSION TABLE: DV
First thing listed after "formula =" is the (dv)
negative exponents
use these to constrain output between 0 and 1 (valid probabilities)
y-intercept
NOTE: Linear y-intercept = value of Y when all Xs are 0. In logistic, remember we are predicting log-odds, NOT raw numbers.
y-intercept = the NEGATIVE of the location parameter divided by the rate parameter: -(location / rate)
(location parameter = the point at which the probability = 0.5, i.e. the midpoint)
(rate parameter = how quickly the probability changes, i.e. how steep the curve is)
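Where that rule comes from (assuming the curve is written with location \mu and scale/rate s):
p = \frac{1}{1 + e^{-(x - \mu)/s}} = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x)}}
matching the exponents gives \beta_1 = \frac{1}{s} and \beta_0 = -\frac{\mu}{s}, i.e. the negative of the location divided by the rate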
why isn't the y-intercept between 0 and 1
bc the regression output is in log-odds, not raw probabilities
odds ratios
ratio of an outcome relative to its alternative
AKA likelihood of y = 1 divided by likelihood of y = 0
when reading coefficients for a logistic regression…
coefficients are listed as odds ratios
read in relation to 1 NOT 0 – so lower than 1 is negative, higher than 1 is positive
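As formulas (standard relationships; the worked numbers are just an illustration):
odds = \frac{p}{1 - p}, \qquad \text{odds ratio} = e^{\beta}
ex: \beta = 0.69 \Rightarrow e^{0.69} \approx 2 (odds roughly double); \beta = 0 \Rightarrow e^{0} = 1 (no effect)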
IYENGAR and WESTWOOD (2014)
differentiates between policy, identity and affective polarization
identity polarization: alignment with others based on party affiliation, not policy
affective polarization: hostility toward members of other political parties
WHEN TO USE LOGISTIC REGRESSION
ALL ABOUT DISTRIBUTION!
- binary distributions don't meet the OLS assumptions
!!!!Standard errors and Test Statistics work the same way – distance to line!!!!
what to do about categorical and ordinal variables
need to adjust the logistic regression….called multinomial and ordered logistic regression
categorical DV
Takes on several discrete, non-ordered categories
outcomes not assessed in relation to each other
in formula, K is the baseline outcome
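One common way the multinomial formula is written, with category K as the baseline (\beta_K = 0):
P(y = j) = \frac{e^{x\beta_j}}{1 + \sum_{m \neq K} e^{x\beta_m}} \text{ for } j \neq K, \qquad P(y = K) = \frac{1}{1 + \sum_{m \neq K} e^{x\beta_m}}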
ordinal DV
each category is ordered, so PROBABILITIES are SUCCESSIVE
each outcome has to be compared in relation to the other outcomes
ex: a 5-point Likert scale (very poor, poor, fair, good, very good)
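In the ordered (proportional-odds) logistic model the probabilities are successive, built from cumulative probabilities with ordered cutpoints \tau_1 < \tau_2 < \dots:
P(y \le j) = \frac{1}{1 + e^{-(\tau_j - x\beta)}}, \qquad P(y = j) = P(y \le j) - P(y \le j - 1)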