Categorical Data Flashcards
What form does the response data take for categorical data?
- Binary (0’s or 1’s), denoting the presence or absence of some feature/event
- Proportions (which are bounded by 0 and 1)
What is the range for a probability?
Must lie between 0 and 1
If an event will never occur, what probability is it?
Probability of 0
If the probability of an event F occurring is Pr(F), what is the probability of its complement, Pr(F̄)?
Pr(F̄) = 1 - Pr(F)
What does the probability mass function tell us?
- Gives the probability that a discrete random variable is exactly equal to some value
What does the expected value give us an idea about?
The centre or location of a probability distribution
What does the variance give us an idea about?
The spread of a probability distribution
What happens if the variance is large?
The values of X will vary from the expectation a lot
What does the Binomial distribution characterize?
Binary outcomes for a repeated event
What are the two parameters that a Binomial distribution has?
- fixed number of trials (n)
- probability of a success (p)
X is said to have a binomial distribution if….
- there are only two possible outcomes
- there are a fixed number of trials
- p is constant for all trials
- binomial variable is the total number of successes in n trials
- each trial is independent of the other trials
What does the PMF of a binomial distribution give us?
The probability of seeing exactly x successes out of the n trials
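A minimal sketch of the binomial PMF, assuming the standard formula Pr(X = x) = C(n, x) p^x (1 - p)^(n - x), which these cards do not state explicitly. The function name binomial_pmf is illustrative only.

```python
from math import comb

def binomial_pmf(x: int, n: int, p: float) -> float:
    """Probability of exactly x successes in n independent trials."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

# Probability of exactly 3 heads in 10 tosses of a fair coin
prob = binomial_pmf(3, 10, 0.5)

# The PMF summed over every possible number of successes should be 1
total = sum(binomial_pmf(x, 10, 0.5) for x in range(11))
```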
Describe the shape of the binomial distribution if p is low
- small number of successes
- distribution is skewed to the right
Describe the shape of the binomial distribution if p is 0.5
- about half of the trials are expected to be successful
- distribution is symmetrical
Describe the shape of the binomial distribution if p is high
- large number of successes
- distribution is skewed to the left
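The three shapes above can be checked numerically. This is a rough sketch that computes the skewness coefficient (third central moment divided by variance to the power 1.5) directly from the PMF; the choice of n = 20 and the values of p are arbitrary illustrations.

```python
from math import comb

def binomial_skewness(n: int, p: float) -> float:
    """Skewness of a Binomial(n, p) distribution, computed from its PMF."""
    pmf = [comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)]
    mean = sum(x * w for x, w in enumerate(pmf))
    var = sum((x - mean)**2 * w for x, w in enumerate(pmf))
    third = sum((x - mean)**3 * w for x, w in enumerate(pmf))
    return third / var**1.5

skew_low = binomial_skewness(20, 0.1)   # positive: long tail to the right
skew_mid = binomial_skewness(20, 0.5)   # approximately zero: symmetrical
skew_high = binomial_skewness(20, 0.9)  # negative: long tail to the left
```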
What is the expected value of the binomial distribution?
np
What is the variance of the binomial distribution?
np(1-p)
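As a quick check of np and np(1-p), the sketch below computes the mean and variance directly from the PMF for one arbitrary choice of parameters (n = 20, p = 0.3, chosen only for illustration).

```python
from math import comb

n, p = 20, 0.3  # illustrative values, not from the cards

# PMF over every possible number of successes 0..n
pmf = [comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)]

# Expected value and variance from their definitions
mean = sum(x * w for x, w in enumerate(pmf))
variance = sum((x - mean)**2 * w for x, w in enumerate(pmf))

# These should agree with the closed forms n*p and n*p*(1-p)
closed_form_mean = n * p
closed_form_variance = n * p * (1 - p)
```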
What kind of distribution do the sample proportions form about the true population proportion?
Approximately normally distributed (for large enough samples)
What is the standard deviation of the sample proportion?
sqrt(p(1-p)/n)
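A small sketch of this formula in code. The function name proportion_se and the example values (p = 0.4, n = 100) are illustrative assumptions.

```python
from math import sqrt

def proportion_se(p: float, n: int) -> float:
    """Standard deviation of a sample proportion: sqrt(p(1-p)/n)."""
    return sqrt(p * (1 - p) / n)

se = proportion_se(0.4, 100)         # sqrt(0.24 / 100)
se_larger_n = proportion_se(0.4, 400)  # quadrupling n halves the spread
```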
Describe sampling situation A
The proportions originate from independent samples
Describe sampling situation B
The same sample gives rise to two (or more) proportions where the same individual can only choose one of the options
Describe sampling situation C
The same sample but an individual can choose more than one category
Describe the odds ratio
An odds ratio is a relative measure of effect, which allows, for example, the comparison of the intervention group of a study relative to a control or placebo group
What is the numerator in the odds ratio?
Odds in the intervention arm
What is the denominator in the odds ratio?
Odds in the control arm
What can you say if the OR is greater than 1?
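The odds of the outcome are higher in the intervention arm than in the control arm (so the intervention is better than the control when the outcome is a desirable one)
What can you say if the OR is less than 1?
The odds of the outcome are lower in the intervention arm than in the control arm (so the control is better than the intervention when the outcome is a desirable one)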
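A minimal sketch of the odds ratio as odds in the intervention arm divided by odds in the control arm. The counts are made up purely for illustration, and the function and argument names are hypothetical.

```python
def odds_ratio(events_int: int, non_events_int: int,
               events_ctl: int, non_events_ctl: int) -> float:
    """Odds in the intervention arm divided by odds in the control arm."""
    odds_intervention = events_int / non_events_int
    odds_control = events_ctl / non_events_ctl
    return odds_intervention / odds_control

# Hypothetical trial: 30 of 100 respond on the intervention, 20 of 100 on control
or_value = odds_ratio(30, 70, 20, 80)

# An OR above 1 means higher odds of the outcome in the intervention arm
higher_odds_in_intervention = or_value > 1
```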
What are the odds of success defined to be?
p(success)/p(failure), so p/(1-p)
What is the probability of success from the odds?
odds/(odds + 1)
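The two conversions are inverses of each other, which the following sketch illustrates. The function names are illustrative.

```python
def odds_from_probability(p: float) -> float:
    """Odds of success: p / (1 - p)."""
    return p / (1 - p)

def probability_from_odds(odds: float) -> float:
    """Probability of success: odds / (odds + 1)."""
    return odds / (odds + 1)

odds = odds_from_probability(0.75)       # 0.75 / 0.25
p_back = probability_from_odds(odds)     # recovers the original probability
```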
Why do we use the log of the odds ratio as the centre of the associated confidence interval?
- The sampling distribution of the odds ratio is highly skewed
- The sampling distribution of the log odds ratio tends to be more symmetrical
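A rough sketch of a confidence interval built on the log scale and then back-transformed. It assumes the common large-sample standard error sqrt(1/a + 1/b + 1/c + 1/d) for the log odds ratio and z = 1.96 for a 95% interval; neither is stated in these cards, and the counts are hypothetical.

```python
from math import exp, log, sqrt

def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Odds ratio with a confidence interval centred on log(OR).

    a, b: events and non-events in the intervention arm
    c, d: events and non-events in the control arm
    Assumes the large-sample standard error for log(OR).
    """
    or_hat = (a / b) / (c / d)
    se_log = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = exp(log(or_hat) - z * se_log)
    upper = exp(log(or_hat) + z * se_log)
    return or_hat, lower, upper

# Hypothetical counts, for illustration only
or_hat, lower, upper = odds_ratio_ci(30, 70, 20, 80)
```

The interval is symmetric about log(OR) but not about the OR itself: after back-transforming, the upper limit sits further from the estimate than the lower limit does.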
What is the chi square distribution indexed by?
The degrees of freedom
What is the null hypothesis for a chi square goodness of fit test?
H0 : expected count = total x specified cell probability
What are the degrees of freedom for chi square goodness of fit test?
number of categories - 1
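A minimal goodness of fit sketch, assuming the usual Pearson statistic, the sum of (observed - expected)^2 / expected, which the cards do not spell out. The die-roll counts are invented for illustration.

```python
# Hypothetical counts from 60 rolls of a die, tested against a fair die
observed = [8, 12, 9, 11, 10, 10]
cell_probs = [1 / 6] * 6  # probabilities specified under H0

total = sum(observed)

# Under H0: expected count = total x specified cell probability
expected = [total * p for p in cell_probs]

# Pearson chi-square statistic (assumed standard form)
statistic = sum((o - e)**2 / e for o, e in zip(observed, expected))

# Degrees of freedom: number of categories - 1
df = len(observed) - 1
```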
What are the two types of chi square tests?
- Goodness of fit
- Test for independence
What is the expected count for the chi square test for independence?
(row total x column total)/(grand total)
What are the degrees of freedom for the chi square test for independence?
(number of columns - 1) x (number of rows - 1)
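The expected counts and degrees of freedom for the test for independence can be sketched as follows. The 2 x 2 table is hypothetical, the helper name expected_counts is illustrative, and the statistic is assumed to be the usual Pearson sum of (observed - expected)^2 / expected.

```python
def expected_counts(table):
    """Expected count per cell: (row total x column total) / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

# Hypothetical 2 x 2 table: rows are groups, columns are outcome yes / no
observed = [[30, 70],
            [20, 80]]
expected = expected_counts(observed)

# Degrees of freedom: (number of columns - 1) x (number of rows - 1)
df = (len(observed[0]) - 1) * (len(observed) - 1)

# Pearson chi-square statistic summed over every cell (assumed standard form)
statistic = sum(
    (o - e)**2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
```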
What are the rules of thumb for chi squared test?
- no more than 20% of the expected counts are less than 5
- all expected counts are one or greater
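The two rules of thumb can be checked mechanically on a flat list of expected counts. This is only a sketch; the function name meets_rules_of_thumb is hypothetical.

```python
def meets_rules_of_thumb(expected):
    """Check both rules of thumb on a flat list of expected counts."""
    small = sum(1 for e in expected if e < 5)
    # No more than 20% of expected counts below 5 (integer form avoids rounding)
    no_more_than_20_percent_small = 5 * small <= len(expected)
    # Every expected count is one or greater
    all_at_least_one = all(e >= 1 for e in expected)
    return no_more_than_20_percent_small and all_at_least_one

ok_table = meets_rules_of_thumb([25, 75, 25, 75])        # no small counts
borderline = meets_rules_of_thumb([4, 10, 10, 10, 10])   # exactly 20% below 5
too_sparse = meets_rules_of_thumb([4, 4, 10, 10, 10])    # 40% below 5
below_one = meets_rules_of_thumb([0.5, 10, 10, 10, 10, 10])  # a count under 1
```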