Categorical Data Flashcards
What form does the response take for categorical data?
- Binary (0s or 1s), denoting the presence or absence of some feature/event
- Proportions (which are bounded by 0 and 1)
What is the range for a probability?
Must lie between 0 and 1
If an event will never occur, what is its probability?
Probability of 0
If the probability of an event F occurring is Pr(F), what is the probability of its complement, Pr(F̄)?
Pr(F̄) = 1 - Pr(F)
What does the probability mass function tell us?
- Gives the probability that a discrete random variable is exactly equal to some value
What does the expected value give us an idea about?
The centre or location of a probability distribution
What does the variance give us an idea about?
The spread of a probability distribution
What happens if the variance is large?
The values of X tend to lie far from the expected value
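As a small illustration of the expected value and variance cards above, the sketch below computes both from a PMF. The PMF values and the function names `expected_value` and `variance` are made up for this example.

```python
# Hypothetical PMF of a discrete random variable X: value -> probability.
pmf = {0: 0.2, 1: 0.5, 2: 0.3}

def expected_value(pmf):
    """Probability-weighted average of the values (the centre)."""
    return sum(x * p for x, p in pmf.items())

def variance(pmf):
    """Probability-weighted average squared distance from the mean (the spread)."""
    mu = expected_value(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

print(expected_value(pmf))
print(variance(pmf))
```

Moving probability away from the middle value and towards 0 and 2 would leave the centre in a similar place but increase the variance.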
What does the Binomial distribution characterize?
Binary outcomes for a repeated event
What are the two parameters that a Binomial distribution has?
- fixed number of trials (n)
- probability of a success (p)
X is said to have a binomial distribution if:
- there are only two possible outcomes
- there are a fixed number of trials
- p is constant for all trials
- X is the total number of successes in the n trials
- each trial is independent of the other trials
What does the PMF of a binomial distribution give us?
The probability of observing exactly x successes in the n trials
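A minimal sketch of the binomial PMF, Pr(X = x) = C(n, x) p^x (1 - p)^(n - x), using only the Python standard library. The function name `binomial_pmf` and the example numbers are chosen here for illustration.

```python
from math import comb

def binomial_pmf(x, n, p):
    """Pr(X = x) for X ~ Binomial(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

# Probability of exactly 3 successes in 10 trials with p = 0.5.
print(binomial_pmf(3, 10, 0.5))

# The probabilities over all possible values 0..n should add up to 1.
print(sum(binomial_pmf(x, 10, 0.5) for x in range(11)))
```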
Describe the shape of the binomial distribution if p is low
- small number of successes
- distribution is skewed to the right
Describe the shape of the binomial distribution if p is 0.5
- on average, half of the trials are successful
- distribution is symmetrical
Describe the shape of the binomial distribution if p is high
- large number of successes
- distribution is skewed to the left
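The three shape cards above can be checked numerically. The sketch below computes the skewness (third standardised moment) of a binomial distribution directly from its PMF; a positive value indicates right skew, a negative value left skew. The helper names and the choice of n = 10 are assumptions made for this example.

```python
from math import comb

def binomial_pmf(x, n, p):
    """Pr(X = x) for X ~ Binomial(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def binomial_skewness(n, p):
    """Third standardised moment, summed over the PMF."""
    probs = [binomial_pmf(x, n, p) for x in range(n + 1)]
    mean = sum(x * q for x, q in enumerate(probs))
    var = sum((x - mean) ** 2 * q for x, q in enumerate(probs))
    third = sum((x - mean) ** 3 * q for x, q in enumerate(probs))
    return third / var**1.5

for p in (0.1, 0.5, 0.9):
    print(p, binomial_skewness(10, p))
```

For low p the sign is positive (skewed to the right), for p = 0.5 the value is essentially zero (symmetrical), and for high p the sign is negative (skewed to the left).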
What is the expected value of the binomial distribution?
np
What is the variance of the binomial distribution?
np(1-p)
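As a quick check of the two formulas above, this sketch sums over the binomial PMF and compares the result with np and np(1-p). The values n = 20 and p = 0.3 are arbitrary, and the function names are illustrative only.

```python
from math import comb

def binomial_pmf(x, n, p):
    """Pr(X = x) for X ~ Binomial(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def mean_and_variance(n, p):
    """Mean and variance obtained by summing over every possible value of X."""
    mean = sum(x * binomial_pmf(x, n, p) for x in range(n + 1))
    var = sum((x - mean) ** 2 * binomial_pmf(x, n, p) for x in range(n + 1))
    return mean, var

n, p = 20, 0.3
print(mean_and_variance(n, p))   # from the PMF
print(n * p, n * p * (1 - p))    # from the formulas np and np(1-p)
```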
What kind of distribution do the sample proportions form about the true population proportion?
Approximately normal, provided the sample size is large enough
What is the standard deviation of the sample proportion?
sqrt(p(1-p)/n)
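A short sketch of the formula above, assuming a hypothetical proportion p = 0.4. The function name `proportion_sd` is made up for this example.

```python
from math import sqrt

def proportion_sd(p, n):
    """Standard deviation of the sample proportion: sqrt(p(1-p)/n)."""
    return sqrt(p * (1 - p) / n)

print(proportion_sd(0.4, 100))
print(proportion_sd(0.4, 400))
```

Because n sits under the square root, quadrupling the sample size halves the standard deviation.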
Describe sampling situation A
The proportions originate from independent samples
Describe sampling situation B
The same sample gives rise to two (or more) proportions where the same individual can only choose one of the options
Describe sampling situation C
The same sample but an individual can choose more than one category
Describe the odds ratio
An odds ratio is a relative measure of effect that allows, for example, comparison of the intervention group of a study with a control or placebo group
What is the numerator in the odds ratio?
Odds in the intervention arm
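A minimal sketch of an odds ratio calculation, with the odds in the intervention arm as the numerator and the odds in the control arm as the denominator. The counts are invented for illustration and the function names are hypothetical.

```python
def odds(events, non_events):
    """Odds = number with the event / number without the event."""
    return events / non_events

def odds_ratio(int_events, int_non_events, ctl_events, ctl_non_events):
    """Odds in the intervention arm divided by odds in the control arm."""
    return odds(int_events, int_non_events) / odds(ctl_events, ctl_non_events)

# Hypothetical study: 20 of 100 have the event in the intervention arm,
# 10 of 100 have it in the control arm.
print(odds_ratio(20, 80, 10, 90))
```

A value above 1 means the odds of the event are higher in the intervention arm, and a value below 1 means they are lower.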