2 - Inference & Hypothesis Testing Flashcards
What happens when you change N (number of observations) and w (bin width) of a histogram?
- As N increases and w decreases, the underlying distribution becomes clearer
- As N -> infinity and w -> 0 we get a smooth curve of an underlying probability distribution
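A minimal sketch of this limit, assuming the underlying distribution is a standard normal (the sample sizes, bin widths, and seed are illustrative choices, not from the course). It draws N values, builds a histogram on the density scale, and reports the largest gap between any bar and the true curve; the gap should tend to shrink as N grows and w narrows.

```python
import math
import random

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of N(mu, sigma^2) at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def histogram_density(samples, w):
    """Map each occupied bin's centre to its bar height, count / (N * w)."""
    counts = {}
    for x in samples:
        idx = math.floor(x / w)          # bin idx covers [idx * w, (idx + 1) * w)
        counts[idx] = counts.get(idx, 0) + 1
    n = len(samples)
    return {(idx + 0.5) * w: c / (n * w) for idx, c in counts.items()}

def max_gap(n, w, seed=0):
    """Largest gap between a bar height and the true density at the bar's centre."""
    rng = random.Random(seed)            # fixed seed so reruns give the same numbers
    samples = [rng.gauss(0.0, 1.0) for _ in range(n)]
    bars = histogram_density(samples, w)
    return max(abs(height - normal_pdf(centre)) for centre, height in bars.items())

for n, w in [(100, 1.0), (10_000, 0.25), (100_000, 0.1)]:
    print(f"N={n:>7}, w={w:<4}: max gap = {max_gap(n, w):.4f}")
```

Dividing each count by N * w makes the bars' total area equal 1, which is what lets the histogram be compared directly with a probability density.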
What is used to define the shape of a normal curve?
- sigma squared = population variance (the square of the standard deviation, sigma)
- mu = population mean (average of the data set)
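For reference, the standard density of a normal distribution, written with the same two parameters (this formula is not on the card itself, but it is the usual definition):

```latex
f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left( -\frac{(x - \mu)^2}{2\sigma^2} \right)
```

Here mu appears only in the term (x - mu), which is why it shifts the curve, while sigma sets both the height of the peak and how quickly the curve falls away from it.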
What happens for two N (mu, sigma squared) distributions where sigma squared 1 = sigma squared 2, but mu 1 doesn’t = mu 2?
- The density functions will have the same shape but different locations on the x-axis
- *As long as the SD stays the same, the shape stays the same. The mean is always at the center of a normal curve, so changing the mean moves the curve left or right
What happens for two N (mu, sigma squared) distributions where sigma squared 1 doesn’t = sigma squared 2, but mu 1 = mu 2?
- Density functions will have different shapes but the same position on the x-axis
- *The mean is the same, so the center of the distribution stays put, but the spread (and so the shape) changes
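Both cases can be checked numerically. The sketch below uses Python's statistics.NormalDist; the particular means and SDs are illustrative values, not from the course.

```python
import math
from statistics import NormalDist

# Case 1: same sigma, different mu -> same curve, shifted along the x-axis.
a = NormalDist(mu=0.0, sigma=1.0)
b = NormalDist(mu=3.0, sigma=1.0)
offsets = (-2.0, -0.5, 0.0, 1.0, 2.5)          # distances from each curve's own mean
same_shape = all(
    math.isclose(a.pdf(a.mean + d), b.pdf(b.mean + d)) for d in offsets
)

# Case 2: same mu, different sigma -> same centre, different peak height and spread.
narrow = NormalDist(mu=0.0, sigma=1.0)
wide = NormalDist(mu=0.0, sigma=2.0)
peak_narrow = narrow.pdf(0.0)
peak_wide = wide.pdf(0.0)

print("same shape when only mu differs:", same_shape)
print("peak heights when only sigma differs:", peak_narrow, peak_wide)
```

Doubling sigma halves the peak height, because the peak of a normal curve is 1 / (sigma * sqrt(2 * pi)).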
What is the role of each term in a normal distribution (ex: X, mu, sigma squared, and AUC)?
- X is distributed as N (mu, sigma squared)
- Mu/ mean determines location
- Sigma squared determines distribution’s shape (peakedness and spread)
- AUC = 1, so a distribution w/ a larger sigma squared will be more spread and lower at the mean
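A rough numerical check of the AUC = 1 point, assuming a simple trapezoidal rule over mu +/- 8 sigma (the integration range, step count, and example parameters are arbitrary choices for illustration):

```python
from statistics import NormalDist

def area_under_curve(mu, sigma, half_width=8.0, steps=20_000):
    """Trapezoidal estimate of the area under N(mu, sigma^2) over mu +/- half_width * sigma."""
    dist = NormalDist(mu, sigma)
    start = mu - half_width * sigma
    h = 2 * half_width * sigma / steps
    total = 0.5 * (dist.pdf(start) + dist.pdf(start + steps * h))
    total += sum(dist.pdf(start + i * h) for i in range(1, steps))
    return total * h

for mu, sigma in [(0.0, 1.0), (5.0, 3.0)]:
    peak = NormalDist(mu, sigma).pdf(mu)
    print(f"mu={mu}, sigma={sigma}: area = {area_under_curve(mu, sigma):.6f}, peak = {peak:.4f}")
```

The area stays at 1 for both curves, so the one with the larger sigma has to be lower at its mean to make up for being wider.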
Describe the normal distribution empirical rule
For normal distributions
- mu +/- sigma contains 68.26% of the observations
- mu +/- 2 sigma contains 95.44% of the observations
- mu +/- 3 sigma contains 99.74% of the observations
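These percentages can be recomputed from the error function: for a normal variable, the probability of falling within k standard deviations of the mean is erf(k / sqrt(2)). A short sketch:

```python
import math

def coverage(k):
    """Probability that a normal value lies within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"mu +/- {k} sigma: {coverage(k):.2%}")
```

The computed values round to 68.27%, 95.45%, and 99.73%. The card's figures differ only in the last digit, which is consistent with doubling four-decimal entries from a 0 to z table.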
What are considered unrepresentative or atypical values for distribution?
Values outside mu +/- 3 sigma (which includes 99.74% of the study population)
What are considered somewhat representative values for distribution?
- Between representative (typical) and unrepresentative (atypical) values
- Fall within mu +/- 2 sigma (which includes 95.44% of the study population)
Describe the “line in the sand” for distribution
- Unrepresentative is generally accepted as 5% of a set of data
- The middle 95% (representative and somewhat representative values) is defined by mu +/- 1.96 sigma
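Where 1.96 comes from: leaving 5% outside the band means 2.5% in each tail, so the cutoff is the 97.5th percentile of the standard normal. A quick check with statistics.NormalDist:

```python
from statistics import NormalDist

z = NormalDist()                      # standard normal: mu = 0, sigma = 1
cutoff = z.inv_cdf(0.975)             # value with 97.5% of the area to its left
middle = z.cdf(cutoff) - z.cdf(-cutoff)

print(f"cutoff = {cutoff:.4f}")
print(f"area between -cutoff and +cutoff = {middle:.4f}")
```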
What is the z-score?
z = [x - mu] / sigma
- x = value from the data set
- mu = mean of the data set
- sigma = SD of the data set
- z = z-score (# of standard deviations above or below the mean for a given x value)
- x = mu + sigma * z
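The two formulas are inverses of each other, which a small sketch makes concrete. The function names and the example numbers (mean 100, SD 15) are hypothetical.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma

def from_z(z, mu, sigma):
    """Recover the original value from its z-score."""
    return mu + sigma * z

z = z_score(130, mu=100, sigma=15)
x = from_z(z, mu=100, sigma=15)
print(z, x)
```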
What is the purpose of a z-score?
- To compare data from different data sets (different studies) – gives a common ground for us to make comparisons
- If value is negative, that means it is left of the mean; positive is to the right
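As an illustration of the comparison, suppose one student sat two exams with different means and SDs (all numbers here are made up). Raw scores are not directly comparable, but z-scores are:

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma

# Hypothetical exams: (student's score, class mean, class SD)
exam_a = z_score(82, mu=70, sigma=8)    # lower raw score, but far above its class mean
exam_b = z_score(88, mu=85, sigma=6)    # higher raw score, but close to its class mean
below = z_score(60, mu=70, sigma=8)     # a score left of the mean gives a negative z

print(exam_a, exam_b, below)
```

On this made-up data the 82 is the stronger result relative to its class, even though 88 is the larger raw number.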
What do you do if you want to know the probability that a normal deviate z might lie between - infinity and a z of 1.96?
- Pr (- infinity < z < 1.96) = Pr (0 < z < 1.96) + Pr (- infinity < z < 0) by symmetry
- Use 0 to Z table to find Pr (0 < z < 1.96)
- Pr (- infinity < z < 0) is the whole left half of the curve, so it equals 50%
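The same calculation can be sketched in code, with statistics.NormalDist standing in for the printed 0 to z table:

```python
from statistics import NormalDist

z = NormalDist()                          # standard normal
left_half = z.cdf(0.0)                    # Pr(-infinity < z < 0)
zero_to_z = z.cdf(1.96) - z.cdf(0.0)      # Pr(0 < z < 1.96), the table lookup
total = left_half + zero_to_z             # Pr(-infinity < z < 1.96)

print(f"{left_half:.4f} + {zero_to_z:.4f} = {total:.4f}")
```

The result is about 0.975, so roughly 97.5% of the area lies to the left of z = 1.96.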
What does p < 0.05 mean?
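The probability of getting a result at least this extreme by chance alone (if there is really no effect) is less than 5%; by convention, the result is then called statistically significant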
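One way to make this concrete, assuming the test statistic is a z-score and the test is two-sided (the card does not specify a test, so this is only an illustration; the function names are hypothetical):

```python
from statistics import NormalDist

def two_sided_p(z):
    """Probability of a standard normal value at least as far from 0 as z, in either tail."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

def is_significant(z, alpha=0.05):
    """True when the two-sided p-value falls below the chosen cutoff."""
    return two_sided_p(z) < alpha

for z in (1.0, 1.96, 2.5):
    print(f"z = {z}: p = {two_sided_p(z):.4f}, significant = {is_significant(z)}")
```

A z of 1.96 sits almost exactly on the 5% line, which ties this cutoff back to the +/- 1.96 sigma band.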
What is the difference between a false positive and a false negative in statistics?
- False positive = stats say something is going on when in reality there isn’t
- False negative = something is actually happening but stats say there isn’t
What is the difference between a type 1 and type 2 error? What could be a cause of each type?
- Type 1 = false positive (ex: stats say there is a difference when there really isn’t); could be caused by non-normal data distribution analyzed w/ parametric statistics
- Type 2 = false negative (ex: stats say there is no difference when there really is); usually caused by small sample sizes
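To see why small samples lead to type 2 errors, here is a sketch under stated assumptions: a two-sided, one-sample z-test with known SD, alpha = 0.05, and a true effect of half a standard deviation. None of these specifics come from the course, and the function name is hypothetical.

```python
import math
from statistics import NormalDist

def type_2_error(effect_size, n, alpha=0.05):
    """Chance that a two-sided one-sample z-test misses a true effect.

    effect_size is the true shift in the mean, measured in standard deviations.
    """
    z = NormalDist()
    crit = z.inv_cdf(1 - alpha / 2)          # about 1.96 when alpha = 0.05
    shift = effect_size * math.sqrt(n)       # how far the test statistic's mean moves
    power = (1 - z.cdf(crit - shift)) + z.cdf(-crit - shift)
    return 1 - power

for n in (10, 50, 200):
    print(f"n = {n:>3}: type 2 error rate = {type_2_error(0.5, n):.3f}")
```

Under these assumptions the type 2 error rate falls as n grows, while the type 1 error rate stays fixed at alpha no matter the sample size.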