2 - Inference & Hypothesis Testing Flashcards

1
Q

What happens when you change N (number of observations) and w (bin width) of a histogram?

A
  • As N increases and w decreases, the underlying distribution becomes clearer
  • As N -> infinity and w -> 0 we get a smooth curve of an underlying probability distribution
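
The limiting behaviour above can be checked numerically. This is a minimal sketch, assuming the underlying distribution is a standard normal; the helper names `histogram_density` and `max_gap_to_normal` are made up for illustration and do not come from the cards.

```python
import random
from statistics import NormalDist

def histogram_density(samples, w):
    """Return {bin_left_edge: density} for bins of width w."""
    counts = {}
    for x in samples:
        left = (x // w) * w  # left edge of the bin that contains x
        counts[left] = counts.get(left, 0) + 1
    n = len(samples)
    # Dividing by n * w makes the bar areas sum to 1, like a density curve.
    return {left: c / (n * w) for left, c in counts.items()}

def max_gap_to_normal(n, w, seed=0):
    """Largest |histogram density - true density| over bins whose left edge
    lies within 2 SD of the mean (assumed N(0, 1) population)."""
    rng = random.Random(seed)
    samples = [rng.gauss(0, 1) for _ in range(n)]
    dens = histogram_density(samples, w)
    true = NormalDist(0, 1)
    return max(abs(d - true.pdf(left + w / 2))
               for left, d in dens.items() if abs(left) < 2)

# Larger N with narrower bins should track the smooth curve more closely.
for n, w in [(100, 1.0), (10_000, 0.25), (200_000, 0.05)]:
    print(n, w, round(max_gap_to_normal(n, w), 3))
```

The bin width has to shrink more slowly than N grows; otherwise each bin holds too few observations and the histogram turns noisy instead of smooth.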
2
Q

What is used to define the shape of a normal curve?

A
  • sigma squared = population variance (the square of the population standard deviation)
  • mu = population mean (average of the data set)
3
Q

What happens for two N (mu, sigma squared) distributions where sigma squared 1 = sigma squared 2, but mu 1 doesn’t = mu 2?

A
  • The density functions will have the same shape but different locations on the x-axis
  • *As long as SD stays the same, the shape will stay the same, but mean is always in the middle of the curve for a normal distribution, so if mean changes the curve will move left or right
4
Q

What happens for two N (mu, sigma squared) distributions where sigma squared 1 doesn’t = sigma squared 2, but mu 1 = mu 2?

A
  • Density functions will have different shapes but the same position on the x-axis
  • *Mean is the same, so middle of the distribution will be the same but shape will change
5
Q

What is the purpose of each function on a normal distribution (ex: X, mu, sigma squared, and AUC)?

A
  • X is distributed as N (mu, sigma squared)
  • Mu/ mean determines location
  • Sigma squared determines distribution’s shape (peakedness and spread)
  • AUC = 1, so a distribution w/ a larger sigma squared will be more spread and lower at the mean
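
A quick numerical check of the last point, as a sketch only: the two sigma values are arbitrary illustrations and `area_under_curve` is a hypothetical helper using a simple midpoint rule.

```python
from statistics import NormalDist

def area_under_curve(dist, lo, hi, steps=20_000):
    """Midpoint-rule approximation of the area under dist.pdf between lo and hi."""
    h = (hi - lo) / steps
    return sum(dist.pdf(lo + (i + 0.5) * h) for i in range(steps)) * h

narrow = NormalDist(mu=0, sigma=1)  # smaller sigma squared
wide = NormalDist(mu=0, sigma=2)    # larger sigma squared, same mean

for name, dist in [("sigma=1", narrow), ("sigma=2", wide)]:
    peak = dist.pdf(dist.mean)  # height of the curve at the mean
    auc = area_under_curve(dist, dist.mean - 10 * dist.stdev,
                           dist.mean + 10 * dist.stdev)
    print(name, "peak height:", round(peak, 4), "area:", round(auc, 4))
```

Both areas come out at 1 (to rounding), while the wider curve is lower at the mean.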
6
Q

Describe the normal distribution empirical rule

A

For normal distributions

  • mu +/- sigma contains 68.26% of the observations
  • mu +/- 2 sigma contains 95.44% of the observations
  • mu +/- 3 sigma contains 99.74% of the observations
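
These percentages can be reproduced from the normal cumulative distribution function. The sketch below uses Python's `statistics.NormalDist`; `coverage` is a hypothetical helper name. The exact values differ from the figures above only in the last digit, because the card's numbers come from rounded table entries.

```python
from statistics import NormalDist

def coverage(k, mu=0.0, sigma=1.0):
    """Proportion of a N(mu, sigma squared) distribution within mu +/- k sigma."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)

for k in (1, 2, 3):
    print(f"mu +/- {k} sigma: {100 * coverage(k):.2f}%")

# The proportion does not depend on the particular mu and sigma chosen.
print(round(coverage(2, mu=100, sigma=15), 4))
```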
7
Q

What are considered unrepresentative or atypical values for distribution?

A

Values outside mu +/- 3 sigma (which includes 99.74% of the study population)

8
Q

What are considered somewhat representative values for distribution?

A
  • Between representative (typical) and unrepresentative (atypical) values
  • Fall within mu +/- 2 sigma (which includes 95.44% of the study population)
9
Q

Describe the “line in the sand” for distribution

A
  • Unrepresentative is generally accepted as 5% of a set of data
  • The middle 95% (representative and somewhat representative values) is defined by mu +/- 1.96 sigma
10
Q

What is the z-score?

A

z = [x - mu] / sigma

  • x = value from the data set
  • mu = mean of the data set
  • sigma = SD of the data set
  • z = z-score (# of standard deviations above or below the mean for a given x value)
  • x = mu + sigma * z
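
The two formulas above translate directly into code. The exam figures below are hypothetical values chosen for illustration, and the function names are not from the cards.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma

def from_z(z, mu, sigma):
    """Inverse transformation: recover x from a z-score."""
    return mu + sigma * z

# Hypothetical example: the same student on two differently scaled exams.
z_a = z_score(85, mu=70, sigma=10)  # exam A
z_b = z_score(60, mu=50, sigma=5)   # exam B
print(z_a, z_b)

# Converting back returns the original raw score.
print(from_z(z_a, mu=70, sigma=10))
```

On the raw scale 85 looks better than 60, but in z-score terms the exam B result is further above its mean.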
11
Q

What is the purpose of a z-score?

A
  • To compare data from different data sets (different studies) – gives a common ground for us to make comparisons
  • If value is negative, that means it is left of the mean; positive is to the right
12
Q

What do you do if you want to know the probability that a normal deviate z might lie between - infinity and a z of 1.96?

A
  • Pr (- infinity < z < 1.96) = Pr (0 < z < 1.96) + Pr (- infinity < z < 0) by symmetry
  • Use 0 to Z table to find Pr (0 < z < 1.96)
  • Pr (- infinity < z < 0) is the whole left half of the curve, so it equals 50%
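
The same decomposition can be sketched in code, with `statistics.NormalDist` standing in for the printed 0 to Z table (an assumption; the card itself uses a table lookup).

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mu = 0, sigma = 1

# What a "0 to z" table stores: the area between 0 and z.
zero_to_z = std_normal.cdf(1.96) - std_normal.cdf(0)

# The whole left half of the curve.
left_half = std_normal.cdf(0)

total = zero_to_z + left_half
print(round(zero_to_z, 4), round(left_half, 4), round(total, 4))
```

The sum is about 0.975, which is why z = 1.96 marks the upper edge of the middle 95%.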
13
Q

What does p < 0.05 mean?

A

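If the null hypothesis were true, the probability of obtaining a result at least as extreme as the one observed is less than 5%; by convention, such a result is called statistically significant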

14
Q

What is the difference between a false positive and a false negative in statistics?

A
  • False positive = stats say something is going on when in reality there isn’t
  • False negative = something is actually happening but stats say there isn’t
15
Q

What is the difference between a type 1 and type 2 error? What could be a cause of each type?

A
  • Type 1 = false positive (ex: stats say there is a difference when there really isn’t); could be caused by non-normal data distribution analyzed w/ parametric statistics
  • Type 2 = false negative (ex: stats say there is no difference when there really is); usually caused by small sample sizes
16
Q

The possibility of statistical significance increases as _____

A
  • Sample size increases (w/ a large enough sample, over 2000 people, the smallest difference or correlation is likely to be statistically significant)
  • The difference between means or the strength of the correlation increases
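
Both effects can be illustrated with a one-sample z-test. The choice of test is an assumption made for this sketch (the card does not name one), sigma is treated as known, and all numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def two_sided_p_value(observed_mean, null_mean, sigma, n):
    """Two-sided p-value of a one-sample z-test with known sigma."""
    z = (observed_mean - null_mean) / (sigma / sqrt(n))
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Same small difference (1.5 units, a tenth of an SD), growing sample size.
for n in (50, 500, 2000):
    p = two_sided_p_value(observed_mean=101.5, null_mean=100, sigma=15, n=n)
    print(n, round(p, 4), "significant" if p < 0.05 else "not significant")

# Same sample size, larger difference between the means.
print(round(two_sided_p_value(observed_mean=106, null_mean=100, sigma=15, n=50), 4))
```

A difference too small to detect at n = 50 becomes significant at n = 2000, so statistical significance alone says little about whether the difference matters in practice.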
17
Q

Describe properties of sampling distributions

A
  • Variability of the random sampling distribution depends on the sample size n, and the variability of the population sigma
  • Larger sample size = smaller standard error (sigma / square root of n)
  • Sample means on larger samples are more trustworthy
  • Smaller variability in population (sigma) = smaller variability we would expect in the sample
18
Q

How can you increase confidence in the estimator?

A
  • Use larger sample sizes
  • Reduce variability in population by improving the sensitivity of the measurement

19
Q

What is standard error of the mean?

A
  • Describes the variability of a sampling distribution
  • Aka standard deviation of the sampling distribution of means
  • Standard deviation = variability of individual observations
  • sigma of x-bar (standard error of the mean) = sigma / square root of n (same units as the data)
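
A small simulation can confirm that the standard deviation of many sample means matches sigma / square root of n. This is a sketch under assumed values (a normal population with mu = 50 and sigma = 10); the function names are hypothetical.

```python
import random
from math import sqrt
from statistics import stdev

def standard_error(sigma, n):
    """Theoretical standard error of the mean."""
    return sigma / sqrt(n)

def simulated_sd_of_means(mu, sigma, n, trials=4000, seed=1):
    """Draw `trials` samples of size n and return the SD of their means."""
    rng = random.Random(seed)
    means = [sum(rng.gauss(mu, sigma) for _ in range(n)) / n
             for _ in range(trials)]
    return stdev(means)

for n in (4, 25, 100):
    print(n, standard_error(10, n), round(simulated_sd_of_means(50, 10, n), 2))
```

Quadrupling the sample size only halves the standard error, because n sits under a square root.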
20
Q

Define confidence interval

A
  • Range of values used to estimate the true value of the population parameter
  • Confidence level = 1 - alpha (usually expressed as a %): the proportion of times, over repeated sampling, that the CI actually contains the population parameter
  • Establishes the precision of (our confidence in) our estimate of mu
21
Q

What happens when alpha decreases?

A

Confidence increases but precision decreases (widen the CI)
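
The trade-off can be seen by computing the interval at several alpha levels. This sketch assumes a z-based interval with known sigma (x-bar +/- z * sigma / square root of n); the sample values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def z_confidence_interval(xbar, sigma, n, alpha):
    """(1 - alpha) confidence interval for mu, assuming sigma is known."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # critical value, about 1.96 for alpha = 0.05
    half_width = z * sigma / sqrt(n)
    return xbar - half_width, xbar + half_width

# Hypothetical sample: mean 100, sigma 15, n = 36.
for alpha in (0.10, 0.05, 0.01):
    lo, hi = z_confidence_interval(xbar=100, sigma=15, n=36, alpha=alpha)
    print(f"{100 * (1 - alpha):.0f}% CI: ({lo:.2f}, {hi:.2f}), width {hi - lo:.2f}")
```

Each step down in alpha raises the critical value z, so the interval widens.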