2 - Inference & Hypothesis Testing Flashcards

Question 1

Q

What happens to a histogram when you change N (the number of observations) and w (the width of each band, or bin)?

Answer

A

As N increases and w decreases, the underlying distribution becomes clearer
As N -> infinity and w -> 0 we get a smooth curve of an underlying probability distribution

Question 2

Q

What is used to define the shape of a normal curve?

Answer

A

sigma squared = population variance (the square of the standard deviation, sigma; sets the spread of the data set)
mu = population mean (average of the data set)

Question 3

Q

What happens for two N (mu, sigma squared) distributions where sigma squared 1 = sigma squared 2, but mu 1 does not equal mu 2?

Answer

A

The density functions will have the same shape but different locations on the x-axis
*As long as the SD stays the same, the shape stays the same. The mean is always at the center of a normal curve, so if the mean changes, the curve moves left or right

Question 4

Q

What happens for two N (mu, sigma squared) distributions where sigma squared 1 does not equal sigma squared 2, but mu 1 = mu 2?

Answer

A

Density functions will have different shapes but the same position on the x-axis
*Mean is the same, so middle of the distribution will be the same but shape will change

Question 5

Q

What is the role of each component of a normal distribution (ex: X, mu, sigma squared, and AUC)?

Answer

A

X is distributed as N (mu, sigma squared)
Mu (the mean) determines location
Sigma squared determines distribution’s shape (peakedness and spread)
AUC = 1, so a distribution w/ a larger sigma squared will be more spread and lower at the mean
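
A minimal sketch of the last point, using Python's standard-library NormalDist. The means and sigmas below are illustrative values, not taken from the cards.

```python
from statistics import NormalDist

# Two normal curves with the same mean but different spread (illustrative values).
narrow = NormalDist(mu=0.0, sigma=1.0)
wide = NormalDist(mu=0.0, sigma=2.0)

# Height of each density at its own mean: 1 / (sigma * sqrt(2 * pi)).
peak_narrow = narrow.pdf(narrow.mean)
peak_wide = wide.pdf(wide.mean)

print(f"peak height, sigma = 1: {peak_narrow:.4f}")
print(f"peak height, sigma = 2: {peak_wide:.4f}")
```

Because the area under each curve must stay at 1, doubling sigma halves the height at the mean.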

Question 6

Q

Describe the normal distribution empirical rule

Answer

A

For normal distributions

mu +/- 1 sigma contains 68.27% of the observations
mu +/- 2 sigma contains 95.45% of the observations
mu +/- 3 sigma contains 99.73% of the observations
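
These percentages can be checked from the standard normal cumulative distribution. The sketch below assumes Python's standard-library NormalDist; the helper name coverage is made up for illustration. Figures read off a rounded z table may differ in the last digit.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mu = 0, sigma = 1

def coverage(k: float) -> float:
    """Probability that a normal value lies within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"mu +/- {k} sigma: {coverage(k) * 100:.2f}%")
```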

Question 7

Q

What are considered unrepresentative or atypical values for a distribution?

Answer

A

Values outside mu +/- 3 sigma (an interval that contains about 99.7% of the study population)

Question 8

Q

What are considered somewhat representative values for a distribution?

Answer

A

Between representative (typical) and unrepresentative (atypical) values
Fall within mu +/- 2 sigma (which includes about 95% of the study population)

Question 9

Q

Describe the “line in the sand” for a distribution

Answer

A

Unrepresentative values are generally accepted to be the most extreme 5% of a set of data
The middle 95% (representative and somewhat representative values) is defined by mu +/- 1.96 sigma
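
As a rough check of where 1.96 comes from, the sketch below (assuming Python's standard-library NormalDist) finds the z value that leaves 2.5% in each tail of a standard normal curve.

```python
from statistics import NormalDist

# z value that leaves 2.5% in the upper tail (and, by symmetry, 2.5% in the lower tail).
cutoff = NormalDist().inv_cdf(0.975)
print(f"{cutoff:.2f}")  # 1.96

# Share of values falling outside mu +/- 1.96 sigma, both tails together.
outside = 2 * (1 - NormalDist().cdf(1.96))
print(f"{outside:.3f}")  # 0.050
```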

Question 10

Q

What is the z-score?

Answer

A

z = [x - mu] / sigma

x = value from the data set
mu = mean of the data set
sigma = SD of the data set
z = z-score (# of standard deviations above or below the mean for a given x value)
x = mu + sigma * z
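
A minimal sketch of both formulas in Python. The function names and the exam-score numbers are hypothetical, chosen only to illustrate the arithmetic.

```python
def z_score(x: float, mu: float, sigma: float) -> float:
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma

def from_z_score(z: float, mu: float, sigma: float) -> float:
    """Convert a z-score back to the original scale: x = mu + sigma * z."""
    return mu + sigma * z

# Hypothetical exam scores with mean 70 and SD 10.
z = z_score(85, mu=70, sigma=10)
print(z)                                 # 1.5
print(from_z_score(z, mu=70, sigma=10))  # 85.0
```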

Question 11

Q

What is the purpose of a z-score?

Answer

A

To compare data from different data sets (different studies) – gives a common ground for us to make comparisons
If value is negative, that means it is left of the mean; positive is to the right
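
To illustrate the comparison, the sketch below puts one person's results from two differently scaled tests on the same footing. All numbers are hypothetical.

```python
def z_score(x: float, mu: float, sigma: float) -> float:
    """Standard deviations above (+) or below (-) the mean."""
    return (x - mu) / sigma

# Hypothetical results for one student on two tests with different scales.
z_test_a = z_score(82, mu=75, sigma=5)      # raw score 82 on a test with mean 75, SD 5
z_test_b = z_score(620, mu=500, sigma=100)  # raw score 620 on a test with mean 500, SD 100

print(f"test A: z = {z_test_a:.1f}")
print(f"test B: z = {z_test_b:.1f}")
print("relatively stronger result:", "A" if z_test_a > z_test_b else "B")
```

The raw scores cannot be compared directly, but the z-scores can: the larger z is further above its own group's mean.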

Question 12

Q

What do you do if you want to know the probability that a normal deviate z might lie between - infinity and a z of 1.96?

Answer

A

Pr (- infinity < z < 1.96) = Pr (0 < z < 1.96) + Pr (- infinity < z < 0) by symmetry
Use 0 to Z table to find Pr (0 < z < 1.96)
Pr (- infinity < z < 0) is the whole left half of the curve, so it equals 50%
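
The same split can be reproduced numerically. This sketch assumes Python's standard-library NormalDist in place of a printed 0-to-z table.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mu = 0, sigma = 1

left_half = std_normal.cdf(0.0)                             # Pr(-infinity < z < 0)
zero_to_z = std_normal.cdf(1.96) - std_normal.cdf(0.0)      # Pr(0 < z < 1.96), the table entry
total = zero_to_z + left_half                               # Pr(-infinity < z < 1.96)

print(f"{zero_to_z:.4f} + {left_half:.4f} = {total:.4f}")   # 0.4750 + 0.5000 = 0.9750
```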

Question 13

Q

What does p < 0.05 mean?

Answer

A

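If the null hypothesis were true, the probability of getting a result at least as extreme as the one observed is less than 5%; by convention, such a result is called statistically significant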

Question 14

Q

What is the difference between a false positive and a false negative in statistics?

Answer

A

False positive = stats say something is going on when in reality there isn’t
False negative = something is actually happening but stats say there isn’t

Question 15

Q

What is the difference between a type 1 and type 2 error? What could be a cause of each type?

Answer

A

Type 1 = false positive (ex: stats say there is a difference when there really isn’t); could be caused by non-normal data distribution analyzed w/ parametric statistics
Type 2 = false negative (ex: stats say there is no difference when there really is); usually caused by small sample sizes
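
A small simulation can make the Type 1 error rate concrete. This is a sketch under stated assumptions: a two-sided one-sample z-test with known sigma, normally distributed data, and a null hypothesis that is actually true. The sample size, trial count, and seed are arbitrary choices.

```python
import random
from statistics import NormalDist, fmean

random.seed(0)  # fixed seed so the run is repeatable

MU, SIGMA = 0.0, 1.0   # true population parameters (the null hypothesis is true)
N = 30                 # observations per simulated study
ALPHA = 0.05           # significance level
TRIALS = 2000          # number of simulated studies

critical_z = NormalDist().inv_cdf(1 - ALPHA / 2)  # about 1.96

false_positives = 0
for _ in range(TRIALS):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    z = (fmean(sample) - MU) / (SIGMA / N ** 0.5)
    if abs(z) > critical_z:
        # The test reports a difference even though none exists.
        false_positives += 1

type_1_rate = false_positives / TRIALS
print(f"observed Type 1 error rate: {type_1_rate:.3f}")
```

When the test's assumptions hold, the observed rate should land near alpha.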

Question 16

Q

The possibility of statistical significance increases as _____

Answer

A

Sample size increases (w/ a large enough sample, over 2000 people, the smallest difference or correlation is likely to be statistically significant)
Differences between means or strength of correlations increases
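
The effect of sample size can be sketched with a one-sample z-test, assuming a known sigma. The function name two_sided_p, the fixed difference of 0.1, and the sample sizes are hypothetical values chosen for illustration.

```python
from statistics import NormalDist

def two_sided_p(diff: float, sigma: float, n: int) -> float:
    """Two-sided p-value of a one-sample z-test for an observed mean difference."""
    z = diff / (sigma / n ** 0.5)
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Same small difference (0.1 SD), increasingly large samples.
for n in (50, 500, 2000):
    print(f"n = {n:4d}: p = {two_sided_p(diff=0.1, sigma=1.0, n=n):.6f}")
```

The difference stays the same size throughout; only the sample size changes, and the p-value shrinks with it.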

Question 17

Q

Describe properties of sampling distributions

Answer

A

Variability of the random sampling distribution depends on the sample size n, and the variability of the population sigma
Larger sample size = smaller sigma / square root of n
Sample means on larger samples are more trustworthy
Smaller variability in population (sigma) = smaller variability we would expect in the sample

Question 18

Q

How can you increase confidence in the estimator?

Answer

A

Use larger sample sizes
Reduce variability in population by improving the sensitivity of the measurement

Question 19

Q

What is standard error of the mean?

Answer

A

Describes the variability of a sampling distribution
Aka standard deviation of the sampling distribution of means
Standard deviation = variability of individual observations
sigma x-bar = sigma / square root of n (same units as the data)
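
A minimal sketch of the formula in Python. The function name standard_error and the sigma of 10 are hypothetical.

```python
from math import sqrt

def standard_error(sigma: float, n: int) -> float:
    """Standard deviation of the sampling distribution of the mean."""
    if n <= 0:
        raise ValueError("n must be a positive sample size")
    return sigma / sqrt(n)

# Quadrupling the sample size halves the standard error.
for n in (4, 16, 64):
    print(n, standard_error(10.0, n))  # 5.0, then 2.5, then 1.25
```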

Question 20

Q

Define confidence interval

Answer

A

Range of values used to estimate the true value of the population parameter
The confidence level is 1 - alpha (usually expressed as a %): the proportion of times that a CI built this way would actually contain the population parameter over repeated sampling
Establishes the precision of (our confidence in) our estimate of mu
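
One common way to build such an interval for a mean, sketched here under the assumption that sigma is known and the sampling distribution is normal, is x-bar +/- z * sigma / square root of n. The function name and the numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def mean_confidence_interval(x_bar: float, sigma: float, n: int, alpha: float = 0.05):
    """(1 - alpha) confidence interval for mu, assuming sigma is known."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 when alpha = 0.05
    margin = z * sigma / sqrt(n)             # z times the standard error of the mean
    return x_bar - margin, x_bar + margin

# Hypothetical sample: mean 100 from n = 36 observations, population SD 15.
low, high = mean_confidence_interval(x_bar=100.0, sigma=15.0, n=36)
print(f"95% CI: ({low:.2f}, {high:.2f})")  # 95% CI: (95.10, 104.90)
```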

Question 21

Q

What happens when alpha decreases?

Answer

A

Confidence increases but precision decreases (the CI widens)
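
The trade-off can be seen by computing the interval width at several alpha levels. This sketch assumes a z-based interval with known sigma; the function name ci_width and the values of sigma and n are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def ci_width(sigma: float, n: int, alpha: float) -> float:
    """Full width of a (1 - alpha) z-based confidence interval for the mean."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return 2 * z * sigma / sqrt(n)

# Same data (sigma = 15, n = 36), decreasing alpha.
for alpha in (0.10, 0.05, 0.01):
    width = ci_width(sigma=15.0, n=36, alpha=alpha)
    print(f"alpha = {alpha:.2f} ({(1 - alpha) * 100:.0f}% confidence): width = {width:.2f}")
```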

(21 cards)