final exam Flashcards
Is a p value of .50 statistically significant at p < .05?
no
Is a p value of .51 statistically significant at p < .05?
no
Is a t statistic of 1.97 statistically significant at p < .05 (two tails) with a sample size of infinity?
yes
What is the two tailed t value for p < .05 for a sample size of infinity?
1.96
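This card can be verified numerically (a sketch assuming `scipy` is available; not part of the original deck): the two-tailed critical t at α = .05 shrinks toward 1.96 as degrees of freedom grow.

```python
from scipy import stats

# Two-tailed critical t at alpha = .05: as df grows, it approaches
# the z critical value of 1.96.
for df in (10, 30, 100, 1_000_000):
    t_crit = stats.t.ppf(0.975, df)  # upper 2.5% cutoff
    print(f"df = {df:>9}: t_crit = {t_crit:.3f}")
```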
Which t statistic has a larger p value?
* t = 2.34
* t = 1.96
1.96
Would any of these t values be statistically significant at p < .05 at any sample size, under any circumstance?
* t = .50
* t = -1.50
* t = 1.88
1.88 could be “statistically significant” using a one tailed test at p < .05 if degrees of freedom are equal to or greater than 8.
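The df ≥ 8 claim can be checked directly (sketch, assuming `scipy`): the one-tailed .05 critical value first drops below 1.88 at df = 8.

```python
from scipy import stats

# One-tailed critical t at alpha = .05: t = 1.88 first clears the cutoff at df = 8.
for df in (7, 8, 9):
    crit = stats.t.ppf(0.95, df)
    print(f"df = {df}: critical t = {crit:.3f}, is 1.88 significant? {1.88 > crit}")
```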
Is a correlation of r = -.45 in sample of N = 150 statistically significant at p < .05 (two tails)?
yes
What is the margin of error for a 95% confidence interval around a sample mean of 45.35, where the standard deviation is 18.53, in a sample of N = 20 (use t not Z)?
8.67
What is the 95% confidence interval for the mean (sample mean of 45.35, where the standard deviation is 18.53, in a sample of N = 20 (use t not Z) Margin of error= 8.67) ?
36.68, 54.02
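The arithmetic behind these two cards can be reproduced in Python (assuming `scipy` is available; not part of the original deck):

```python
import math
from scipy import stats

# t-based margin of error and 95% CI for a mean, from summary statistics.
mean, sd, n = 45.35, 18.53, 20
se = sd / math.sqrt(n)                 # standard error of the mean
t_crit = stats.t.ppf(0.975, df=n - 1)  # two-tailed critical t, df = 19
moe = t_crit * se                      # margin of error
lo, hi = mean - moe, mean + moe        # confidence interval
print(f"MOE = {moe:.2f}, 95% CI = [{lo:.2f}, {hi:.2f}]")
```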
What is the margin of error for a 95% confidence interval around a sample mean of 63.27, where the standard deviation is 4.97, in a sample of N = 30?
1.86
What is the 95% confidence interval for the above mean (a sample mean of 63.27, where the standard deviation is 4.97, in a sample of N = 30, margin of error= 1.86)?
61.41, 65.13
Identify the factors that result in wider confidence intervals.
High levels of confidence, larger standard deviations, smaller sample sizes.
What are some correct interpretations of a 95% CI of [61.41, 65.13] for a mean of 63.27?
A) Values between 61.41 and 65.13 are plausible values for the population mean. B) If we repeated the study many times, 95% of the confidence intervals constructed this way would contain the true population mean.
Is a mean of 60.96 from a sample of N =16, standard deviation = 6.59, statistically significantly different than a value of 50?
Yes p < .01
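The test behind this card is a one-sample t-test run from summary statistics (sketch, assuming `scipy`):

```python
import math
from scipy import stats

# One-sample t-test from summary statistics: does the mean differ from 50?
mean, sd, n, mu0 = 60.96, 6.59, 16, 50
t = (mean - mu0) / (sd / math.sqrt(n))  # t statistic, df = 15
p = 2 * stats.t.sf(abs(t), df=n - 1)    # two-tailed p value
print(f"t = {t:.2f}, p = {p:.6f}")
```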
Is a correlation of r = .62 from a sample size of N = 50 statistically significant?
Yes p < .01
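A correlation's significance can be checked from r and N alone, via the t statistic with df = N − 2 (sketch, assuming `scipy`):

```python
import math
from scipy import stats

# Test whether a correlation r from a sample of size n differs from zero.
def r_significance(r, n):
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)  # t statistic, df = n - 2
    p = 2 * stats.t.sf(abs(t), df=n - 2)              # two-tailed p value
    return t, p

t, p = r_significance(0.62, 50)
print(f"t = {t:.2f}, p = {p:.5f}")
```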
Is a correlation of r = -.12, 95% CI [-.35, .20] statistically significant?
no
Is the correlation between age and salary (r = .30, 95% CI [.27,.33]) significantly different from the correlation between height and happiness (r = .22, 95% CI [.17,.27])?
yes
The null hypothesis (H0) is disproved.
false
The probability that the null hypothesis is true is < .05
false
The alternative hypothesis (H1) is proved.
false
The probability that the alternative hypothesis is true is > .95
false
The probability that a Type I error was made in rejecting H0 is < .05
false
The probability that the same result will be found in a replication study is > .95
false
The probability that the result is due to chance is < .05.
false
What is sampling error?
Sampling error is the difference between the results from a sample and the results you would get if the entire population was surveyed.
What effect does sampling error have on data?
It can lead to inaccurate generalizations about the population. Larger, random samples tend to reduce sampling error.
What is the difference between a population and a sample in statistics?
A population includes all members of a group being studied, while a sample is a subset of that population used to make inferences.
What is a confidence interval and what does it tell us?
A confidence interval gives a range of values likely to contain a population parameter, based on a sample statistic.
Example: A 95% confidence interval means we’re 95% confident the true value falls within the range.
What is standard error, and how is it estimated?
Standard error (SE) measures how much a sample statistic (like the mean) is expected to vary from sample to sample. It estimates the precision of the sample as an estimate of the population.
What is the formula for standard error?
For the mean:
SE = s / √n
Where s is the sample standard deviation and n is the sample size.
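The formula can be applied with the standard library alone; the scores below are hypothetical, for illustration only:

```python
import math
import statistics

# Standard error of the mean: SE = s / sqrt(n).
sample = [12.0, 15.5, 9.8, 14.2, 11.1, 13.7]  # hypothetical scores
s = statistics.stdev(sample)   # sample SD (n - 1 in the denominator)
n = len(sample)
se = s / math.sqrt(n)
print(f"s = {s:.3f}, SE = {se:.3f}")
```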
What is the Central Limit Theorem (CLT)?
The Central Limit Theorem states that the distribution of sample means approaches a normal distribution as the sample size increases, regardless of the population’s distribution—as long as the samples are independent and random.
It’s the reason we can use normal-based methods for inference.
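A quick simulation illustrates the CLT (a sketch with made-up parameters): sample means drawn from a strongly skewed exponential population still pile up symmetrically around the population mean of 1, and their spread shrinks as n grows.

```python
import random
import statistics

random.seed(0)

# Means of repeated samples from a skewed (exponential) population.
def sample_means(n, reps=2000):
    return [statistics.fmean(random.expovariate(1.0) for _ in range(n))
            for _ in range(reps)]

for n in (2, 30):
    means = sample_means(n)
    print(f"n = {n}: mean of means = {statistics.fmean(means):.3f}, "
          f"SD of means = {statistics.stdev(means):.3f}")
```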
What is the difference between standard deviation and standard error?
- Standard deviation (SD): Measures the variability within a single sample or population.
- Standard error (SE): Measures how much the sample statistic would vary between different samples.
- SE gets smaller with larger sample sizes; SD does not.
What is margin of error, and how is it calculated?
Margin of error (MOE) indicates the range within which we expect the true population parameter to fall, based on the sample estimate.
It accounts for sampling variability.
Formula for Margin of Error
MOE = z × SE
Where z is the critical value (from z-table) and SE is the standard error.
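Applying the z-based formula to the numbers from the earlier card (mean 63.27, s = 4.97, N = 30) gives a slightly narrower MOE than the t-based answer of 1.86, since z ignores the extra small-sample uncertainty. A stdlib-only sketch:

```python
import math
from statistics import NormalDist

# MOE = z * SE, using the z critical value for the chosen confidence level.
sd, n, confidence = 4.97, 30, 0.95
se = sd / math.sqrt(n)
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
moe = z * se
print(f"z = {z:.3f}, SE = {se:.3f}, MOE = {moe:.2f}")
```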
What is a p-value in hypothesis testing?
A p-value is the probability of obtaining results at least as extreme as the observed results, assuming the null hypothesis is true.
A low p-value (< 0.05, usually) suggests strong evidence against the null hypothesis.
A high p-value suggests weak evidence against the null.
What are Type I and Type II errors?
- Type I error (α): Rejecting a true null hypothesis (false positive).
- Type II error (β): Failing to reject a false null hypothesis (false negative).
- Reducing one often increases the other—balance is key.
What is a normal distribution?
A normal distribution is a symmetric, bell-shaped distribution centered around the mean. It’s defined by its mean (μ) and standard deviation (σ).
What are the properties of normal distribution?
- Symmetrical around the mean
- Mean = Median = Mode
- ~68% of data within 1σ, 95% within 2σ, 99.7% within 3σ
- The total area under the curve = 1
What is a z-score?
A z-score tells how many standard deviations a value is from the mean.
Formula for z score
z = (x − μ) / σ
Used to compare scores across different normal distributions or assess probabilities.
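A stdlib sketch of the formula, using hypothetical IQ-style numbers (μ = 100, σ = 15):

```python
from statistics import NormalDist

# z-score: how many SDs a value lies from the mean.
def z_score(x, mu, sigma):
    return (x - mu) / sigma

z = z_score(130, 100, 15)   # hypothetical score of 130
pct = NormalDist().cdf(z)   # proportion of the distribution below it
print(f"z = {z:.2f}, percentile = {pct:.3f}")
```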
What is the empirical rule (68-95-99.7 rule)?
The empirical rule describes how data is distributed in a normal distribution:
- ~68% of data falls within 1 standard deviation of the mean
- ~95% within 2 standard deviations
- ~99.7% within 3 standard deviations
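The three percentages can be recovered from the normal CDF (stdlib sketch):

```python
from statistics import NormalDist

# The 68-95-99.7 rule, computed from the normal CDF.
nd = NormalDist()
for k in (1, 2, 3):
    coverage = nd.cdf(k) - nd.cdf(-k)  # P(mean - k*sigma < X < mean + k*sigma)
    print(f"within {k} SD: {coverage:.4f}")
```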
How are confidence intervals correctly interpreted?
“We are [e.g., 95%] confident that the true population parameter lies within this range.”
The interval reflects the uncertainty in estimating the population parameter.
What does “confidence” refer to in a confidence interval?
Confidence refers to the long-term success rate of the method used to create the interval.
For example, with a 95% confidence interval, we expect that 95% of intervals constructed from repeated sampling would capture the true population parameter.
What is the distinction between confidence and probability?
- Confidence refers to the long-term reliability of the interval estimation method, and is the term used after the interval is calculated.
- Probability refers to the likelihood of an event before it occurs.
- Once the interval is calculated, we do not use probability to describe it — the parameter is either in the interval or not.
How is a confidence interval interpreted with respect to probability?
Confidence intervals are not about the probability of the parameter lying in the interval. Instead, they express the confidence level of the method.
For example, with a 95% confidence interval, we are 95% confident that the true parameter is within the interval. However, after the interval is calculated, there is no probability about the true value’s position—it’s either in the interval or not.
What is a t-distribution?
A t-distribution is a probability distribution used when the sample size is small and/or the population standard deviation is unknown. It’s similar to the normal distribution but has heavier tails.
When is a t distribution used?
Used in t-tests and confidence intervals whenever the population variance is unknown — especially with small samples (n < 30), though it is valid at any sample size.
What is the relationship between a t-distribution and the normal distribution?
- The t-distribution becomes closer to the normal distribution as sample size increases.
- With small sample sizes, the t-distribution has wider tails (more probability in the extremes), reflecting higher uncertainty.
- As the sample size grows, the t-distribution approaches the normal distribution.
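The heavier tails can be seen numerically (sketch, assuming `scipy`): the probability of landing beyond ±2 is larger under the t-distribution than under the normal, and shrinks toward the normal value as df grows.

```python
from scipy import stats

# Heavier tails: P(|T| > 2) under the t-distribution vs. the normal.
for df in (3, 10, 30):
    p_t = 2 * stats.t.sf(2, df)
    print(f"df = {df:>2}: P(|T| > 2) = {p_t:.3f}")
print(f"normal: P(|Z| > 2) = {2 * stats.norm.sf(2):.3f}")
```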
What are degrees of freedom (df)?
Degrees of freedom (df) refer to the number of independent values that can vary in a statistical calculation.
How do degrees of freedom affect the t distribution?
- For a one-sample t-test, df = n−1, where n is the sample size.
- Lower df results in wider tails in the t-distribution (increased uncertainty), and higher df leads to a distribution closer to the normal curve.
What is statistical significance?
Statistical significance means that the observed effect in your data is unlikely to have occurred by random chance. It’s usually determined by a p-value that is compared to a significance level (α).
What is a p-value?
A p-value is the probability of obtaining results at least as extreme as the observed data, assuming the null hypothesis is true.
How is a p-value used to determine statistical significance?
- If the p-value is less than the chosen significance level (e.g., 0.05), we reject the null hypothesis and claim statistical significance.
- A high p-value means the evidence is weak against the null hypothesis.
What is the relationship between statistical significance and the null hypothesis?
- When a result is statistically significant, it means the data provides strong enough evidence to reject the null hypothesis.
- The null hypothesis usually represents a statement of no effect or no difference.
- If the p-value is lower than the significance level (e.g., 0.05), we reject the null hypothesis in favor of the alternative hypothesis.
What is the null hypothesis?
The null hypothesis (H₀) is a statement of no effect, no difference, or no relationship between variables. It serves as the starting point for statistical testing.
How is the null hypothesis tested?
The null hypothesis is tested using statistical tests (like t-tests or ANOVAs) to determine if there is enough evidence to reject it.
- If the p-value is smaller than the significance level (e.g., 0.05), we reject the null hypothesis and conclude that there is significant evidence against it.
- If the p-value is larger than the significance level, we fail to reject the null hypothesis (it’s not proven false).
What does it mean to “fail to reject” the null hypothesis?
“Failing to reject” the null hypothesis means that the sample data did not provide strong enough evidence to support the alternative hypothesis.
What is a p-value?
A p-value is the probability of obtaining results at least as extreme as the ones observed, assuming the null hypothesis is true.
- A smaller p-value indicates stronger evidence against the null hypothesis.
- Common threshold: If p-value < 0.05, we typically reject the null hypothesis.
How is a p-value used to make decisions in hypothesis testing?
The p-value helps determine whether the observed data is statistically significant: if the p-value is less than α, reject the null hypothesis (significant); if it is greater than or equal to α, fail to reject the null hypothesis (not significant).
What does a p-value of 0.03 mean in the context of hypothesis testing?
A p-value of 0.03 means that if the null hypothesis were true, there is a 3% chance of observing results as extreme as the ones seen in your sample.
What is the significance level (α) in hypothesis testing?
The significance level (α) is the threshold used to decide whether the p-value is small enough to reject the null hypothesis.
How is the significance level (α) related to Type I errors?
The significance level (α) represents the probability of making a Type I error — rejecting a true null hypothesis (false positive).
What does a significance level of α = 0.01 mean?
A significance level of α = 0.01 means there’s a 1% chance of rejecting the null hypothesis when it’s actually true (Type I error).
What is a Type I error?
A Type I error occurs when we reject a true null hypothesis (a false positive).
What is a Type II error?
A Type II error occurs when we fail to reject a false null hypothesis (a false negative).
What is the relationship between Type I and Type II errors?
There is an inverse relationship between Type I and Type II errors:
- Reducing the risk of a Type I error (α) increases the risk of a Type II error (β), and vice versa.
- Balancing the two errors often depends on the study’s priorities (e.g., detecting a rare disease vs. avoiding false positives).
What is the alternative hypothesis?
The alternative hypothesis (H₁ or Ha) is a statement that contradicts the null hypothesis and suggests that there is an effect or a difference.
What is the relationship between the null and alternative hypothesis?
The null hypothesis (H₀) and the alternative hypothesis (H₁ or Ha) are mutually exclusive:
- If H₀ is rejected, H₁ is supported.
- If H₀ is not rejected, we cannot conclude that H₁ is true (we simply do not have enough evidence).
What is the correct interpretation of statistical significance?
Statistical significance means that the observed effect or relationship in your data is unlikely to have occurred by random chance.
- Correct interpretation: “The data provides enough evidence to reject the null hypothesis at the chosen significance level (e.g., α = 0.05).”
- It does not mean: That the effect is practically important or that the null hypothesis is proven false.
What is a common fallacy when interpreting statistical significance?
“A statistically significant result means the null hypothesis is false.”
- A statistically significant result means there’s evidence against the null hypothesis, but it doesn’t prove the null hypothesis is false.
What is a correlation, and what are its key characteristics?
Correlation measures the strength and direction of a linear relationship between two variables.
- Range: -1 to +1.
- +1 = perfect positive correlation (both variables increase together).
- -1 = perfect negative correlation (one variable increases as the other decreases).
- 0 = no correlation (no linear relationship).
- Strength: The closer the absolute value of the correlation is to 1, the stronger the relationship.
How do you interpret a positive correlation?
Positive correlation (r > 0): As one variable increases, the other also increases.
How do you interpret a negative correlation?
Negative correlation (r < 0): As one variable increases, the other decreases.
What are the limitations of correlation, and what should be considered in interpretation?
- Correlation does not imply causation: A correlation between two variables does not mean that one variable causes the other.
- Confounding variables: A third variable might be influencing both correlated variables.
- Linear relationship: Correlation only captures linear relationships and may miss non-linear connections.
What is “inference by eye” and how is it used with independent groups?
Inference by eye refers to visually inspecting error bars (like 95% confidence intervals) on graphs to make judgments about statistical significance.
- If the 95% confidence intervals do not overlap, there’s likely a statistically significant difference.
- If they do overlap, the difference is not necessarily non-significant: moderately overlapping 95% confidence intervals can still correspond to a significant result, depending on the extent of the overlap.
How can “inference by eye” be used when comparing means?
When comparing sample means visually:
- Look at the distance between means relative to the length of the error bars.
- If the means are far apart and their error bars don’t overlap, this suggests a statistically significant difference.
- Rule of thumb: If the gap between means is at least as wide as the average margin of error, the difference may be significant.
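The rule of thumb can be sketched as a quick check (the means and margins of error below are hypothetical):

```python
# Inference-by-eye rule of thumb: compare the gap between two means
# to the average margin of error of their 95% CIs.
mean_a, moe_a = 63.3, 1.9  # hypothetical group A
mean_b, moe_b = 59.8, 2.1  # hypothetical group B
gap = abs(mean_a - mean_b)
avg_moe = (moe_a + moe_b) / 2
print(f"gap = {gap:.1f}, average MOE = {avg_moe:.1f}, "
      f"possibly significant: {gap >= avg_moe}")
```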
How does “inference by eye” apply to correlations?
- In scatterplots, stronger linear patterns (tight, straight lines of points) suggest stronger correlations.
- A loose, cloud-like pattern indicates weak or no correlation.
- Error bars or confidence bands around a regression line can help judge if a correlation is statistically meaningful: narrow bands and clear slope = stronger inference.