Statistics Theory L9 = Inferences Using t-Distributions Flashcards
If we have a large sample & random sampling, what can we conclude about the sampling distribution? (3)
- Sampling distribution is centered on µ; where µ = mean of the population = mean of the sampling distribution.
- Spread of the sampling distribution = SD(Ŷ) = σ/√n.
- Shape of the sampling distribution is closer to normal than the population distribution.
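The three conclusions above can be checked with a small simulation. This is only a sketch: the Exponential(rate = 1) population (right-skewed, with µ = 1 and σ = 1), the sample size of 50 and the 5000 repeats are assumptions chosen for illustration, not values from the course.

```r
set.seed(1)  # fixed seed so the sketch is reproducible

mu <- 1        # mean of the assumed Exponential(rate = 1) population
sigma <- 1     # SD of that population
n <- 50        # size of each random sample
reps <- 5000   # number of repeated samples

# Average of each of `reps` random samples of size n
sample_means <- replicate(reps, mean(rexp(n, rate = 1)))

centre <- mean(sample_means)       # should be close to mu
spread <- sd(sample_means)         # should be close to sigma / sqrt(n)
theory_spread <- sigma / sqrt(n)

cat("centre:", centre, " spread:", spread, " sigma/sqrt(n):", theory_spread, "\n")
```

Calling hist(sample_means) afterwards should show a roughly bell-shaped histogram, even though the population itself is strongly skewed.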
Standard error (SE) of any statistic?
= the estimate of the SD of its sampling distribution.
Degrees of freedom (df) of the SE?
= the equivalent number of independent observations.
Df equation?
df = n - 1
SE for sample average equation?
SE(Ŷ) = s/√n
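As a quick illustration of SE(Ŷ) = s/√n and df = n - 1, here is a minimal sketch in R. The six measurements are hypothetical and only there to make the arithmetic concrete.

```r
y <- c(4.1, 5.3, 3.8, 6.0, 4.7, 5.5)  # hypothetical measurements

n <- length(y)
y_bar <- mean(y)    # sample average
s <- sd(y)          # sample SD (sd() divides by n - 1)
se <- s / sqrt(n)   # SE of the sample average
df <- n - 1         # degrees of freedom attached to the SE

cat("average:", y_bar, " s:", s, " SE:", se, " df:", df, "\n")
```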
What two ratios do we use based on the sample average?
- z-ratio (ƶ).
- t-ratio (t).
Equation for z-ratio (ƶ)?
ƶ = (estimate-hypothesised value) / (SD (Estimate))
*SD (Estimate) = σ/√n
Conditions to use the ƶ ratio? (2)
- We need to know σ, so that SD (Estimate) = σ/√n can be calculated.
- We need to have a large sample (n ≥ 30).
NB about ƶ ratio? (2)
- If the sampling distribution of the estimate is normal, the sampling distribution of ƶ is normal (µ = 0, σ² = 1), & this is a “standard normal”.
- Percentiles of N(0, 1) help us to judge the certainty about parameter estimates & differences.
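The z-ratio and its comparison against N(0, 1) could be sketched in R as follows. All four input numbers are hypothetical, and σ is assumed to be known, as the conditions for the z-ratio require.

```r
estimate <- 52       # hypothetical sample average
hypothesised <- 50   # hypothetical value under the null hypothesis
sigma <- 6           # population SD, assumed known
n <- 36              # hypothetical sample size

sd_estimate <- sigma / sqrt(n)                  # SD (Estimate)
z <- (estimate - hypothesised) / sd_estimate    # z-ratio

# Area in both tails of the standard normal beyond |z|
p_two_sided <- 2 * (1 - pnorm(abs(z)))

cat("z:", z, " two-sided p:", p_two_sided, "\n")
```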
Equation for t-ratio (t)?
t = (estimate-hypothesised value) / (SE (Estimate))
*SE (Estimate) = s/√n, where s = sample SD.
When can we use the t ratio?
When σ is unknown & we must estimate it with s, which matters most when we have a small sample (n < 30).
t ratio attributes? (2)
- Is wider than the standard normal (ƶ), because of the extra variability associated with estimating σ by s.
- Having more degrees of freedom means a better estimate of σ & reduced variability, so the t-distribution moves closer to the standard normal.
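Both attributes can be seen by comparing tail areas. The sketch below uses an arbitrary cut-off of 2 and an arbitrary set of df values (both assumptions for illustration) and compares P(T > 2) with P(Z > 2).

```r
cutoff <- 2
dfs <- c(2, 5, 14, 30, 176)

t_tail <- 1 - pt(cutoff, dfs)      # P(T > 2) for each df (pt recycles over df)
normal_tail <- 1 - pnorm(cutoff)   # P(Z > 2) for the standard normal

print(data.frame(df = dfs, t_tail = round(t_tail, 4)))
cat("normal tail:", round(normal_tail, 4), "\n")
```

Every t tail area should be larger than the normal one (the t is wider), and the gap should shrink as df grows.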
NB about t ratio?
If Ŷ is an average from a random sample of size n from a normally-distributed population, the sampling distribution of the t-ratio is the Student’s t-distribution with n-1 degrees of freedom.
When is a one-sided/one-tailed test used? (2)
- When you expect change in one direction only (e.g. an increase).
- When a difference in the opposite direction would be of no interest.
Why is a one-sided/one-tailed test used? (2)
- It reduces the critical value, making it easier to reject the null hypothesis in the direction of interest.
- It increases your ability to detect an effect if it exists.
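A small sketch of why the one-sided test is easier to reject with. The 5% level, df = 14 and the observed t-ratio of 2.0 are hypothetical choices for illustration.

```r
alpha <- 0.05
df <- 14   # hypothetical degrees of freedom

crit_one_sided <- qt(1 - alpha, df)       # all of alpha in one tail
crit_two_sided <- qt(1 - alpha / 2, df)   # alpha split across both tails

t_obs <- 2.0   # hypothetical observed t-ratio, in the expected direction
p_one_sided <- 1 - pt(t_obs, df)
p_two_sided <- 2 * p_one_sided

cat("critical values:", crit_one_sided, "(one-sided),", crit_two_sided, "(two-sided)\n")
cat("p-values:", p_one_sided, "(one-sided),", p_two_sided, "(two-sided)\n")
```

With these numbers the same t-ratio falls below 0.05 for the one-sided test but not for the two-sided test.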
When is a two-sided/two-tailed test used? (3)
- When you just want to see if there is any significant difference, positive or negative.
- When you’re testing for inequality.
- When both extreme deviations are important in your analysis.
Why is a two-sided/two-tailed test used? (2)
- It ensures unbiased testing, as it considers both extremes.
- It avoids missing an important result in the opposite direction.
Scenarios where we use t-ratios to make an inference? (2)
- t-ratios for 1-sample inference & paired t-tests.
- t-ratios for 2-sample inference.
t-ratios for 1-sample inference & paired t-tests attributes? (2)
- Compares one sample average/an average difference in paired data to a hypothesized value.
- Response variable (y) is continuous & normally distributed.
t-ratios for 2-sample inference attributes? (3)
- Compares means from 2 independent samples.
- Response variable (y) is continuous & normally distributed.
- Predictor variable (x) is binary (categorical), group or population.
Eg for t-ratios for 1-sample inference & paired t-tests?
Schizophrenia example.
Schizophrenia example
Scientists identified a sample of identical twins where one twin had been diagnosed with schizophrenia & the other had not. The scientists used an imaging device to measure the volume (cm^3) of the left hippocampus.
Given an output, focus on:
- n (the 1st tibble).
- Ŷ (AvDiff), the sample average difference that estimates μ.
- s (SDDiff).
Is there evidence of an effect on volume of the hippocampus? What is the scope of inference? (8)
(i) Hypotheses
Ho: μ = 0, i.e. there is no effect & the observed difference arose purely by chance (which could happen).
Ha: μ ≠ 0 (2-tailed).
(ii) From data/output:
t = (Ŷ - 0) / (s/√n) = (0.199 - 0) / (0.238/√15) = 3.236.
(iii) df = n-1 = 15-1 = 14.
(iv) After getting the t-statistic, illustrate it on a graph of the t-distribution to judge whether it could have arisen by chance.
(v) Calculate p-value in R using:
1 - pt(3.236, 14) = 0.002988. Since it’s 2-tailed, p × 2 = 0.006.
- In the test, Prof. Jason will give us the p-value.
(vi) We are still uncertain about this estimate (need to do CI afterwards).
(vii) Therefore, there is convincing evidence that the average left hippocampus volume differs between twins with & without schizophrenia (t = 3.236; df = 14; p = 0.006).
(viii) We cannot infer cause and effect as this was an observational study, and we cannot infer to the population beyond the sample as there was no random sampling. Therefore, the results only apply to the twins in this study.
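Steps (ii) to (v) of the schizophrenia example can be reproduced from the summary statistics alone. This is a sketch using the rounded values quoted above (n = 15, AvDiff = 0.199, SDDiff = 0.238), so the t-ratio comes out marginally different from 3.236, presumably because the original output used unrounded values. The variable names are made up for illustration.

```r
n <- 15            # number of twin pairs
av_diff <- 0.199   # average difference in volume (AvDiff)
sd_diff <- 0.238   # SD of the differences (SDDiff)

se <- sd_diff / sqrt(n)           # SE of the average difference
t_ratio <- (av_diff - 0) / se     # hypothesised value is 0
df <- n - 1

p_one_sided <- 1 - pt(abs(t_ratio), df)
p_two_sided <- 2 * p_one_sided    # 2-tailed, as Ha is "not equal to 0"

cat("t:", t_ratio, " df:", df, " two-sided p:", p_two_sided, "\n")
```

With the raw paired differences in a vector, t.test() on that vector with mu = 0 would be the usual one-call route to the same test.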
Eg of t-ratios for 2-sample inference?
Finch example.
Finch example
These data are measurements of the beak depth of finches, the year before a drought (1976) and the year after a drought (1978), on the island of Daphne Major in the Galapagos.
Given output, focus on:
- n1; n2 (Count).
- Ŷ1; Ŷ2 (AvDepth), the sample averages that estimate μ1; μ2.
- s1; s2 (SDDepth).
- sp (Pooled SD).
- SE (Ŷ2-Ŷ1) [SE of the average difference].
Is there a difference in the beak depth of finches between the years? Scope of inference? (10)
(i) Assume σ1 = σ2 = σ (common SD between the two groups).
(ii) Calculate pooled SD (sp):
sp = √{[(n1 - 1)s1^2 + (n2 - 1)s2^2] / (n1 + n2 - 2)}
= √{[(89 - 1)(1.04)^2 + (89 - 1)(0.906)^2] / (89 + 89 - 2)}
= 0.973.
(iii) df = n1 + n2 - 2 = 89+89-2 = 176.
(iv) SE (Ŷ2-Ŷ1) = sp √[(1/n1) + (1/n2)]
= 0.973 √[(1/89) + (1/89)]
= 0.1459.
(v) t = [(Ŷ2-Ŷ1) - (hypothesised value)] / SE (Ŷ2-Ŷ1)
= [(10.14 - 9.47) - 0] / 0.1459
= 4.58 (using the unrounded difference, 0.6685).
(vi) Illustrate the t-statistic on a graph (draw the graph) & show where the p-value lies.
(vii) From R: 1 - pt(4.58, 176) is very small; doubled for the 2-tailed test, p < 0.001.
(viii) Calculate CI’s (95% CI: 0.3807, 0.9564).
(ix) There is strong evidence of a difference between the beak depths of finches before and after the drought (t = 4.58; df = 176; p < 0.001). The estimated difference in beak depth between years was 0.6685 (95% CI: 0.3807, 0.9564).
(x) We cannot infer cause & effect as there was no random assignment; however, we can infer to the larger population because they sampled from the whole population.
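The finch calculations (pooled SD, SE, t-ratio, p-value and 95% CI) can be sketched in R from the summary statistics. It uses the rounded values quoted above, so sp, the SE and the CI limits differ slightly in the third decimal from the figures in the example, which presumably came from unrounded output. Group 1 is taken to be 1976 and group 2 to be 1978, and the variable names are made up for illustration.

```r
n1 <- 89;   n2 <- 89       # Count
y1 <- 9.47; y2 <- 10.14    # AvDepth (1976, 1978)
s1 <- 1.04; s2 <- 0.906    # SDDepth

df <- n1 + n2 - 2
sp <- sqrt(((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / df)   # pooled SD
se_diff <- sp * sqrt(1 / n1 + 1 / n2)                  # SE of the difference

est_diff <- y2 - y1
t_ratio <- (est_diff - 0) / se_diff                    # hypothesised value is 0
p_two_sided <- 2 * (1 - pt(abs(t_ratio), df))

# 95% CI: estimate +/- (97.5th percentile of t with df) x SE
ci <- est_diff + c(-1, 1) * qt(0.975, df) * se_diff

cat("sp:", sp, " SE:", se_diff, " t:", t_ratio, " df:", df, "\n")
cat("two-sided p:", p_two_sided, " 95% CI:", ci, "\n")
```

With the raw beak depths and a year variable, t.test() with var.equal = TRUE would carry out the same pooled two-sample test in one call.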