Week 5 Flashcards
Point estimate
- Sample mean is a point estimate
- Represents a very precise statement
- Not sure how accurate it is – due to sampling error
Confidence intervals
- because sample means vary in a predictable way, we can estimate the likelihood of the population mean being within a certain range
- to work this out, we need to have an idea of
- > the centre of the distribution
- > the population mean
- > the spread of the distribution (which for confidence intervals is the standard error)
Standard error
the standard deviation divided by the square root of the number of observations.
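A minimal sketch of the standard error calculation. It assumes the sample standard deviation (n - 1 denominator) is the one intended, and the scores are hypothetical values invented for illustration.

```python
import math
import statistics

def standard_error(scores):
    """Standard error of the mean: standard deviation divided by sqrt(n)."""
    n = len(scores)
    # Assumption: use the sample SD (n - 1 denominator).
    return statistics.stdev(scores) / math.sqrt(n)

# Hypothetical scores, not taken from any real data set.
scores = [4, 8, 6, 5, 7]
print(standard_error(scores))
```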
Representative samples
- In a normally distributed population, about 95% of scores are within 2 SD of the mean
- In a sampling distribution of the mean, about 95% of sample means are within 2 standard errors of the population mean
- 68 / 95 / 99.7: the approximate percentages within 1, 2 and 3 SD (or standard errors)
- the sample must be representative
- the sample must have at least 30 observations
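The 68 / 95 / 99.7 figures can be checked against the standard normal distribution. This is a rough sketch using Python's built-in NormalDist; the function name share_within is made up for this example.

```python
from statistics import NormalDist

def share_within(k):
    """Proportion of a normal distribution lying within k SD of the mean."""
    std_normal = NormalDist()  # mean 0, SD 1
    return std_normal.cdf(k) - std_normal.cdf(-k)

# Print the share within 1, 2 and 3 SD of the mean.
for k in (1, 2, 3):
    print(k, round(share_within(k), 4))
```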
Single sample logic
- what is the range of likely values of the population mean
- 95% of samples are within 2 standard errors of the population mean
- therefore 95% chance that my sample mean is within 2 standard errors of the actual population mean
- This means, there is a 95% probability that the population mean is between two points
- > x̄ - 2σ(x̄) and x̄ + 2σ(x̄)
- Of all the potential sample means we could get from a population, 95% will be in this range, and we call this the 95% confidence interval.
95% CI
- The 95% Confidence Interval (CI95) is a range of scores, centred on a sample mean, within which the population mean occurs 95 times out of 100.
- On average the 95% CI does not include the population mean 5 times out of 100
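One way to see the "95 times out of 100" idea is a small simulation. The sketch below assumes a normal population with a known SD; the population values (mean 100, SD 15), the sample size and the seed are arbitrary choices for illustration.

```python
import math
import random
import statistics

def ci95_coverage(pop_mean=100.0, pop_sd=15.0, n=30, trials=2000, seed=1):
    """Fraction of simulated samples whose x-bar +/- 2 SE interval contains pop_mean."""
    rng = random.Random(seed)
    se = pop_sd / math.sqrt(n)  # standard error with a known population SD
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(pop_mean, pop_sd) for _ in range(n)]
        x_bar = statistics.fmean(sample)
        if x_bar - 2 * se <= pop_mean <= x_bar + 2 * se:
            hits += 1
    return hits / trials

# Should come out close to 0.95.
print(ci95_coverage())
```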
CI formulas
95% Confidence Interval = CI95 = x̄ ± 2σ(x̄)
• By the same logic:
• 68% Confidence Interval = CI68 = x̄ ± 1σ(x̄)
• 99.7% Confidence Interval = CI99.7 = x̄ ± 3σ(x̄)
• CI(p) = x̄ ± zσ(x̄)
• Where:
• p = probability you will include the population mean
• z = “z critical” = z score that borders the middle p % of scores in the standard normal distribution
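A short sketch of CI(p) = x̄ ± zσ(x̄), where z critical is looked up from the standard normal distribution. The function names and the example numbers (mean 100, SD 15, n 36) are assumptions for illustration. Note that the exact z for 95% is about 1.96, which the formulas above round to 2.

```python
import math
from statistics import NormalDist

def z_critical(p):
    """z score that borders the middle p of the standard normal distribution."""
    return NormalDist().inv_cdf((1 + p) / 2)

def confidence_interval(x_bar, sd, n, p=0.95):
    """Return (lower, upper) for CI(p) = x_bar +/- z * standard error."""
    se = sd / math.sqrt(n)
    z = z_critical(p)
    return x_bar - z * se, x_bar + z * se

print(round(z_critical(0.95), 2))
print(confidence_interval(100, 15, 36))
```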
Margin of error
z score multiplied by standard error
Upper bound
- mean plus the margin of error
- the highest value of the range
Lower bound
- mean minus the margin of error
- the lowest value of the range
Reporting CIs
- mean +/- the margin of error
- lower boundary and upper boundary
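Margin of error, bounds and both reporting styles can be sketched together. The helper name ci_parts and the example values (mean 50, SD 10, n 25) are hypothetical.

```python
import math

def ci_parts(x_bar, sd, n, z=1.96):
    """Return (margin of error, lower bound, upper bound)."""
    se = sd / math.sqrt(n)   # standard error
    margin = z * se          # margin of error = z score * standard error
    return margin, x_bar - margin, x_bar + margin

margin, lower, upper = ci_parts(50, 10, 25)
# Reporting style 1: mean +/- the margin of error
print(f"50.00 +/- {margin:.2f}")
# Reporting style 2: lower boundary and upper boundary
print(f"[{lower:.2f}, {upper:.2f}]")
```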
CI influence on z scores
- As the confidence level (C) increases, precision of the estimate decreases (interval gets wider)
- As C increases, the value of z* increases
- > the probability of capturing the population mean increases as the range increases
CI influence on standard deviation
- As variation in the population goes down, precision increases (interval gets narrower)
- the more similar people are, the better we can predict the range of the mean
CI influence on n
- As sample size increases, precision increases (interval gets narrower)
- more people allows more accurate intervals
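The three influences on CI width (confidence level, standard deviation, n) can be compared with one small function. This is only a sketch using the z-based formula; ci_width is a made-up name and the inputs are arbitrary.

```python
import math
from statistics import NormalDist

def ci_width(sd, n, confidence=0.95):
    """Full width of a z-based confidence interval: 2 * z * sd / sqrt(n)."""
    z = NormalDist().inv_cdf((1 + confidence) / 2)
    return 2 * z * sd / math.sqrt(n)

print(ci_width(15, 30))                   # baseline
print(ci_width(15, 30, confidence=0.99))  # higher confidence: wider
print(ci_width(10, 30))                   # less variation: narrower
print(ci_width(15, 120))                  # larger n: narrower
```

Quadrupling n only halves the width, because n sits under a square root.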
T scores
- If you have an infinite number of scores, the t-distribution is the z-distribution
- as you decrease sample size, you decrease your degrees of freedom and the distribution gets flatter and wider
- to calculate t, you need to know what your degrees of freedom are
- how extreme a t-value of 2.5 is in the t-distribution depends on the degrees of freedom. A value of 2.5 is quite extreme with an n of 60: there is only a small area under the curve to the right of the value. But with an n of 5 we can be less confident of capturing the mean, so 2.5 is a less extreme value, as there is a larger area left over under the curve.
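The area to the right of t = 2.5 can be estimated for different degrees of freedom. The sketch below assumes df = n - 1 (so n of 5 gives df 4, n of 60 gives df 59) and integrates the t density numerically with Simpson's rule rather than relying on a statistics library; the function names are invented for this example.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    coef = math.exp(log_coef) / math.sqrt(df * math.pi)
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t_value, df, steps=2000):
    """Area to the right of t_value (t_value >= 0); steps must be even."""
    h = t_value / steps
    total = t_pdf(0, df) + t_pdf(t_value, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_0_to_t = total * h / 3   # Simpson's rule on [0, t_value]
    return 0.5 - area_0_to_t      # the curve is symmetric about 0

print(t_upper_tail(2.5, df=4))    # n of 5: larger area left over
print(t_upper_tail(2.5, df=59))   # n of 60: smaller area left over
```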
When SD is unknown for CI calc
CI95 = x̄ ± t*(s/√n)
- where s is the sample standard deviation and t* is the critical t score for our confidence level and degrees of freedom
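A sketch of the t-based interval. The scores are hypothetical, and the critical value 2.262 is the standard t-table entry for df = 9 at 95% (two-tailed); with a different n you would need to look up a different value.

```python
import math
import statistics

def t_confidence_interval(scores, t_crit):
    """Return (lower, upper) for x_bar +/- t* * (s / sqrt(n))."""
    n = len(scores)
    x_bar = statistics.fmean(scores)
    s = statistics.stdev(scores)          # sample standard deviation
    margin = t_crit * s / math.sqrt(n)
    return x_bar - margin, x_bar + margin

# Hypothetical sample of n = 10, so df = 9.
scores = [12, 15, 11, 14, 13, 16, 12, 15, 14, 13]
T_CRIT_DF9_95 = 2.262  # from a t table: df = 9, 95% two-tailed
print(t_confidence_interval(scores, T_CRIT_DF9_95))
```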
Value of CI
- Sample mean is a precise point estimate of uncertain accuracy
- Confidence intervals are a less precise interval estimate of definable accuracy
- Width of a CI indicates variability & generalisability
- Confidence intervals allow comparisons across studies
- If confidence intervals substantially overlap, this supports the claim that both studies sampled the same population
- If CIs don’t overlap, it is unlikely they have sampled the same population.
Distinct populations
• Distinguishing populations
-> Defined by some characteristic
-> Can measure behaviour and show they are different
• Any group that behaves differently along any dimension could be called a distinct population
• Example:
- Extroverts vs Introverts
- Males vs Females
Hypotheses
• The null hypothesis (H0)
-> that there is no relationship between the two variables that we are investigating
• The alternate hypothesis (HA)
-> that there is a relationship between the two variables
Hypothesis testing
- Step 1: State Hypotheses (H0 & HA)
- Step 2: Calculate an appropriate test statistic
- Step 3: Determine the probability of H0
- Step 4: Evaluate the probability of H0 & state your conclusion
Step 1, state hypothesis
• State both H0 & HA
• HA can be one-tailed or two-tailed
-> One tailed predicts direction of effect
e.g., The climate is getting warmer
-> Two tailed just predicts an effect, doesn’t predict the direction of the effect
e.g., The climate is changing
Step 2, calculate the test statistic
- to calculate the test statistic, we need to work in z-scores
- Z-scores indicate how far a score is from the mean in terms of standard deviation units
- The z-test assesses how far any individual sample mean is from the population mean in terms of standard errors.
- So you can get an obtained z-score by dividing the difference between the sample mean and the population mean, by the standard error of the mean.
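The obtained z-score described above can be sketched as follows, assuming the population SD is known. The function name and example numbers are hypothetical.

```python
import math

def z_obtained(sample_mean, pop_mean, pop_sd, n):
    """(sample mean - population mean) / standard error of the mean."""
    se = pop_sd / math.sqrt(n)
    return (sample_mean - pop_mean) / se

# Hypothetical example: sample mean 105, population mean 100, SD 15, n = 36.
print(z_obtained(105, 100, 15, 36))
```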
Step 3, determine probability of null hypothesis (two tailed)
- if we set our alpha value at 0.05 (95% confidence) in a two-tailed test, our z-critical value is plus or minus 1.96 (the 5% area is divided between the two tails rather than all at one end)
- Any obtained z-score above +1.96 or below -1.96 counts as significant
Step 3, determine probability of null hypothesis (one tailed)
- If we have a one tailed test, our z-critical value is 1.64
- but we have to predict the direction, so the obtained z-score must be either greater than 1.64 or less than minus 1.64 to count as significant
- If the obtained z-score is more extreme than the z-critical score (in the positive or negative direction), the result is reported as p (the probability) is less than 0.05
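The one-tailed and two-tailed decisions can be sketched in one function. The name is_significant and its parameters are assumptions for illustration; the critical values come from the standard normal distribution (about 1.96 two-tailed and about 1.64 one-tailed at alpha 0.05).

```python
from statistics import NormalDist

def is_significant(z_obt, alpha=0.05, tails=2, direction="greater"):
    """Compare an obtained z-score with the z-critical value."""
    std_normal = NormalDist()
    if tails == 2:
        # alpha is split between the two tails
        z_crit = std_normal.inv_cdf(1 - alpha / 2)
        return abs(z_obt) > z_crit
    # one-tailed: all of alpha sits in the predicted tail
    z_crit = std_normal.inv_cdf(1 - alpha)
    if direction == "greater":
        return z_obt > z_crit
    return z_obt < -z_crit

print(is_significant(1.8))            # two-tailed
print(is_significant(1.8, tails=1))   # one-tailed, predicted direction
```

The same obtained z-score can miss the two-tailed cutoff but pass the one-tailed cutoff, which is why the direction has to be predicted in advance.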