SMCR Flashcards
- Sampling dist. –> central element in estimation and null hypothesis testing. It’s the dist. of the outcome scores of many samples –> used to make inferences about the population.
- Data collected in a random sample.
- Sample statistic –> characteristic we’re interested in.
- Sampling space –> range of values sample statistic can take.
- Units of analysis –> samples.
- Probabilities can be expressed either as proportions (0–1) or as percentages (0%–100%).
- The mean of the sampling dist. is equal to the expected value of the sample statistic. The mean of the sampling dist. also equals the pop. proportion –> hence, the expected value equals the pop. proportion ONLY IF the sample statistic is an UNBIASED ESTIMATOR.
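A minimal Python sketch of this idea, with made-up numbers: the sample proportion is an unbiased estimator, so the mean of a simulated sampling dist. should land close to the pop. proportion.

```python
import numpy as np

rng = np.random.default_rng(42)

pop_prop = 0.4        # assumed population proportion (made up)
n = 50                # sample size
n_samples = 10_000    # number of random samples drawn

# Each row is one random sample of n yes/no scores;
# the sample proportion is the sample statistic.
samples = rng.binomial(1, pop_prop, size=(n_samples, n))
sample_props = samples.mean(axis=1)

# Mean of the sampling dist. = expected value of the statistic;
# for an unbiased estimator it is (close to) the pop. proportion.
print(sample_props.mean())   # ≈ 0.4
```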
Representative Sample:
- Sample is representative of the pop. if variables in the sample are distributed the same way as in the pop.
- However, a random sample is likely to differ from the pop. due to CHANCE.
- Nevertheless, we expect it to be representative, so we say it is IN PRINCIPLE REPRESENTATIVE of the pop.
Continuous sample statistic - continuous probabilities:
- Instead of looking at single values, we look at ranges of values (the probability of any single value of a continuous statistic is zero).
- the curve is called a probability density function.
- Right hand prob. –> concerns the right hand tail of the sampling dist.
- Left hand prob. –> concerns the left hand tail of the sampling dist.
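A small sketch of these tail probabilities under an assumed normal sampling dist. (the mean, SE, and cut-off values here are invented for illustration):

```python
from scipy.stats import norm

# Assume a normal sampling dist. with mean 3.0 and standard error 0.5.
mu, se = 3.0, 0.5

right = norm.sf(3.8, loc=mu, scale=se)   # right hand prob.: P(statistic >= 3.8)
left = norm.cdf(2.2, loc=mu, scale=se)   # left hand prob.:  P(statistic <= 2.2)
print(right, left)
```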
Means at 3 levels:
- the pop.
- the sample.
- the sampling dist.
How to create a sampling dist.?
- exact approaches.
- bootstrapping.
- theoretical approximations.
Bootstrapping:
- You draw an original sample from the pop. (a large sample), and from that original sample you draw new samples (bootstrap samples) with replacement (usually) –> see the sketch after this list.
- Typically around 5,000 bootstrap samples are drawn.
- If the sample statistic in the initial sample equals the pop. value, the bootstrapped sampling dist. will be very similar to the true sampling dist.
- Any sample statistic can be bootstrapped; some sample statistics even must be bootstrapped in order to create a sampling dist. for them. However, SPSS doesn't bootstrap all sample statistics.
Limitations of bootstrapping:
- the smaller the initial sample size is, the greater the chance of having a sample without the sample statistic we are interested in; hence, the bootstrapped sampling dist. will be quite diff. from the true sampling dist.
- Solution: have a large initial sample.
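A minimal bootstrap sketch in Python (the data and the choice of the median as the statistic are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical initial sample of 200 scores.
original = rng.normal(loc=5.5, scale=1.5, size=200)

n_boot = 5000
boot_medians = np.empty(n_boot)
for i in range(n_boot):
    # Resample WITH replacement, same size as the original sample.
    resample = rng.choice(original, size=original.size, replace=True)
    boot_medians[i] = np.median(resample)

# boot_medians now approximates the sampling dist. of the median.
print(boot_medians.mean(), boot_medians.std())
```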
Exact approaches:
- If you know (or think you know) the population proportion, you can calculate exactly the probability of each possible value of the sample statistic.
- only works with categorical or discrete variables.
- Combinations and outcomes tables are used.
- e.g. coin flips.
- Exact approaches use the binomial prob. formula to calculate probabilities (see the sketch after this list).
- exact approaches also available for 2 categorical variables in a contingency table.
- Fisher’s exact test.
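A sketch of the binomial formula applied to the coin-flip example (exact probabilities, no approximation):

```python
from math import comb

def binom_pmf(k, n, p):
    """Exact probability of k successes in n trials (binomial formula)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Probability of exactly 7 heads in 10 fair coin flips:
print(binom_pmf(7, 10, 0.5))                              # ≈ 0.117

# Right hand exact probability: P(7 or more heads)
print(sum(binom_pmf(k, 10, 0.5) for k in range(7, 11)))   # ≈ 0.172
```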
Theoretical Approximations to the sampling distribution:
- Most tests use this approach.
- If the normal curve fits the histogram of observed sample means, then the normal function is a good approximation of the sampling dist. (see the sketch after this list).
- Left and right tails used for significance (2.5% on each side).
- Width/peakedness of the sampling dist. expresses the variation (SD) in it.
- Probability Density Function.
- Bell shape of the ND –> symmetrical –> the sampling dist. of the sample mean should also be symmetrical –> hence, a ND is a reasonable model for the prob. dist. of sample means.
- Sampling is subject to chance - we may draw samples that don’t really cover the sampling dist. well.
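A quick simulation sketch (invented, skewed population scores) showing why the ND is a reasonable model for the dist. of sample means:

```python
import numpy as np

rng = np.random.default_rng(3)

# Skewed population scores: the sampling dist. of the MEAN still
# tends toward a bell shape for a large enough sample size.
population = rng.exponential(scale=2.0, size=100_000)

sample_means = np.array(
    [rng.choice(population, size=50).mean() for _ in range(5_000)]
)

# Under the normal model, about 68% of sample means should fall
# within one SD of the centre of the sampling dist.
centre, sd = sample_means.mean(), sample_means.std()
print(np.mean(np.abs(sample_means - centre) < sd))   # ≈ 0.68
```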
Conditions for the use of a TPD (theoretical probability dist.):
- rules of thumb (table in book).
- Larger sample –> the closer the sample statistic is to the pop. proportion –> more peaked dist.
- Sampling dist. is more skewed/less symmetrical when the pop. prop. is near 0 or 1.
- Large sample is very important.
- Less imp. if true prop. is closer to 0.5.
- Rule of thumb for using the ND as the sampling dist. of a sample prop.: true prop. × sample size –> the product must be larger than 5 (see the sketch after this list).
- REMEMBER: this rule of thumb uses 1 − the prop. IF the prop. is larger than 0.5. So for any prop. larger than 0.5, we subtract it from 1, multiply the resulting smaller prop. by the sample size, and check whether the product is larger than 5.
- Sometimes we just have to assume (an educated guess) that the conditions are met when we decide to use a TPD for the dist. of the scores in the pop.
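The rule of thumb above as a small helper function (a sketch; threshold of 5 as in the notes):

```python
def normal_approx_ok(prop, n, threshold=5):
    """Rule of thumb: n * prop (using 1 - prop when prop > 0.5)
    must be larger than the threshold for the ND to be used as
    the sampling dist. of a sample proportion."""
    p = min(prop, 1 - prop)   # use 1 - prop. when prop. > 0.5
    return n * p > threshold

print(normal_approx_ok(0.5, 30))   # True:  30 * 0.5 = 15 > 5
print(normal_approx_ok(0.9, 30))   # False: 30 * 0.1 = 3, not > 5
```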
Independent Samples –> IS T test
Dependent Samples –> DS T test
- Special sampling dist. for dependent samples:
+ mean diff. as the sample statistic.
- If conditions for using a TPD are not met, use an exact approach or bootstrapping.
- If SPSS doesn't have a test for the sample statistic you're interested in, use bootstrapping.
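A sketch of both t tests on made-up data (scipy's standard functions, not tied to any dataset from the course):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Independent samples (two separate groups) -> IS t test
group_a = rng.normal(5.0, 1.0, size=40)
group_b = rng.normal(5.4, 1.0, size=40)
print(stats.ttest_ind(group_a, group_b))

# Dependent samples (same cases measured twice) -> DS (paired) t test,
# which uses the MEAN DIFFERENCE as the sample statistic
before = rng.normal(5.0, 1.0, size=40)
after = before + rng.normal(0.3, 0.5, size=40)
print(stats.ttest_rel(before, after))
```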
Point estimate: best estimate of the parameter only if the sample statistic is an unbiased est. –> the pop. value would equal the mean of the sampling dist. and the expected value of the sample statistic.
- Not really accurate, due to chance in random samples.
- Hence, it’s better to estimate a range within which the pop. value falls –> interval estimate.
Interval estimate:
- selects sample statistic values closest to the average of the sampling dist.
- The most popular choice is the 95% CI –> and we want to know its boundary values.
- width of the estimated interval represents the precision of our estimate.
- The higher the confidence, the lower the precision, and vice versa.
How to increase precision?
- Decreasing the confidence level doesn't really do you any good, so the best thing to do is to increase the sample size.
- A larger sample provides more information about the pop., is more likely to resemble it, AND also yields a lower standard error.
- Large samples –> more peaked distribution because values are closer to the centre and more concentrated around the pop. value.
- Concentration of sample statistic values is expressed by SD of the sampling dist. –> tells us how precise our estimate is.
- SD of sampling dist. is our standard error.
- When using the point estimate approach, your sample mean is the point estimate, and the distance between the point estimate and the pop. proportion is the SE.
- When using more than one sample mean, the distance between those and the pop. proportion expresses the size of the error we would make if we generalize the sample mean to the pop.
- More variation –> larger SE.
- However, we can’t control variation in the scores.
- Critical values are the boundary values of the interval.
- If we know the interval (e.g. 95%), we can find the corresponding critical values.
- Standardization of the sampling dist.: z = (sample mean − mean of sampling dist.) / SE.
- Now the sampling dist. consists of standardized scores.
- The mean is always 0 (Z dist.).
- The critical values in this dist. are −1.96 and 1.96 (95% confidence).
How to calculate interval estimates from critical values and SE
UB = point estimate + (critical value × SE)
LB = point estimate − (critical value × SE)
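A sketch of these two formulas in code (made-up scores; normal critical value for 95% confidence):

```python
import numpy as np
from scipy.stats import norm

scores = np.array([4.2, 5.1, 6.3, 5.8, 4.9, 5.5, 6.0, 5.2, 4.7, 5.9])

point_estimate = scores.mean()
se = scores.std(ddof=1) / np.sqrt(scores.size)   # estimated SE of the mean

crit = norm.ppf(0.975)             # 1.96 for 95% confidence
lb = point_estimate - crit * se
ub = point_estimate + crit * se
print(f"95% CI: [{lb:.2f}, {ub:.2f}]")
```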
- Binary decision –> either reject or do not reject H0.
- H0 specifies one value for the pop. parameter.
- The sampling dist. is then centered around it.
- Test is sig. if the sample statistic value falls in the tails (rejection region).
Rejection region:
- 2 tails - 2.5% each.
- Reject H0 if the sample mean falls there.
- Result is sig.
- Type 1 error - rejecting a true null hypothesis.
- The prob. of making this error is 5% (it actually depends on the sig. level you choose for your test).
- All these probabilities are calculated under the assumption that the null hypothesis is true.
- The p value of a test (THE ONE WE GET), the location of the rejection regions, and as a consequence the significance of the test depend on the value of the pop. parameter that we specify in H0.
- –> If the hypothesized pop. mean is moved toward the sample mean –> the p value becomes larger, making the result insignificant, so we don't reject H0.
- –> If the hypothesized pop. mean is moved away from the sample mean –> the p value becomes smaller, making the result significant, so we reject H0.
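A quick sketch of that last point with a one-sample t test (made-up scores; H0 values chosen close to and far from the sample mean):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
scores = rng.normal(5.4, 1.0, size=30)   # sample mean around 5.4

for h0_mean in (5.3, 4.5):               # close to vs. far from the sample mean
    t, p = stats.ttest_1samp(scores, popmean=h0_mean)
    print(h0_mean, round(p, 4))
# H0 close to the sample mean -> larger p (not significant, don't reject H0);
# H0 far from the sample mean -> smaller p (significant, reject H0).
```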