WEEK 4- DISTRIBUTION OF SAMPLES: Flashcards
Sampling distribution
A sampling distribution is the distribution of a summary statistic (e.g. the sample mean) across repeated samples from the same population.
Medical research often involves acquiring data from a sample of individuals and using the information gathered from the sample to make inferences about a broader group of individuals.
Steps for sampling distribution:
1. Take repeated independent samples of the same size from the same “population”.
2. Calculate the sample mean for each sample.
3. Graph the results (sample means) on a histogram.
This histogram is called the “sampling distribution”.
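The three steps above can be sketched as a small simulation. This is only an illustration: the skewed "population", the sample size of 50, the 2,000 repeats, and the function name `simulate_sample_means` are all assumptions chosen for the example, not values from the course.

```python
import random
import statistics

def simulate_sample_means(population, sample_size, n_samples, seed=0):
    """Steps 1 and 2: draw repeated independent samples of one fixed size
    (with replacement) and return the mean of each sample."""
    rng = random.Random(seed)
    return [
        statistics.mean(rng.choices(population, k=sample_size))
        for _ in range(n_samples)
    ]

# Hypothetical, deliberately skewed "population" (exponential, mean about 10).
pop_rng = random.Random(42)
population = [pop_rng.expovariate(1 / 10) for _ in range(20_000)]

sample_means = simulate_sample_means(population, sample_size=50, n_samples=2_000)

# Step 3: a crude text histogram of the sample means.
lowest, highest = min(sample_means), max(sample_means)
n_bins = 10
width = (highest - lowest) / n_bins
counts = [0] * n_bins
for m in sample_means:
    idx = min(int((m - lowest) / width), n_bins - 1)  # clamp the maximum into the last bin
    counts[idx] += 1
for i, c in enumerate(counts):
    left = lowest + i * width
    print(f"{left:6.2f} to {left + width:6.2f} | {'#' * (c // 20)}")

print("population mean:     ", round(statistics.mean(population), 2))
print("mean of sample means:", round(statistics.mean(sample_means), 2))
```

Even though the assumed population is skewed, the histogram of sample means should look roughly bell shaped and be centred close to the population mean.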
Which of the following statements is INCORRECT regarding the mean of a sampling distribution?
It is the mean of the statistic for all of the samples in the distribution.
It depends on the sample size.
It is the same as the population parameter.
It depends on the sample size.
“CENTRAL LIMIT THEOREM”
“For a large sample size, the distribution of the sample mean is normally distributed, even when the population distribution from which the sample has been drawn is decidedly non‐normal, with mean equal to the true mean of the sampled population and standard deviation equal to the standard error of the sample mean.”
Implication of the central limit theorem
Solution: Take one sample.
Calculate descriptive statistics (e.g. mean, standard deviation).
Use the sample statistics (e.g. mean) of this sample to make inferences about the population.
i.e. we use the sample statistic (e.g. mean) to estimate (infer) the population parameter.
What is the “Standard Error”?
It is a measure of precision of the sample mean from a single sample in estimating the population mean. The smaller the SE, the more precisely the population mean is being estimated.
Dispersion (spread) of the sampling distribution.
Measures how precisely the population mean is estimated by the sample mean.
Technically: the standard deviation of the distribution of the sample mean.
As sample size increases, SE …..
decreases
When are the z‐score and t‐score used?
Z‐score
Used when the Population (true) Standard Deviation is known
T‐score
Used when the Population (true) Standard Deviation is unknown
Suppose that the blood cholesterol level of all men aged 20 to 30 is bell shaped with mean 186 mg/dl and an unknown standard deviation.
In a simple random sample of 100 men from this population the sample standard deviation is 41 mg/dl.
What is the value of the spread of the sampling distribution?
4.1 mg/dl (SE = 41/√100 = 4.1)
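The arithmetic behind this answer can be checked with a few lines of Python. The sample SD (41 mg/dl) and n = 100 come from the question; the other sample sizes (25 and 400) are hypothetical and are included only to show that the SE shrinks as n grows. The function name `standard_error` is an illustrative choice.

```python
import math

def standard_error(sd, n):
    """Standard error of the sample mean: SD divided by the square root of n."""
    return sd / math.sqrt(n)

sample_sd = 41  # mg/dl, from the question
for n in (25, 100, 400):  # 100 is from the question; 25 and 400 are hypothetical
    print(f"n = {n:>3}: SE = {standard_error(sample_sd, n):.2f} mg/dl")
# n =  25: SE = 8.20 mg/dl
# n = 100: SE = 4.10 mg/dl
# n = 400: SE = 2.05 mg/dl
```

Quadrupling the sample size halves the standard error, because n enters through its square root.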
T distribution
The t distribution is similar to the normal distribution and is appropriate for continuous data; both are symmetric and bell shaped.
Tails are a bit longer for the t distribution compared to the normal distribution.
The shape of the t distribution depends on the sample size: as the sample size (n) increases, the t distribution approaches the normal distribution.
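A rough numerical comparison can make these points concrete. The sketch below writes out the standard density formula for Student's t by hand (the helper `t_pdf` is not a library function) and compares it with the standard normal density at the centre (x = 0) and in the tail (x = 3). The degrees of freedom 2, 10 and 100 are arbitrary illustrative choices.

```python
import math
from statistics import NormalDist

def t_pdf(x, df):
    """Density of Student's t distribution with `df` degrees of freedom.
    Uses log-gamma so that large df values do not overflow."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

normal = NormalDist()  # standard normal: mean 0, SD 1
for df in (2, 10, 100):
    print(f"df = {df:>3} | centre: t = {t_pdf(0, df):.4f}, normal = {normal.pdf(0):.4f}"
          f" | tail at x = 3: t = {t_pdf(3, df):.4f}, normal = {normal.pdf(3):.4f}")
```

With few degrees of freedom the t density is lower at the centre and higher in the tail than the normal density; as df grows, both differences shrink.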
Degrees of freedom (df)
The t-distribution is associated with calculation of degrees of freedom (df).
It denotes the number of independent pieces of information available to estimate another piece of information.
df = number of pieces of information that were used to estimate another piece of information
For a large sample size, the distribution of the sample mean is …… distributed, even when the population distribution from which the sample has been drawn is decidedly non-normal, with ……. equal to the true mean of the sampled population and standard deviation equal to the standard ……. of the sample mean.
normally, mean, error
According to the normal distribution probability law:
Approximately 68% of the sample means are expected to lie within one standard error of the true mean, that is, within -1SE and +1SE.
Approximately 95% of the sample means are expected to lie within 1.96 or approximately two standard errors of the true mean, that is, within -2SE and +2SE.
Approximately 99.7% of sample means are expected to lie within 2.97 or approximately three standard errors of the true mean, that is, within -3SE and +3SE.
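These percentages can be checked against the standard normal distribution using Python's standard library. This is a quick sketch; the helper name `coverage` is an illustrative choice.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, SD 1

def coverage(k):
    """Probability that a normally distributed value lies within k SDs of its mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 1.96, 2.97, 3):
    print(f"within +/-{k} SE: {coverage(k):.4f}")
# within +/-1 SE: 0.6827
# within +/-1.96 SE: 0.9500
# within +/-2.97 SE: 0.9970
# within +/-3 SE: 0.9973
```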
Which of the following statements is INCORRECT regarding the t-distribution?
It is appropriate for continuous data
A calculation of degrees of freedom is associated with it
When the sample size is small, it is more like the normal distribution
In general, it is less “peaked” than the normal distribution
When the sample size is small, it is more like the normal distribution
Area under the sampling distribution: True SD Unknown
When population SD is unknown the sampling distribution follows a t‐distribution, with df = n – 1, where n = sample size.
As the sample size increases the distribution of t-score approaches the standard normal distribution.
1. Calculate SE = sample SD / √n
2. Calculate df = n – 1
3. Calculate the t‐score = (sample mean – true mean) / SE
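The three steps can be sketched in Python as follows. The true mean (186 mg/dl), sample SD (41 mg/dl) and n = 100 are taken from the cholesterol card; the sample mean of 194.2 mg/dl is a made-up value used only to have something to plug in, and the function name `t_score_steps` is illustrative.

```python
import math

def t_score_steps(sample_mean, true_mean, sample_sd, n):
    """Return (SE, df, t) following the three steps above."""
    se = sample_sd / math.sqrt(n)        # step 1: standard error
    df = n - 1                           # step 2: degrees of freedom
    t = (sample_mean - true_mean) / se   # step 3: t-score
    return se, df, t

# 194.2 is a hypothetical sample mean; the other values come from the cholesterol card.
se, df, t = t_score_steps(sample_mean=194.2, true_mean=186, sample_sd=41, n=100)
print(f"SE = {se:.2f}, df = {df}, t = {t:.2f}")
# SE = 4.10, df = 99, t = 2.00
```

Note the parentheses in step 3: the difference between the means is divided by the SE, not just the true mean.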