WEEK 4- DISTRIBUTION OF SAMPLES: Flashcards
Sampling distribution
Sampling distribution is the distribution of a summary statistic.
Medical research often involves acquiring data from a sample of individuals and using the information gathered from the sample to make inferences about a broader group of individuals.
Steps for sampling distribution:
Take samples of various sample sizes.
1. Take repeated independent samples of the same size from the same “population”.
2. Sample mean calculated for each sample.
3. Graph the results (sample means) on a Histogram.
This histogram is called the “sampling distribution”.
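The steps above can be sketched as a small simulation. This is only an illustration: the uniform population, the sample size of 30 and the 2,000 repetitions are assumptions chosen for the example, not values from the course material.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Hypothetical population (assumption): 100,000 values, uniform on 0..100
population = [random.uniform(0, 100) for _ in range(100_000)]

SAMPLE_SIZE = 30      # every sample has the same size (step 1)
N_SAMPLES = 2_000     # number of repeated independent samples

# Step 2: calculate the sample mean for each sample
sample_means = [
    statistics.mean(random.sample(population, SAMPLE_SIZE))
    for _ in range(N_SAMPLES)
]

# Step 3 (text version of the histogram): count the means in bins of width 5
bins = {}
for m in sample_means:
    lower = int(m // 5) * 5
    bins[lower] = bins.get(lower, 0) + 1
for lower in sorted(bins):
    print(f"{lower:3d}-{lower + 5:3d} {'#' * (bins[lower] // 20)}")

print("population mean:     ", round(statistics.mean(population), 2))
print("mean of sample means:", round(statistics.mean(sample_means), 2))
```

The printed bars should pile up around the population mean, which is the behaviour the flashcard describes.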
Which of the following statements is INCORRECT regarding the mean of a sampling distribution?
It is the mean of the statistic for all of the samples in the distribution.
It depends on the sample size.
It is the same as the population parameter.
It depends on the sample size.
“CENTRAL LIMIT THEOREM”
“For a large sample size, the distribution of the sample mean is normally distributed, even when the population distribution from which the sample has been drawn is decidedly non‐normal, with mean equal to the true mean of the sampled population and standard deviation equal to the standard error of the sample mean.”
Implication of the central limit theorem
Solution: Take one sample.
Calculate descriptive statistics (e.g. mean, standard deviation).
Use the sample statistics (e.g. mean) of this sample to make inferences about the population.
i.e. we use the sample statistic (e.g. the sample mean) to estimate/infer what the population parameter (e.g. the true mean) is.
What is the “Standard Error”?
It is a measure of precision of the sample mean from a single sample in estimating the population mean. The smaller the SE, the more precisely the population mean is being estimated.
Dispersion (spread) of the sampling distribution.
Measures how precisely the population mean is estimated by the sample mean.
Technically: the standard deviation of the distribution of the sample mean.
As sample size increases, SE …..
decreases
When are the Z-score and the t-score used?
Z‐score
Used when the Population (true) Standard Deviation is known
T‐score
Used when the Population (true) Standard Deviation is unknown
Suppose that the blood cholesterol level of all men aged 20 to 30 is bell shaped with mean 186 mg/dl and an unknown standard deviation.
In a simple random sample of 100 men from this population the sample standard deviation is 41 mg/dl.
What is the value for the spread for the sampling distribution?
4.1
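The answer follows from SE = SD/√n = 41/√100 = 4.1. A minimal sketch of that arithmetic is shown here; the function name `standard_error` and the extra sample sizes 25 and 400 are assumptions added for illustration.

```python
import math

def standard_error(sd: float, n: int) -> float:
    """Standard error of the sample mean: SD / sqrt(n)."""
    return sd / math.sqrt(n)

# Cholesterol card: sample SD = 41 mg/dl, n = 100
se_cholesterol = standard_error(41, 100)
print(f"SE for n=100: {se_cholesterol:.1f} mg/dl")

# Same SD with other (hypothetical) sample sizes: SE shrinks as n grows
for n in (25, 100, 400):
    print(n, round(standard_error(41, n), 2))
```

Quadrupling the sample size halves the SE, which is why the SE decreases as the sample size increases.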
T distribution
The t distribution is similar to the normal distribution and is appropriate for continuous data; both are symmetric and bell shaped.
Tails are a bit longer for the t distribution compared to the normal distribution.
The shape of the t distribution depends on the sample size. For large samples, the t distribution is more like the normal distribution.
As the sample size (n) increases, the t‐distribution approaches the normal distribution.
Degrees of freedom (df)
The t-distribution is associated with calculation of degrees of freedom (df).
It denotes the number of independent pieces of information available to estimate another piece of information.
df = Number of pieces of information that was used to estimate
another
For a large sample size, the distribution of the sample mean is …… distributed, even when the population distribution from which the sample has been drawn is decidedly non-normal, with ……. equal to the true mean of the sampled population and standard deviation equal to the standard ……. of the sample mean.
normally, mean, error
According to the normal distribution probability law:
Approximately 68% of the sample means are expected to lie within one standard error of the true mean, that is, within +1SE and -1SE.
Approximately 95% of the sample means are expected to lie within 1.96 or approximately two standard errors of the true mean, that is, within -2SE and +2SE.
Approximately 99.7% of sample means are expected to lie within 2.97 or approximately three standard errors of the true mean, that is, within -3SE and +3SE.
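These percentages can be checked against the standard normal curve. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the helper name `central_area` is an assumption made for the example.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, SD 1

def central_area(k: float) -> float:
    """Area under the standard normal curve between -k and +k."""
    return z.cdf(k) - z.cdf(-k)

# Multipliers quoted on the card, plus the rounded values 2 and 3
for k in (1, 1.96, 2, 2.97, 3):
    print(f"within {k} SE: {central_area(k):.4f}")
```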
Which of the following statements is INCORRECT regarding the t-distribution?
It is appropriate for continuous data
A calculation of degrees of freedom is associated with it
When the sample size is small, it is more like the normal distribution
In general, it is less “peaked” than the normal distribution
When the sample size is small, it is more like the normal distribution
Area under the sampling distribution: True SD Unknown
When population SD is unknown the sampling distribution follows a t‐distribution, with df = n – 1, where n = sample size.
As the sample size increases the distribution of t-score approaches the standard normal distribution.
1- Calculate SE = sample SD / square root of n
2- Calculate df = n - 1
3- Calculate the t-score = (sample mean - true mean) / SE
- Question
Suppose that the blood cholesterol level of all men aged 20 to 30 is bell shaped with mean 186 mg/dl and an unknown standard deviation.
In a simple random sample of 100 men from this population the sample standard deviation is 41 mg/dl.
What is the probability that the sample mean takes a value between 183 and 189 mg/dl?
Between 50% and 80%
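The answer can be checked by following the three steps (SE, df, t-score) and then finding the area under the t curve. The Python standard library has no t-distribution function, so this sketch integrates the t density numerically with Simpson's rule. The helper names `t_pdf` and `area_between` and the step count are assumptions of the example; a printed t table gives the same range.

```python
import math

def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log(1 + x * x / df))

def area_between(lo: float, hi: float, df: int, steps: int = 2000) -> float:
    """Area under the t density from lo to hi (Simpson's rule, even steps)."""
    h = (hi - lo) / steps
    total = t_pdf(lo, df) + t_pdf(hi, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(lo + i * h, df)
    return total * h / 3

true_mean, sd_sample, n = 186, 41, 100
se = sd_sample / math.sqrt(n)        # step 1: 41 / 10 = 4.1
df = n - 1                           # step 2: 99
t_lo = (183 - true_mean) / se        # step 3: t-score for 183
t_hi = (189 - true_mean) / se        #         t-score for 189
prob = area_between(t_lo, t_hi, df)
print(f"SE={se:.1f}, df={df}, t from {t_lo:.3f} to {t_hi:.3f}, P={prob:.3f}")
```

The t-scores are about -0.73 and +0.73, and the area between them falls inside the 50% to 80% band given as the answer.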
Area under the sampling distribution: True SD known
When the true (population) SD is known, then irrespective of the sample size the area (other than reference ranges) under the sampling distribution for the sample mean can be calculated using the normal probability table.
This can be done by transforming the sample mean to a Z-score, where the Z-score is calculated by subtracting the true mean from the sample mean and dividing this difference by the standard error of the sample mean.
- Question
Let us consider that pre‐op creatinine level is heavily right skewed in the population.
If you draw a large sample from this population, what should be the shape of this variable (Y) in the sample?
Right skewed
Negatively skewed
Cannot be determined
Normally distributed
The shape of pre‐op creatinine level in the sample would be right skewed. This would occur because a sample is representative of the population, thus it will have the same distribution as the population.
- Question
Let us consider that pre‐op creatinine level is heavily right skewed in the population.
For large sample what will be the shape of the sampling distribution for the sample mean?
Normally distributed
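The contrast between the last two cards (the shape of the variable in one large sample versus the shape of the sampling distribution of the mean) can be illustrated with a simulation. The exponential population is only a stand-in for a heavily right-skewed variable; it, the sample sizes and the helper `skewness` are assumptions of this sketch, not creatinine data.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

def skewness(values):
    """Moment-based skewness: about 0 for symmetric data, > 0 for right skew."""
    m = statistics.mean(values)
    s = statistics.pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)

def draw():
    # Hypothetical right-skewed measurement (assumption): exponential, mean 1
    return random.expovariate(1.0)

# One large sample of individual values: keeps the population's right skew
one_large_sample = [draw() for _ in range(5_000)]

# Many samples of size 100, keeping only each sample's mean
sample_means = [
    statistics.mean([draw() for _ in range(100)])
    for _ in range(3_000)
]

print("skewness of one large sample:", round(skewness(one_large_sample), 2))
print("skewness of the sample means:", round(skewness(sample_means), 2))
```

The individual values stay clearly right skewed, while the skewness of the sample means is close to zero, as the central limit theorem predicts.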
The age group to which Anne belongs has mean height 1.6 metre and standard deviation 0.1 metre. The age group to which Devi belongs has mean height 1.2 metre and standard deviation 0.08 metre. Anne is 1.7 metre tall. Devi is 1.36 metre tall. Which is the taller for their age?
Devi
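The comparison is made by converting each height to a Z-score within its own age group: Anne is (1.7 - 1.6)/0.1 = 1 SD above her group mean and Devi is (1.36 - 1.2)/0.08 = 2 SD above hers. A minimal sketch of the calculation, with the function name `z_score` chosen for illustration:

```python
def z_score(value: float, mean: float, sd: float) -> float:
    """Number of standard deviations that value lies above the mean."""
    return (value - mean) / sd

anne = z_score(1.7, 1.6, 0.1)      # Anne relative to her age group
devi = z_score(1.36, 1.2, 0.08)    # Devi relative to her age group

taller_for_age = "Devi" if devi > anne else "Anne"
print(f"Anne: {anne:.2f} SD, Devi: {devi:.2f} SD, taller for age: {taller_for_age}")
```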
Juan makes a measurement in a chemistry laboratory and records the result in his lab report. The standard deviation of the students’ lab measurement is 10 milligrams. Juan repeats the measurement 4 times and records the mean of his 4 measurements.
What is the standard deviation of Juan’s mean result? 25 5 10 15
5
Standard deviation of Juan’s mean/average result is the standard error of the sample mean. The formula is: SE = SD/√n = 10/√4 = 5.
- Question
Juan makes a measurement in a chemistry laboratory and records the result in his lab report. The variance of the students’ lab measurements is 100 mg² (so the standard deviation is 10 mg). Juan repeats the measurement 4 times and records the sample mean. How many times must Juan repeat the measurement to reduce the standard deviation of the sample mean to 2?
8
16
25
32
25
SD = √variance = √100 = 10. SE = SD/√n = 10/√25 = 10/5 = 2, so Juan needs to repeat the measurement 25 times.
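The number of repetitions can also be found directly by rearranging SE = SD/√n into n = (SD/SE)². A short sketch, with the helper name `required_n` being an assumption of the example:

```python
import math

def required_n(sd: float, target_se: float) -> int:
    """Smallest n for which sd / sqrt(n) is at most target_se."""
    return math.ceil((sd / target_se) ** 2)

variance = 100
sd = math.sqrt(variance)            # 10
n_needed = required_n(sd, 2)        # (10 / 2) ** 2
print("repetitions needed:", n_needed)
print("SE with 4 repetitions:", sd / math.sqrt(4))
print("SE with n_needed repetitions:", sd / math.sqrt(n_needed))
```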
Consider that the weight of a tumor in bladder cancer patients in the population follows the normal distribution with a mean 50g and standard deviation 5g.
If a bladder cancer patient is selected randomly what is the probability that the tumor is less than 45g? 15.87% 16% 34.13% 50%
15.87%
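Here the question is about a single patient, so the population SD of 5 g is used directly: Z = (45 - 50)/5 = -1, and the area below Z = -1 is about 0.1587. A quick check with `statistics.NormalDist` (Python 3.8 or later); the variable names are illustrative:

```python
from statistics import NormalDist

tumour_weight = NormalDist(mu=50, sigma=5)   # individual tumour weights
p_below_45 = tumour_weight.cdf(45)           # P(X < 45), i.e. Z < -1
print(f"P(weight < 45 g) = {p_below_45:.4f}")
```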
Consider that the weight of a tumor in bladder cancer patients in the population follows the normal distribution with a mean 50g and standard deviation 5g.
If 4 of these patients are selected at random, calculate the probability that the average weight of the 4 tumors (assume each patient has only one tumor) will be greater than 55g?
0.5
0.4772
0.0228
0.345
We use the Z-score because the SD is known, but this time we use the sampling distribution for the sample mean to calculate it. The formula is: Z-score = (sample mean – true mean)/SE, where SE = SD/√n = 5/√4 = 5/2 = 2.5. Z-score = (55 – 50)/2.5 = 2.0. The area above a mean of 55 is the same as the area above a Z-score of 2.0. The area between 0 and 2.0 is 0.4772. Hence the tail area is: 0.5 – 0.4772 = 0.0228.
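The same calculation can be sketched in a few lines, replacing the normal probability table with `statistics.NormalDist` (Python 3.8 or later). The variable names are assumptions of the example.

```python
import math
from statistics import NormalDist

true_mean, sd, n = 50, 5, 4
se = sd / math.sqrt(n)                  # 5 / 2 = 2.5
z = (55 - true_mean) / se               # (55 - 50) / 2.5 = 2.0
p_above_55 = 1 - NormalDist().cdf(z)    # upper-tail area beyond Z = 2.0
print(f"SE={se}, Z={z}, P(mean > 55 g) = {p_above_55:.4f}")
```

Because the question is about the mean of 4 tumours, the SE (2.5 g) is used in the Z-score, not the population SD (5 g).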
Consider the weight of adult Australians in a large sample follows a normal distribution with mean 78kg and standard deviation 10kg.
What is the median weight for the adult Australians?
78kg
Cannot be determined
Mean and median have different values
None of the above
78 kg
Consider the weight of adult Australians in a large sample follows a normal distribution with mean 78kg and standard deviation 10kg.
Find the probability that a randomly selected adult Australian will have weight between 58 and 98 kg.
68%
95%
99.7%
50%
95%
Consider the weight of adult Australians in a large sample follows a normal distribution with mean 78kg and standard deviation 10kg.
Find the limits that include 99.7% of adult Australians.
50 to 100 kg
48 to 108 kg
25 to 50 kg
2 to 10 kg
48 to 108 kg
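The three weight cards above all follow from a normal distribution with mean 78 kg and SD 10 kg: the median equals the mean, 58 to 98 kg is the mean ± 2 SD, and the 99.7% limits are the mean ± 3 SD. A minimal check with `statistics.NormalDist` (Python 3.8 or later), using illustrative variable names:

```python
from statistics import NormalDist

weight = NormalDist(mu=78, sigma=10)

median = weight.inv_cdf(0.5)                     # 50th percentile
p_58_to_98 = weight.cdf(98) - weight.cdf(58)     # area within mean +/- 2 SD
limits_997 = (weight.mean - 3 * weight.stdev,    # mean - 3 SD
              weight.mean + 3 * weight.stdev)    # mean + 3 SD

print("median:", median)
print("P(58 < weight < 98):", round(p_58_to_98, 4))
print("99.7% limits:", limits_997)
```

The exact area within ± 2 SD is slightly above 95%, because the 95% figure strictly corresponds to ± 1.96 SD.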