Random Sampling Flashcards
What is a sampling distribution?
Consider selecting TWO DIFFERENT samples of size n from a POPULATION distribution.
The xis in the second sample will virtually ALWAYS differ at least a bit from those in the first sample.
Because of this UNCERTAINTY, before the data become available we view each observation as a RANDOM VARIABLE and denote the sample by X1, X2,…,Xn.
This variation in observed values in turn implies that the value of any function of the sample observations, e.g. the SAMPLE mean, the sample standard deviation, etc., also VARIES from SAMPLE to SAMPLE.
i.e. there is UNCERTAINTY as to the value of xbar, s, etc.
The value of the SAMPLE mean from any PARTICULAR SAMPLE can be regarded as a POINT ESTIMATE (point because it is a single number corresponding to a single point on the number line) of the POPULATION mean mu.
What is a statistic?
A STATISTIC is any quantity whose value can be calculated from the SAMPLE data.
PRIOR to obtaining data, there is UNCERTAINTY as to what value of any particular statistic will result. Therefore, a STATISTIC is a RANDOM VARIABLE and will be denoted as an UPPER CASE letter; a lower case letter is used to represent the CALCULATED or observed value of the statistic.
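Thus, the SAMPLE mean, regarded as a statistic, is denoted Xbar and its calculated value is xbar. Likewise, the sample standard deviation is denoted S as a statistic and s as a calculated value.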
Any statistic, BEING a RV, has a PROBABILITY DISTRIBUTION.
The probability DISTRIBUTION of a statistic is referred to as its SAMPLING DISTRIBUTION to emphasize that it describes how the statistic VARIES in value ACROSS ALL SAMPLES selected.
What is a random sample?
The rvs X1, X2,..,Xn are said to form a (simple) RANDOM SAMPLE of size n if
- the Xis are INDEPENDENT rvs
- Every Xi has the SAME probability DISTRIBUTION
These two conditions can be summarized as i.i.d. (independent and identically distributed).
Explain how to simulate a statistic’s sampling distribution.
A simulation can be done with the following specs:
- the statistic of interest
- the population distribution
- the sample size n
- the number of replications k
Use a computer to obtain k different RANDOM SAMPLES, each of size n, from the designated population distribution.
For each sample, calculate the value of the STATISTIC, then construct a HISTOGRAM of the k calculated values.
The histogram gives the APPROXIMATE sampling DISTRIBUTION of the statistic. The larger the value of k, the better the approximation will tend to be. The actual distribution emerges as k -> infinity.
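The steps above can be sketched in Python using only the standard library. This is a minimal illustration, not a definitive implementation: the function names `simulate_sampling_distribution` and `text_histogram`, the exponential population with rate 1, and the choices n = 10 and k = 5000 are all assumptions made for the example.

```python
import random
import statistics

def simulate_sampling_distribution(statistic, draw, n, k, seed=0):
    """Return k values of `statistic`, each computed from a fresh sample of size n."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [statistic([draw(rng) for _ in range(n)]) for _ in range(k)]

def text_histogram(values, bins=12, width=40):
    """Build a crude text histogram; each row shows a bin's left edge and a bar."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        idx = min(int((v - lo) / step), bins - 1)  # put the maximum in the last bin
        counts[idx] += 1
    peak = max(counts)
    rows = []
    for i, c in enumerate(counts):
        bar = "#" * round(width * c / peak)
        rows.append(f"{lo + i * step:6.2f} | {bar}")
    return "\n".join(rows)

# Assumed specs: statistic = sample mean, population = exponential with rate 1
# (mean 1), sample size n = 10, replications k = 5000.
values = simulate_sampling_distribution(
    statistic=statistics.fmean,
    draw=lambda rng: rng.expovariate(1.0),
    n=10,
    k=5000,
)
print(text_histogram(values))
```

Swapping `statistics.fmean` for `statistics.median` or `statistics.stdev` gives the approximate sampling distribution of a different statistic from the same population.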
What is the distribution of the sample mean?
Let X1, X2,…,Xn be a RANDOM SAMPLE from a distribution with mean value mu and std sigma. Then
- E(Xbar) = mu_Xbar = mu
- V(Xbar) = sigma^2_Xbar = sigma^2/n, so sigma_Xbar = sigma/sqrt(n)
In addition, with T0 = X1+…+Xn (the sample total),
E(T0) = n*mu, V(T0) = n*sigma^2, sigma_T0 = sqrt(n)*sigma
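These formulas can be checked empirically. The sketch below assumes a population consisting of one roll of a fair six-sided die (mu = 3.5, sigma^2 = 35/12), with n = 5 and k = 20000 chosen arbitrarily. The simulated values should land close to the theoretical ones, though not exactly on them.

```python
import random
import statistics

rng = random.Random(1)   # fixed seed so the run is repeatable
n, k = 5, 20000          # sample size and number of replications (both assumed)

# Assumed population: one roll of a fair six-sided die.
mu = 3.5
sigma2 = 35 / 12

means, totals = [], []
for _ in range(k):
    sample = [rng.randint(1, 6) for _ in range(n)]
    totals.append(sum(sample))       # sample total T0
    means.append(sum(sample) / n)    # sample mean Xbar

print("E(Xbar):", statistics.fmean(means), "theory:", mu)
print("V(Xbar):", statistics.pvariance(means), "theory:", sigma2 / n)
print("E(T0):  ", statistics.fmean(totals), "theory:", n * mu)
print("V(T0):  ", statistics.pvariance(totals), "theory:", n * sigma2)
```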
What is the Central Limit Theorem?
Even when the POPULATION distribution is highly NONNORMAL, averaging produces a distribution more BELL-SHAPED than the one being sampled.
A reasonable conjecture is that if n is LARGE, a suitable NORMAL curve will APPROXIMATE the distribution of Xbar.
Central Limit Theorem (CLT):
Let X1,X2,…,Xn be a random SAMPLE from a distribution with mean mu and variance sigma^2. Then if n is sufficiently LARGE, Xbar has APPROXIMATELY a NORMAL distribution with mu_Xbar = mu, sigma^2_Xbar = sigma^2/n, and the sample total T0 also has approximately a normal distribution with mu_T0 = n*mu, sigma^2_T0 = n*sigma^2. The LARGER the n, the BETTER the approximation.
The CLT is applicable whether the variable of interest is discrete OR continuous.
CLT Rule of Thumb:
If n>30, the CLT can be used.
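One rough way to watch the CLT at work is to check a single symptom of normality: symmetry about the mean. For a normal distribution, P(Xbar <= mu) = 0.5. The sketch below assumes a strongly skewed population (exponential with rate 1, so mu = 1) and estimates that probability for a few sample sizes. The helper name `fraction_below_mean` and all numeric settings are assumptions for illustration, and symmetry alone does not establish normality.

```python
import random

def fraction_below_mean(n, k, rng):
    """Fraction of k simulated sample means (each from a sample of size n) at or below mu = 1."""
    below = 0
    for _ in range(k):
        xbar = sum(rng.expovariate(1.0) for _ in range(n)) / n
        if xbar <= 1.0:
            below += 1
    return below / k

rng = random.Random(2)  # fixed seed so the run is repeatable
fractions = {n: fraction_below_mean(n, k=10000, rng=rng) for n in (1, 5, 50)}
for n, frac in fractions.items():
    print(f"n = {n:2d}: P(Xbar <= mu) is roughly {frac:.3f}")
```

For n = 1 the estimate sits well above 0.5, reflecting the skew of the population. It should drift toward 0.5 as n grows.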
The amount of impurity in a batch is a rv with
mean 4.0
std 1.5.
If 50 batches are independently prepared, what is the probability that the sample average amount of impurity Xbar is between 3.5 and 3.8?
Since n = 50 > 30, the CLT is applicable, so Xbar approximately follows a normal distribution with:
mean value mu_Xbar = 4.0
sigma_Xbar = sigma /sqrt(n) = 1.5/sqrt(50) = .2121
P(3.5 <= Xbar <= 3.8)
= P( (3.5 - mu_Xbar)/sigma_Xbar <= Z <= (3.8 - mu_Xbar)/sigma_Xbar ), where Z = (Xbar - mu_Xbar)/sigma_Xbar
= P( (3.5-4.0)/.2121 <= Z <= (3.8-4.0)/.2121 )
= P( -.5/.2121 <= Z <= -.2/.2121 )
= P( -2.36 <= Z <= -.94 )
= phi(-.94) - phi(-2.36), recalling that phi(z) = P(Z <= z)
= .1736 - .0091 = .1645
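The table lookup can be cross-checked numerically with `statistics.NormalDist` from the Python standard library (Python 3.8 or later). The result should agree with .1645 to about two or three decimal places. A small difference is expected because the hand calculation rounds the z values to two decimals before using the table.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 4.0, 1.5, 50

# Approximate distribution of Xbar under the CLT: Normal(mu, sigma / sqrt(n)).
xbar_dist = NormalDist(mu=mu, sigma=sigma / sqrt(n))

# P(3.5 <= Xbar <= 3.8) as a difference of two cdf values.
prob = xbar_dist.cdf(3.8) - xbar_dist.cdf(3.5)
print(round(prob, 4))
```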
What is the standard normal variable?
If actual sample observations x1,x2,…,xn are assumed to be the result of a RANDOM SAMPLE X1,…,Xn from a NORMAL DISTRIBUTION with mean value mu and std dev sigma, then irrespective of the size of n, the SAMPLE MEAN Xbar is normally distributed with expected value mu and standard dev sigma/sqrt(n).
Standardizing Xbar yields the STANDARD NORMAL VARIABLE
Z = (Xbar-mu) / (sigma /sqrt(n))
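As a quick illustration, the sketch below standardizes simulated sample means drawn from a normal population with a deliberately small n. The function name `standardize` and the settings mu = 10, sigma = 2, n = 5, k = 10000 are assumptions for the example. If the claim holds, the standardized values should have mean near 0 and standard deviation near 1.

```python
import random
import statistics
from math import sqrt

def standardize(xbar, mu, sigma, n):
    """Z = (xbar - mu) / (sigma / sqrt(n))."""
    return (xbar - mu) / (sigma / sqrt(n))

rng = random.Random(3)                   # fixed seed so the run is repeatable
mu, sigma, n, k = 10.0, 2.0, 5, 10000    # assumed population parameters and sizes

z_values = []
for _ in range(k):
    sample = [rng.gauss(mu, sigma) for _ in range(n)]
    z_values.append(standardize(statistics.fmean(sample), mu, sigma, n))

print("mean of Z:", statistics.fmean(z_values))
print("std of Z: ", statistics.pstdev(z_values))
```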