Sampling theory Flashcards
What are 3 tasks in scientific research for which statistics is useful?
- Estimating parameters: estimating population parameters, or model fitting/system identification, where the parameters of a model are estimated from the results of a series of experiments.
- Experimental design: how to minimize measurement error due to bias and inaccuracy. Comparative experiments: how to design experiments to measure the comparative performance of different individuals. Factorial experiments: the design of experiments where the variable of interest depends on several different factors.
- Quality control: Acceptance sampling: monitoring the quality of items by testing small samples. Process control: a simple view of keeping a continuous process at a specified level.
What is sampling theory? and what is it useful for?
Sampling theory describes the relationship between a sample and the population that the sample represents.
Samples are used to estimate unknown population parameters from the information contained in the sample data, using suitable statistics.
It is also useful for comparing two populations by comparing samples from those populations. This is done using hypothesis testing, where one asks whether the differences between two samples are likely to represent differences in the underlying populations.
What is a characteristic of sampling theory?
A sample from a population is modeled as n random variables X1, X2, ..., Xn with a joint probability density/mass function; the observed sample is one of the many possible samples, each of which has a certain probability of occurring.
Calculating this probability is too complicated unless we assume that the sampling is independent.
When is sampling considered independent?
Sampling is said to be independent if the random variables X1, X2, . . . , Xn are independent and identically distributed (i.i.d.)
What are the three types of random sampling?
- Sampling from an infinite population (the population remains the same after any individual is drawn from it), in this case, the random variables representing the sample are i.i.d.
- Sampling with replacement from a finite population (the population remains the same after any individual is drawn from it) in this case the random variables representing the sample are i.i.d.
- Sampling without replacement from a finite population: the population changes after each draw, therefore the random variables are not independent. Remember the relevant formula.
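The difference between the second and third cases can be sketched with Python's standard library (the population values here are purely illustrative):

```python
import random

random.seed(0)
population = list(range(100))  # a small finite population (illustrative values)

# Sampling with replacement: each draw leaves the population unchanged,
# so the draws are i.i.d.
with_replacement = random.choices(population, k=10)

# Sampling without replacement: the population shrinks after each draw,
# so the draws are not independent.
without_replacement = random.sample(population, k=10)

print(with_replacement)            # repeats are possible
print(without_replacement)         # no value can appear twice
print(len(set(without_replacement)))  # always 10: all draws distinct
```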
When can sampling without replacement from a finite population be modeled as sampling from an infinite population?
When the population is large enough (typically at least a few hundred, and preferably in the thousands), sampling without replacement can be modeled to a very good approximation by independent random variables.
When can data obtained from sampling be analyzed with time series analysis but not with the analytical methods covered in this course?
When there is sequential correlation between values sampled at different time intervals.
What is a sampling distribution?
The sample is modeled as a collection of random variables with certain parameters, e.g. the sample mean. Over all the different samples that could be drawn from the population, the sample mean x̄ is itself a random variable with its own probability distribution, called the sampling distribution. This model is appropriate since we expect the sample mean to depend on the data values in our given sample.
Do we know the parameters of the underlying probability model we are sampling from?
No, that is why we want to estimate population parameters from sampling distributions and therefore statistics.
What is a sample statistic S?
A sample statistic S is a function of the random variables Xi of a sample taken from a larger population, and it has its own distribution.
Statistics provide information about the characteristics of the sample, which can then be used to make inferences or draw conclusions about the population as a whole. A statistic is an estimate of a population parameter p, which is usually the mean, the variance, or a proportion.
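A minimal sketch of this idea, assuming (for illustration) a normal population with µ = 5 and σ = 2: the sample mean and sample variance are statistics that estimate the population parameters.

```python
import random
import statistics

random.seed(1)
# Hypothetical population: normal with mu = 5, sigma = 2 (assumed for illustration)
sample = [random.gauss(5, 2) for _ in range(1000)]

# Sample statistics used as estimates of the population parameters
mean_hat = statistics.mean(sample)     # estimates mu
var_hat = statistics.variance(sample)  # estimates sigma^2 (n-1 denominator)

print(round(mean_hat, 2))  # should be near 5
print(round(var_hat, 2))   # should be near 4
```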
What is an unbiased (representative) estimator? When is the sample statistic S considered an unbiased estimate? How do you calculate the bias?
An unbiased estimator is one that on average gives the correct value.
S is an unbiased estimate of p if its expected value E(S) is equal to p.
The bias of S = E(S) - p.
Besides a low bias, we also want the estimator to have a small variance, so that the probability of estimating p incorrectly is small.
What is an empirical average? Why do we calculate it?
It is the mean value calculated directly from the observed data in a sample, i.e. the actual arithmetic average of the values in the sample.
We calculate it because we cannot compute E(S) directly from a single sample: that would require knowledge of the underlying population parameters, which are usually unknown.
What is the sample mean, the variance of the sample mean, and the standard deviation of the sample mean?
- x̄ is the sample mean, x̄ = (1/n) ∑i Xi, and is an unbiased estimator of the population mean since E(x̄) = E(X) = µ.
- The variance of the sample mean x̄ is Var(x̄) = n Var(X)/n² = σ²/n.
- The standard deviation of the sample mean is σ/√n. (The factor √n in the denominator implies that the precision of our estimate is greater with a larger sample, but to get a twice as accurate estimate we need four times as much data.)
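The relation Var(x̄) = σ²/n can be sketched with a small simulation, using assumed parameters σ = 2 and n = 25, so the standard deviation of the sample mean should be 2/√25 = 0.4:

```python
import random
import statistics

random.seed(3)
mu, sigma, n = 0.0, 2.0, 25  # assumed population parameters
trials = 20000

# Draw many samples of size n and record each sample mean
means = []
for _ in range(trials):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    means.append(sum(sample) / n)

sd_of_means = statistics.stdev(means)
print(round(sd_of_means, 2))  # close to sigma / sqrt(n) = 0.4
```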
When do you have an MVUE (Minimal Variance Unbiased Estimator)?
When the population has a normal distribution, the sample mean is the MVUE of the population mean.
What does the Central Limit Theorem state? What are the 2 rules of thumb?
It states that when the sample is large enough, the sampling distribution of the sample mean tends to a normal distribution with mean µ and variance Var(x̄) = σ²/n:
x̄ ∼ N(µ, σ/√n)
B(n, p) ≈ N(np, √(np(1-p))) for np > 5, n(1-p) > 5, n > 30
P(λ) ≈ N(λ, √λ) for λ > 5
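The binomial rule of thumb can be checked with a quick simulation sketch (parameters chosen here only to satisfy np > 5 and n(1−p) > 5): the simulated mean and standard deviation should land near np and √(np(1−p)).

```python
import math
import random

random.seed(4)
n, p = 100, 0.3  # illustrative values satisfying np > 5 and n(1-p) > 5
trials = 20000

# Simulate Binomial(n, p) by summing n Bernoulli(p) draws
draws = [sum(random.random() < p for _ in range(n)) for _ in range(trials)]

bin_mean = sum(draws) / trials
bin_sd = math.sqrt(sum((d - bin_mean) ** 2 for d in draws) / (trials - 1))

print(round(bin_mean, 1))  # near n*p = 30
print(round(bin_sd, 2))    # near sqrt(n*p*(1-p)) ~ 4.58
```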