Sampling and Estimation Flashcards
Simple random sampling
Process of selecting a sample in which each member of the population has an equal chance of being selected
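This is a minimal sketch. It assumes the population is stored as a Python list, and `simple_random_sample` is a hypothetical helper name. `random.Random.sample` draws without replacement, so every member has the same chance of ending up in the sample.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n members without replacement; each member is equally likely to be chosen."""
    rng = random.Random(seed)  # seeded generator so the draw is reproducible
    return rng.sample(population, n)

population = list(range(1, 101))  # hypothetical population of 100 labeled members
sample = simple_random_sample(population, 10, seed=42)
print(sample)
```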
Sampling distribution
Probability distribution of all distinct possible values that a statistic can assume when computed from samples of the same size randomly drawn from the same population
Simple random vs. stratified random sampling
Stratified random sampling divides the population into subpopulations (strata) based on one or more classification criteria, then draws a simple random sample from each stratum. This ensures that each subpopulation is represented in the sample.
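The following is a sketch of stratified sampling. It uses proportional allocation (each stratum contributes the same fraction of its members), which is an assumption here because the card does not specify an allocation rule. `stratified_sample` and the bond-rating population are hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(population, stratum_of, fraction, seed=None):
    """Split the population into strata, then draw a simple random sample within each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for member in population:
        strata[stratum_of(member)].append(member)
    sample = []
    for key in sorted(strata):
        members = strata[key]
        # Assumed proportional allocation, with at least one member drawn per stratum
        k = max(1, round(fraction * len(members)))
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical population of bonds tagged by credit rating: 50 AAA, 30 BBB, 20 high yield
bonds = [("AAA", i) for i in range(50)] + [("BBB", i) for i in range(30)] + [("HY", i) for i in range(20)]
sample = stratified_sample(bonds, stratum_of=lambda b: b[0], fraction=0.1, seed=1)
print(sample)  # 5 AAA, 3 BBB and 2 HY bonds, so every rating class is represented
```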
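Time series vs. cross-sectional data
Time series: sequence of observations taken at intervals over time (e.g., monthly returns of one stock). Cross-sectional: observations on some characteristic of many subjects at a single point in time (e.g., year-end P/E ratios of many companies).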
Central limit theorem and its importance
Given a population with mean μ and finite variance σ², the sampling distribution of the sample mean computed from samples of size n approaches a normal distribution with mean μ and variance σ²/n as n increases. The approximation is generally considered reliable when n ≥ 30.
Allows making precise probability statements about the POPULATION mean using the sample mean, regardless of the shape of the population distribution
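The simulation below illustrates the theorem. Its setup is an assumption chosen for the example: an exponential population with mean 1 and variance 1, which is strongly skewed, sampled with n = 36 and repeated 10,000 times. The sample means should center on μ, have variance close to σ²/n, and fall within ±1.96 standard errors about 95% of the time, as a normal distribution would.

```python
import random
import statistics

def sample_means(draw, n, reps, seed=0):
    """Return `reps` sample means, each computed from n draws of the population."""
    rng = random.Random(seed)
    return [statistics.fmean(draw(rng) for _ in range(n)) for _ in range(reps)]

# Skewed, non-normal population: exponential with mean 1 and variance 1
mu, sigma2 = 1.0, 1.0
n, reps = 36, 10_000
means = sample_means(lambda rng: rng.expovariate(1.0), n, reps)

print(statistics.fmean(means))     # close to mu = 1.0
print(statistics.variance(means))  # close to sigma2 / n, about 0.0278

se = (sigma2 / n) ** 0.5
within = sum(abs(m - mu) <= 1.96 * se for m in means) / reps
print(within)  # close to 0.95, as a normal sampling distribution predicts
```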
Calculate and interpret standard error of sample mean
σ / √n when the population standard deviation σ is known; s / √n (s = sample standard deviation) when it is not
It is the standard deviation of the sampling distribution of the sample mean, and it shrinks as n grows.
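Here is a small sketch of both cases. `standard_error` is a hypothetical helper, and the data set is made up. `statistics.stdev` uses the n - 1 divisor, so it returns the sample standard deviation s.

```python
import math
import statistics

def standard_error(data, sigma=None):
    """sigma / sqrt(n) if the population sd is known, otherwise s / sqrt(n)."""
    n = len(data)
    sd = sigma if sigma is not None else statistics.stdev(data)  # stdev divides by n - 1
    return sd / math.sqrt(n)

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical sample, n = 8
se_known = standard_error(data, sigma=2.0)        # sigma assumed known: 2 / sqrt(8)
se_est = standard_error(data)                     # sigma unknown: s / sqrt(8), with s = sqrt(32 / 7)
print(round(se_known, 4), round(se_est, 4))       # 0.7071 0.7559
```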
Desirable properties of estimator
Unbiased - the expected value of the estimator equals the parameter it is intended to estimate
Efficient - no other unbiased estimator of the same parameter has a sampling distribution with smaller variance
Consistent - the probability that estimates are close to the value of the population parameter increases (toward 1) as sample size increases
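Unbiasedness can be checked by simulation. This sketch compares two estimators of a population variance: the n - 1 divisor (`statistics.variance`) and the n divisor (`statistics.pvariance`). The normal population with σ² = 4 and the sample size n = 5 are assumptions for illustration. Averaged over many samples, only the n - 1 version centers on the true σ².

```python
import random
import statistics

rng = random.Random(7)
sigma2 = 4.0  # true population variance (normal population, sd = 2)
n, reps = 5, 20_000

unbiased, biased = [], []
for _ in range(reps):
    x = [rng.gauss(0.0, 2.0) for _ in range(n)]
    unbiased.append(statistics.variance(x))  # divides by n - 1
    biased.append(statistics.pvariance(x))   # divides by n

print(statistics.fmean(unbiased))  # close to 4.0: expected value equals the parameter
print(statistics.fmean(biased))    # close to 4.0 * (n - 1) / n = 3.2: systematically too low
```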
Point estimate vs. confidence interval estimate of population parameter
Point estimate - a single number used to estimate a population parameter
Confidence interval estimate - a range of values constructed so that it brackets the population parameter with probability 1 - α (the degree of confidence); expressed as a 100(1 - α)% confidence interval
Describe properties of Student’s t-distribution
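Symmetrical probability distribution defined by a single parameter, the degrees of freedom (df); it has fatter tails than the standard normal distribution and approaches it as df increases
Used to construct confidence intervals for the population mean when the population variance is UNKNOWN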
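Calculate and interpret degrees of freedom for t-distribution
df = n - 1
Number of degrees of freedom available for estimating the population variance; one is lost because the sample variance is computed around the sample mean rather than the unknown population mean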
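The sketch below computes a sample variance by hand with the n - 1 divisor, using a made-up sample. Deviations from the sample mean always sum to zero, so only n - 1 of them are free to vary. The result matches `statistics.variance`.

```python
import statistics

data = [12.0, 15.0, 11.0, 14.0, 13.0]    # hypothetical sample
n = len(data)
df = n - 1                               # degrees of freedom for the t-distribution
xbar = statistics.fmean(data)
ss = sum((x - xbar) ** 2 for x in data)  # deviations measured from the sample mean, not mu
s2 = ss / df                             # sample variance
print(df, s2)                            # 4 2.5
```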
Calculate and interpret confidence interval for population mean with normal distribution and known population variance
sample mean ± z_(α/2) × (σ / √n)
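Here is a minimal sketch of this interval. `z_interval` is a hypothetical helper, and the inputs are made up. `statistics.NormalDist().inv_cdf` supplies z_(α/2), which is about 1.96 at 95% confidence. The interpretation is that if the procedure were repeated on many samples, about 95% of the intervals would contain μ.

```python
import math
from statistics import NormalDist

def z_interval(xbar, sigma, n, confidence=0.95):
    """Confidence interval for the mean when the population sd sigma is known."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z_(alpha/2)
    half_width = z * sigma / math.sqrt(n)
    return xbar - half_width, xbar + half_width

# Hypothetical inputs: sample mean 50, known sigma 10, n = 25
lo, hi = z_interval(50.0, 10.0, 25)
print(round(lo, 2), round(hi, 2))  # 46.08 53.92
```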
Calculate and interpret confidence interval for population mean with normal distribution and unknown population variance
Use only if the sample is large or the population is normally distributed
sample mean ± t_(α/2) × (s / √n), where s is the sample standard deviation and t has n - 1 degrees of freedom
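The Python standard library has no t-distribution quantile function. For that reason, this sketch hard-codes t_(α/2) = 2.064 for df = 24 at 95% confidence, a value read from a standard t-table. `scipy.stats.t.ppf(0.975, 24)` gives the same value if SciPy is available. `t_interval` and the inputs are hypothetical. Because t_(α/2) > 1.96, the interval is wider than a z-interval built from the same numbers.

```python
import math

# Two-tailed 95% critical value for df = 24, taken from a t-table (assumption: no SciPy)
T_CRIT_95_DF24 = 2.064

def t_interval(xbar, s, n, t_crit):
    """Confidence interval for the mean when the population variance is unknown."""
    half_width = t_crit * s / math.sqrt(n)
    return xbar - half_width, xbar + half_width

# Hypothetical inputs: sample mean 50, sample sd 10, n = 25 (so df = 24)
lo, hi = t_interval(50.0, 10.0, 25, T_CRIT_95_DF24)
print(round(lo, 3), round(hi, 3))  # 45.872 54.128
```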
Calculate and interpret confidence interval for population mean with normal distribution and unknown variance and large sample size
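With a large sample, the t interval remains appropriate, and z may be substituted because the t-distribution approaches the normal as df grows; by the central limit theorem the large-sample interval is valid even if the population is not normal
sample mean ± t_(α/2) × (s / √n), or approximately sample mean ± z_(α/2) × (s / √n); the t version is slightly wider and more conservative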
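The sketch below shows the large-sample z version computed directly from raw data. It assumes the usual n ≥ 30 rule of thumb, which the hypothetical `large_sample_interval` helper enforces. The data are a made-up skewed sample (exponential with mean 2), chosen to show that normality of the population is not required.

```python
import math
import random
import statistics
from statistics import NormalDist

def large_sample_interval(data, confidence=0.95):
    """sample mean ± z_(alpha/2) * s / sqrt(n); assumes n is large (rule of thumb n >= 30)."""
    n = len(data)
    if n < 30:
        raise ValueError("sample too small for the large-sample approximation")
    xbar = statistics.fmean(data)
    s = statistics.stdev(data)  # sample sd stands in for the unknown sigma
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * s / math.sqrt(n)
    return xbar - half_width, xbar + half_width

# Hypothetical skewed sample: 100 draws from an exponential population with mean 2
rng = random.Random(3)
data = [rng.expovariate(0.5) for _ in range(100)]
lo, hi = large_sample_interval(data)
print(lo, hi)
```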
Appropriate sample size
Balance the need for precision (a larger sample gives a narrower confidence interval) against the risk of sampling from more than one population and the cost of larger samples
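This sketch illustrates the precision side of the trade-off. It assumes a known σ = 10 (hypothetical) and a 95% z-interval. The half-width z_(α/2) × σ / √n falls only with √n, so halving the interval width takes four times the sample size.

```python
import math
from statistics import NormalDist

def half_width(n, sigma=10.0, confidence=0.95):
    """Half-width of a z-interval for the mean; sigma is a hypothetical known value."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * sigma / math.sqrt(n)

widths = {n: half_width(n) for n in (25, 100, 400)}
for n, w in widths.items():
    print(n, round(w, 2))
# 25 3.92
# 100 1.96
# 400 0.98
```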
Data mining bias
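Errors arising from misuse of data: repeatedly drilling into the same dataset until something that "works" turns up. Such findings frequently fail in the future because they were discovered after the fact rather than predicted in advance.
Warning signs:
Too much digging, too little confidence - many variables or models were tested, but only the significant ones are reported
No story, no future - no plausible economic rationale explains why the result should persist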
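The simulation below shows how digging manufactures results. Its setup is hypothetical: 200 "strategies" whose 60 monthly returns are pure noise with a true mean of 0. Each strategy is tested at the 5% level, so about 5% of them still look significant by chance. When fresh data are drawn for those "winners", the apparent edge mostly disappears. The rough z test used here is an assumption chosen for simplicity.

```python
import random
import statistics
from statistics import NormalDist

rng = random.Random(11)
n_periods, n_strategies = 60, 200

# Hypothetical strategies: monthly returns are pure noise, so no strategy has a real edge
strategies = [[rng.gauss(0.0, 1.0) for _ in range(n_periods)] for _ in range(n_strategies)]

z_crit = NormalDist().inv_cdf(0.975)  # two-tailed 5% test

def looks_significant(returns):
    """Rough test of mean return != 0 using the standard error of the sample mean."""
    se = statistics.stdev(returns) / len(returns) ** 0.5
    return abs(statistics.fmean(returns) / se) > z_crit

winners = [i for i, r in enumerate(strategies) if looks_significant(r)]
print(len(winners))  # expected count is about 5% of 200, purely by chance

# "Out of sample": fresh noise for the same winners; few, if any, stay significant
survivors = [i for i in winners if looks_significant([rng.gauss(0.0, 1.0) for _ in range(n_periods)])]
print(len(survivors))
```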