Quant: Sampling and Estimation Flashcards
Simple Random Sampling =
Randomly choosing items from a population
Systematic Sampling =
Drawing every nth member of the population
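The two cards above can be sketched in code. This is a minimal illustration using a made-up population of 100 labelled items; the population, sample size, and seed are all assumptions for the example.

```python
import random

population = list(range(1, 101))  # hypothetical population of 100 labelled items
random.seed(42)

# Simple random sampling: every item has an equal chance of being chosen.
simple_sample = random.sample(population, 10)

# Systematic sampling: draw every nth member (here n = 10) from a random start.
n = 10
start = random.randrange(n)
systematic_sample = population[start::n]

print(len(simple_sample), len(systematic_sample))
```

Note that systematic sampling only needs one random draw (the starting point); after that, membership is mechanical.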
Sampling Error =
Difference between a sample statistic and a population parameter
Sampling Distribution =
Distribution of the statistics drawn from the samples - if we repeat the sampling process and come up with a number of different sample means, those sample means will themselves have a distribution.
Stratified Random Sampling =
The population is split into groups (strata; singular: stratum) and samples are taken from each group in proportion to its weight within the entire population.
AS OPPOSED TO SIMPLE RANDOM SAMPLING
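A sketch of stratified random sampling with proportional allocation. The strata, their sizes, and the total sample size are illustrative assumptions, not taken from the card.

```python
import random

# Hypothetical population split into strata (e.g. market-cap buckets of an index).
strata = {
    "large_cap": list(range(600)),   # 60% of the population
    "mid_cap":   list(range(300)),   # 30%
    "small_cap": list(range(100)),   # 10%
}
total = sum(len(members) for members in strata.values())
sample_size = 50

random.seed(1)
sample = []
for name, members in strata.items():
    # Each stratum's sample size is proportional to its weight in the population.
    k = round(sample_size * len(members) / total)
    sample.extend(random.sample(members, k))

print(len(sample))  # 30 + 15 + 5 = 50
```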
Cross-sectional/time series/panel data/longitudinal data =
Time series data are observations over a period of time at equal intervals.
Cross-sectional data are observations at a single point in time.
Longitudinal data are multiple characteristics observed over time for the same entity.
Panel data are a single characteristic observed over time for different entities.
Central Limit Theorem =
The central limit theorem states that for simple random samples of size n from a population with a mean µ and a finite variance σ², the sampling distribution of the sample mean x̄ approaches a normal probability distribution with mean µ and a variance equal to σ²/n as the sample size becomes large.
Central Limit Theorem (2) Important Points =
We can make inferences about the population mean from the sample mean, regardless of the population’s distribution, if sample size n ≥ 30.
ie n ≥ 30 MEANS SAMPLING DISTRIBUTION WILL BE APPROXIMATELY NORMAL, which means we can do hypothesis testing and construct confidence intervals.
The mean of the population, µ, is equal to the mean of the distribution of all possible sample means.
Variance of the distribution of sample means is σ²/n.
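The points above can be checked by simulation: draw many samples of size n ≥ 30 from a decidedly non-normal (uniform) population and inspect the distribution of the sample means. The sample size and number of repetitions are arbitrary choices for the demo.

```python
import random
import statistics

random.seed(0)
n = 40                # sample size (>= 30)
pop_mean = 0.5        # mean of Uniform(0, 1)
pop_var = 1 / 12      # variance of Uniform(0, 1)

# Repeat the sampling process many times and record each sample mean.
sample_means = [
    statistics.mean(random.random() for _ in range(n))
    for _ in range(5000)
]

# Mean of the sample means ≈ µ; variance of the sample means ≈ σ²/n.
print(statistics.mean(sample_means), statistics.variance(sample_means), pop_var / n)
```

Even though the underlying population is uniform, a histogram of `sample_means` would look approximately normal, which is exactly what the CLT promises.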
Standard Error (of the sample mean) =
standard deviation of the distribution of the sample means.
When the SD of the population is known, the standard error is σ/√n.
However the population SD is not normally known, in which case we use the sample SD s instead of σ, giving s/√n.
AS σ OR s INCREASES, STANDARD ERROR INCREASES.
AS n, SAMPLE SIZE, INCREASES, STANDARD ERROR DECREASES.
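A minimal computation of the standard error from a sample, using s/√n since the population SD is treated as unknown. The data values are made up for illustration.

```python
import math
import statistics

# Hypothetical sample observations.
data = [4.2, 5.1, 3.8, 4.9, 5.5, 4.0, 4.7, 5.2]
n = len(data)

s = statistics.stdev(data)        # sample standard deviation (divisor n - 1)
std_error = s / math.sqrt(n)      # standard error of the sample mean

print(round(std_error, 4))
```

Doubling n (with s unchanged) shrinks the standard error by a factor of √2, which is the "as n increases, standard error decreases" point above.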
Desirable properties of an estimator =
Unbiasedness: expected value of the estimator is equal to the parameter you’re trying to estimate.
Efficient: variance of the sampling distribution is the least out of all unbiased estimators (ie has the lowest sampling error)
Consistent: accuracy of the parameter estimate increases as sample size increases (as n, sample size increases, the standard error decreases and the sampling distribution bunches around the population mean)
Point estimate vs confidence interval estimate =
A point estimate is a single value used to estimate a parameter. The sample mean is a point estimate for the population mean.
A confidence interval is a range of values in which the parameter is expected to lie.
When to use a t-distribution =
SMALL SAMPLE (n < 30) from a population with unknown variance and a normal/approximately normal distribution.
May also be appropriate when variance is unknown and sample size is large enough that the central limit theorem will assure that the sampling distribution is approximately normal.
t-distribution characteristics =
Symmetrical.
Defined by a single parameter, degrees of freedom, where the degrees of freedom are equal to the number of sample observations minus 1 for sample means.
Fatter tails than a normal distribution.
As the df gets larger the shape becomes closer to a normal distribution.
THE SHAPE CHANGES AS YOU HAVE MORE OBSERVATIONS AND DF CHANGES
Degrees of confidence/level of significance =
Degree of confidence = 1 - ALPHA, where alpha is the level of significance
Confidence interval (normally distributed and has a known variance) =
point estimate ± (reliability factor × standard error)
For a population that is normally distributed and has a known variance: x̄ ± zα/2 × σ/√n (e.g. zα/2 = 1.96 for a 95% confidence interval).
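A sketch of the confidence interval for a normally distributed population with known variance, i.e. point estimate ± reliability factor × standard error. The data and the assumed known σ are illustrative; 1.96 is the standard normal reliability factor for 95% confidence.

```python
import math
import statistics

# Hypothetical sample from a normal population with known SD.
data = [10.1, 9.8, 10.4, 10.0, 9.9, 10.2, 10.3, 9.7]
n = len(data)
x_bar = statistics.mean(data)     # point estimate of the population mean
sigma = 0.25                      # assumed known population SD
z = 1.96                          # reliability factor for 95% confidence

half_width = z * sigma / math.sqrt(n)   # reliability factor x standard error
lower, upper = x_bar - half_width, x_bar + half_width

print(round(lower, 3), round(upper, 3))
```

With unknown variance and a small sample, the same structure applies but the reliability factor comes from the t-distribution and s replaces σ.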