(R10) Sampling and Estimation Flashcards
Define Simple Random Sampling and provide two methods
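Each element of the population has an equal probability of being chosen. Two methods: 1) number every element and pick with a random number generator, or 2) systematic sampling: select every kth element until the desired sample size is reached (an approximation to simple random sampling)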
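The two methods on this card can be sketched in a few lines of Python. This is only an illustration: the function names, the 100-element population, and the seeds are assumptions, not part of the reading.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n elements without replacement; every element is equally likely."""
    rng = random.Random(seed)
    return rng.sample(population, n)

def systematic_sample(population, k, seed=None):
    """Pick a random start among the first k elements, then take every kth one."""
    rng = random.Random(seed)
    start = rng.randrange(k)
    return population[start::k]

# Hypothetical population: 100 numbered elements.
population = list(range(1, 101))
print(simple_random_sample(population, 10, seed=42))
print(systematic_sample(population, 10, seed=42))
```

With 100 elements and k = 10, the systematic sample always contains 10 elements spaced 10 apart; only the starting point is random.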
Sampling Distribution
The distribution of all distinct possible values that a statistic can assume when computed from samples of the same size randomly drawn from the same population
Sampling error
The difference between the observed value of a statistic and the quantity it is intended to estimate (e.g., sample mean - population mean)
Stratified Random Sampling
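- Separate the population into smaller groups (strata) based on one or more distinguishing characteristics; then use simple random sampling within each stratum, in proportion to the stratum's size
- Provides more precise estimates of the mean and variance (smaller sampling variance) than simple random sampling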
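A minimal sketch of proportional stratified sampling follows. The bond ratings, the 10% sampling fraction, and the helper name stratified_sample are hypothetical and chosen only for illustration.

```python
import random

def stratified_sample(items, key, fraction, seed=None):
    """Group items into strata by key, then draw a simple random sample
    from each stratum in proportion to the stratum's size."""
    rng = random.Random(seed)
    strata = {}
    for item in items:
        strata.setdefault(key(item), []).append(item)
    sample = []
    for members in strata.values():
        n = max(1, round(len(members) * fraction))  # at least one per stratum
        sample.extend(rng.sample(members, n))
    return sample

# Hypothetical population: 60 AAA-rated bonds and 40 BB-rated bonds.
bonds = [("AAA", i) for i in range(60)] + [("BB", i) for i in range(40)]
picked = stratified_sample(bonds, key=lambda b: b[0], fraction=0.1, seed=0)
print(picked)
```

Each rating appears in the sample in the same 60/40 proportion as in the population, which is what keeps the estimate from being dominated by one group.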
Three Data Types
- Time Series
- Cross-Sectional
- Panel
Time Series Data
Observations of one or more variables taken at successive points in time, showing how the variables change over a period
e.g., monthly returns on Microsoft stock from Jan 1994 to Dec 2004.
Cross-Sectional Data
Observations of multiple units at a single point in time
e.g., sales for 30 different companies for a particular quarter
Longitudinal Data
Observations over time of multiple characteristics of the same entity, such as unemployment, inflation, and GDP growth rates for a country over 10 years.
Panel Data
Time series + cross sectional. Data that contains observations over time of the same characteristic for multiple entities, such as debt/equity ratios for 20 companies over 24 quarters.
Standard Error Formula
Standard deviation divided by the square root of n (sigma / n^(1/2); use the sample standard deviation s when sigma is unknown). It is the standard deviation of the sampling distribution of the sample mean.
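As a quick sketch of the formula, the function below divides a standard deviation by the square root of n. The return figures and the function name are made up for illustration; when no population standard deviation is supplied, the sample standard deviation is used instead.

```python
import math
import statistics

def standard_error(sample, population_sd=None):
    """Standard error of the sample mean: sd / sqrt(n).
    Falls back to the sample standard deviation when sigma is unknown."""
    n = len(sample)
    sd = population_sd if population_sd is not None else statistics.stdev(sample)
    return sd / math.sqrt(n)

# Hypothetical sample of nine periodic returns.
returns = [0.02, -0.01, 0.03, 0.00, 0.01, 0.04, -0.02, 0.01, 0.02]
print(standard_error(returns, population_sd=0.03))  # sigma known: 0.03 / sqrt(9)
print(standard_error(returns))                      # sigma unknown: s / sqrt(9)
```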
Central Limit Theorem
Theorem that states that for simple random samples of size n from a population with mean u and finite variance sigma^2, the sampling distribution of the sample mean, Xbar, approaches a normal probability distribution with mean u and variance sigma^2 / n as the sample size becomes large.
Properties of CLT
- If the sample size, n, is sufficiently large (n >= 30), the sampling distribution of the sample means will be approximately normal.
- The mean of the population, u, and the mean of the distribution of all possible sample means are equal.
- The variance of the distribution of sample means is sigma^2 / n, the population variance divided by the sample size.
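The properties above can be checked with a small simulation. This is a sketch under stated assumptions: the population is exponential with mean 1 and variance 1 (deliberately skewed), and the sample size, trial count, and seed are arbitrary choices.

```python
import random
import statistics

def sample_means(n, trials, seed=0):
    """Draw `trials` samples of size n from an exponential population
    (mean 1, variance 1) and return the mean of each sample."""
    rng = random.Random(seed)
    return [
        statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(trials)
    ]

means = sample_means(n=30, trials=5000)
print(statistics.fmean(means))      # should be close to the population mean, 1
print(statistics.pvariance(means))  # should be close to sigma^2 / n = 1/30
```

Even though the population is far from normal, a histogram of means would look roughly bell-shaped at n = 30.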
Desired Properties of an Estimator
- Unbiased - the expected value of the estimator is equal to the parameter you are trying to estimate.
- Efficient - the variance of its sampling distribution is smaller than that of any other unbiased estimator of the parameter you are trying to estimate.
- Consistent - the probability that the estimate is close to the true parameter value increases as the sample size increases.
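Unbiasedness can be illustrated by averaging an estimator over many samples. The sketch below compares two variance estimators, one dividing by n - 1 and one dividing by n, on a standard normal population whose true variance is 1. The sample size, trial count, and seed are assumptions made for this example.

```python
import random
import statistics

def average_variance_estimates(n, trials, seed=0):
    """Average two variance estimators over many samples of size n drawn
    from a normal population with true variance 1."""
    rng = random.Random(seed)
    biased, unbiased = [], []
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        unbiased.append(statistics.variance(sample))  # divides by n - 1
        biased.append(statistics.pvariance(sample))   # divides by n
    return statistics.fmean(biased), statistics.fmean(unbiased)

biased_avg, unbiased_avg = average_variance_estimates(n=5, trials=20000)
print(biased_avg)    # tends toward (n - 1) / n = 0.8, below the true variance
print(unbiased_avg)  # tends toward the true variance, 1
```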
Point Estimates
A single value used to estimate a population parameter; the sample mean and sample variance are point estimates of the population mean and variance
Confidence Interval Formula
Point estimate +/- reliability factor * standard error
C.I. = Xbar +/- z * (sigma / n^(1/2))
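A minimal sketch of the formula, assuming a known population standard deviation so the z reliability factor applies. The inputs (mean 0.05, sigma 0.20, n = 100) and the function name are hypothetical.

```python
import math
from statistics import NormalDist

def z_confidence_interval(xbar, sigma, n, confidence=0.95):
    """Point estimate +/- reliability factor * standard error, using z."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for 95% confidence
    half_width = z * sigma / math.sqrt(n)
    return xbar - half_width, xbar + half_width

low, high = z_confidence_interval(xbar=0.05, sigma=0.20, n=100)
print(low, high)  # roughly 0.0108 and 0.0892
```

Raising the confidence level widens the interval, while a larger n narrows it.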
Distribution with known variance, which table should be used to create confidence interval?
Use Z score
Distribution with unknown variance, which table should be used to create confidence interval?
Use the t score if the sample is small (n < 30); for a large sample (n >= 30) the t score is still appropriate, but the z score is an acceptable approximation
Level of Significance
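The probability, denoted by alpha, that the confidence interval does not contain the true parameter; the degree of confidence is 1 - alpha (e.g., alpha = 0.05 for a 95% confidence interval)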
Characteristics of T-Distribution
- Centered at Zero
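- Lower peak and fatter tails than the standard normal distribution
- As df increases, the shape becomes more peaked, the tails become thinner, and the distribution approaches the standard normal.
- Standard t-tables typically list one-tailed probabilities, so for a two-sided confidence interval look up alpha / 2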
Confidence intervals are affected by:
- z score or t score
- alpha - level of significance (degree of confidence = 1 - alpha)
- n - sample size (number of observations in the sample)
Data mining bias
Bias that refers to results where the statistical significance of the pattern is overestimated because the results were found through data-mining (the practice of hitting a data set over and over again until you hit gold)
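A rough simulation shows why repeated searching overstates significance. Every "trading rule" below is pure noise with a true mean return of zero, yet some still clear a t-statistic cutoff of about 2. The rule count, sample length, volatility, cutoff, and seed are all assumptions made for this sketch.

```python
import math
import random
import statistics

def count_false_discoveries(n_rules=200, n_obs=60, cutoff=2.0, seed=0):
    """Test many rules whose true mean return is zero and count how many
    look 'significant' purely by chance."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_rules):
        returns = [rng.gauss(0.0, 0.05) for _ in range(n_obs)]
        std_err = statistics.stdev(returns) / math.sqrt(n_obs)
        t_stat = statistics.fmean(returns) / std_err
        if abs(t_stat) > cutoff:
            hits += 1
    return hits

false_hits = count_false_discoveries()
print(false_hits, "of 200 noise-only rules look significant")
```

At a 5% significance level, roughly 1 in 20 worthless rules is expected to pass, so a pattern found after many attempts deserves extra skepticism.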
Sample selection bias
Bias which occurs when some data is systematically excluded from the analysis, usually because of the lack of availability (survivorship bias in mutual funds)
Look-ahead bias
Occurs when a study tests a relationship using sample data that was not available on the test date (e.g., pairing stock prices/returns with accounting data that had not yet been released)
Time-period bias
Occurs when results hold only for the specific time period tested and may not apply outside it
Unbiased estimator
When the expected value of the estimator is equal to the parameter you are trying to estimate.
Efficient Estimator
If the variance of its sampling distribution is smaller than that of any other unbiased estimator of the parameter you are trying to estimate.
Consistent Estimator
The probability that the estimate is close to the true parameter value increases as the sample size increases.