Reading 11: Sampling and Estimation Flashcards
Simple Random Sampling
Method of selecting a sample where each member of the population has the same likelihood of being included, e.g. drawing names out of a hat
Sampling Error
Sampling Error = sample mean - population mean
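As a rough illustration of the two cards above, the sketch below draws a simple random sample and computes its sampling error. The population (the integers 1 to 100), the sample size, and the helper name `sampling_error` are assumptions made up for this example.

```python
import random
import statistics


def sampling_error(population, sample_size, seed=0):
    """Draw a simple random sample and return sample mean - population mean."""
    rng = random.Random(seed)
    # rng.sample gives every member the same chance of being included
    sample = rng.sample(population, sample_size)
    return statistics.mean(sample) - statistics.mean(population)


# Hypothetical population: the integers 1..100 (population mean 50.5)
population = list(range(1, 101))
error = sampling_error(population, sample_size=10)
print(f"sampling error: {error:.2f}")
```

Sampling the whole population (`sample_size=100`) gives a sampling error of zero, since the sample mean is then the population mean.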
Sampling Distribution
Distribution of all values that a sample statistic can take on when computed from samples of identical size randomly drawn from the same population
Stratified Random Sampling uses a…
classification system to separate the population into smaller groups (strata) based on one or more distinguishing characteristics; a random sample is then drawn from each stratum in proportion to its size
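A minimal sketch of stratified random sampling, assuming proportional allocation (each group contributes the same fraction of its members). The bond data, the `stratified_sample` helper, and the 10% fraction are hypothetical.

```python
import random
from collections import defaultdict


def stratified_sample(items, key, fraction, seed=0):
    """Split items into strata by key(item), then randomly sample each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for item in items:
        strata[key(item)].append(item)

    sample = []
    for group in strata.values():
        # Proportional allocation, with at least one item per stratum
        k = max(1, round(len(group) * fraction))
        sample.extend(rng.sample(group, k))
    return sample


# Hypothetical population: 60 government bonds and 40 corporate bonds
bonds = [("gov", i) for i in range(60)] + [("corp", i) for i in range(40)]
sample = stratified_sample(bonds, key=lambda bond: bond[0], fraction=0.1)
print(len(sample))
```

With these assumed inputs the sample keeps the population's 60/40 split, which a simple random sample of the same size would not guarantee.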
Time-Series Data
Observations taken over a period of time at specific and equally spaced intervals
Cross Sectional Data
Sample of observations taken at a single point in time
Longitudinal Data
Observations over time of multiple characteristics of the same entity (think country)
Panel Data
Observations over time of the same characteristics across multiple entities
Central Limit Theorem (Definition)
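The larger the sample size (n >= 30 as a rule of thumb), the closer the distribution of the sample mean gets to a normal distribution, whatever the shape of the population's distribution. The mean of that distribution of sample means equals the population mean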
Central Limit Theorem (Variance Formula)
Variance of the distribution of sample means = (SD of population)^2 / n
n = sample size
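The central limit theorem cards can be checked with a small simulation. This is only a sketch: the exponential population (mean 1, variance 1), the sample size of 50, and the number of trials are assumptions chosen for illustration.

```python
import random
import statistics


def sample_means(draw, n, trials, seed=0):
    """Return the means of `trials` independent samples of size n."""
    rng = random.Random(seed)
    return [statistics.fmean(draw(rng) for _ in range(n)) for _ in range(trials)]


# Hypothetical skewed population: exponential with rate 1 (mean 1, variance 1)
n = 50
means = sample_means(lambda rng: rng.expovariate(1.0), n=n, trials=5000)

mean_of_means = statistics.fmean(means)     # should be close to the population mean, 1
variance_of_means = statistics.variance(means)  # should be close to SD^2 / n = 1 / 50
print(mean_of_means, variance_of_means)
```

Even though the assumed population is strongly skewed, the sample means cluster around the population mean with a variance near SD^2 / n.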
Standard error of the sample mean (definition)
Standard deviation of the distribution of sample means
Standard error of the sample mean (If population Sd known) (formula)
Standard Error = SD of population / Square Root of n
n = size of sample
Standard error of the sample mean (if population Sd unknown) (formula)
Standard Error = S / Square Root of n
Where:
S = SD of the sample
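Both standard error formulas can be sketched in one small function. The name `standard_error`, its optional `population_sd` argument, and the return data are assumptions for this example, not part of the reading.

```python
import math
import statistics


def standard_error(sample, population_sd=None):
    """Standard error of the sample mean.

    Uses the population SD when it is known; otherwise falls back to the
    sample SD (S), which statistics.stdev computes with an n - 1 divisor.
    """
    n = len(sample)
    sd = population_sd if population_sd is not None else statistics.stdev(sample)
    return sd / math.sqrt(n)


# Hypothetical monthly returns
returns = [0.02, -0.01, 0.03, 0.01, 0.00, 0.04, -0.02, 0.01, 0.02]
print(standard_error(returns))                      # population SD unknown
print(standard_error(returns, population_sd=0.02))  # population SD assumed known
```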
Desirable Properties of an Estimator
- Unbiasedness
- Efficiency
- Consistency
Unbiasedness
Expected value of the estimator = value of the parameter being estimated
Efficiency
Smaller variance of its sampling distribution than any other unbiased estimator of the same parameter
Consistency
Variance of the sampling error decreases as sample size increases, i.e. the estimator becomes more accurate
Point Estimate
Single sample value used to estimate a population parameter (e.g. the sample mean as an estimate of the population mean)
Confidence Interval
Range of values in which the population parameter is expected to lie with a stated probability (the degree of confidence)
Confidence interval (Formula)
Point Estimate + / - (reliability factor x standard error)
Student’s t distribution
Distribution to use when constructing confidence intervals based on:
- Small sample sizes (n < 30)
- unknown variance
- normally (or approximately normally) distributed population
t - distribution (properties)
- Symmetrical
- Defined by degrees of freedom (n- 1) (df)
- Fatter tails than the normal distribution
Degrees of Freedom (df) (Effect on distribution)
More df results in a greater percentage of observations near the center of the distribution
Confidence Intervals will be narrower
Confidence Interval Formula (Known Variance)
Point Estimate + / - (z reliability factor x standard error)
Confidence Interval Formula (Unknown Variance)
Point Estimate + / - (t-reliability factor x standard error of sample mean)
Confidence Interval Formula (large sample, unknown variance)
Point Estimate + / - (Z Statistic x standard error)
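The confidence interval formulas share one shape: point estimate plus or minus a reliability factor times the standard error. The sketch below assumes a sample mean of 10 and an SD of 4; the helper names are made up for illustration. The t reliability factor of 2.064 (df = 24, 95% two-tailed) is hard-coded from a standard t-table because the Python standard library has no t-distribution quantile function.

```python
import math
from statistics import NormalDist


def confidence_interval(point_estimate, standard_error, reliability_factor):
    """Return (lower, upper) = point estimate -/+ reliability factor x standard error."""
    margin = reliability_factor * standard_error
    return point_estimate - margin, point_estimate + margin


def z_reliability_factor(confidence):
    """Two-tailed z reliability factor, e.g. about 1.96 for 95% confidence."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)


# Known variance: hypothetical sample mean 10, population SD 4, n = 64
z = z_reliability_factor(0.95)
low, high = confidence_interval(10.0, 4 / math.sqrt(64), z)
print(f"z interval: {low:.2f} to {high:.2f}")

# Unknown variance, small sample: sample SD 4, n = 25, so df = 24.
# 2.064 is the t-table value for df = 24 at 95% (assumed, not computed here).
t_low, t_high = confidence_interval(10.0, 4 / math.sqrt(25), 2.064)
print(f"t interval: {t_low:.2f} to {t_high:.2f}")
```

The t interval is wider here both because the sample is smaller and because the t reliability factor exceeds the z factor.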
Limitations of large sample size
- A larger sample may include observations from a different population/distribution
- Cost (must be weighed against the increase in precision)
Data-Mining Limitation
Repeated use of the same database to search for patterns until one that appears to work is found
Data-Mining Bias
Overstating the statistical significance of a pattern because the result was found through data mining
Sample Selection Bias
Systematic exclusion of some data which renders the sample non-random
Survivorship Bias
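Using only entities that survived to the end of the period, e.g. excluding funds that closed because of poor performance, which overstates average performance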
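A toy simulation can show why survivorship bias inflates results. Everything here is an assumption for illustration: the 1,000 simulated funds, the normal return distribution (mean 5%, SD 10%), and the -5% closure threshold.

```python
import random
import statistics

rng = random.Random(0)

# Hypothetical annual returns for 1,000 funds
fund_returns = [rng.gauss(0.05, 0.10) for _ in range(1000)]

# Assumed rule: funds returning -5% or worse close and drop out of the database
survivors = [r for r in fund_returns if r > -0.05]

all_mean = statistics.fmean(fund_returns)
survivor_mean = statistics.fmean(survivors)
print(f"all funds: {all_mean:.3f}, survivors only: {survivor_mean:.3f}")
```

Because the excluded funds all sit in the lower tail, the survivors-only average is higher than the average across every fund.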
Look-ahead bias
Testing a relationship using data that was not actually available on the test date
Time-period bias
The time period over which the data is gathered is either too short or too long