Sampling And Estimation Flashcards
A method of selecting a sample in such a way that each item or person in the population being studied has the same likelihood of being included in the sample
Simple Random Sampling
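A minimal Python sketch, assuming the population fits in a list. `random.Random.sample` draws without replacement and gives every item the same chance of selection. The ticker names are hypothetical placeholders.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n items without replacement; each item is equally likely to be chosen."""
    rng = random.Random(seed)  # seeded generator so the draw is reproducible
    return rng.sample(population, n)

# Hypothetical population of 500 stock tickers
tickers = [f"STOCK_{i:03d}" for i in range(500)]
sample = simple_random_sample(tickers, 30, seed=42)
print(len(sample), len(set(sample)))  # 30 30
```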
The difference between a sample statistic and its corresponding population parameter
Sampling Error
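The definition reduces to one subtraction. The sketch below assumes a hypothetical population of 1,000 simulated returns, so the population mean is known exactly and the sampling error of one random sample can be computed directly.

```python
import random
import statistics

rng = random.Random(7)
# Hypothetical population: 1,000 annual returns (%), generated once so mu is known
population = [rng.gauss(8.0, 15.0) for _ in range(1000)]
mu = statistics.fmean(population)

sample = rng.sample(population, 50)
xbar = statistics.fmean(sample)

# Sampling error = sample statistic minus the corresponding population parameter
sampling_error = xbar - mu
print(f"mu={mu:.2f}  xbar={xbar:.2f}  sampling error={sampling_error:.2f}")
```

A different sample gives a different xbar and therefore a different sampling error; the error is a property of the particular sample drawn.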
A technique that uses a classification system to separate the population into smaller groups based on one or more distinguishing characteristics
Stratified Random Sampling
e.g., bond indexing
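A rough sketch of proportional stratified sampling, assuming a single classification key (a hypothetical credit-rating bucket). Real bond-index replication typically stratifies on several characteristics at once (e.g., maturity, sector, rating); this version keys on one for brevity. Each stratum's share is rounded, so totals can drift from n when the proportions are not clean.

```python
import random
from collections import defaultdict

def stratified_sample(items, stratum_of, n, seed=None):
    """Proportional stratified random sampling.

    stratum_of: function mapping an item to its stratum label
    n: total sample size, allocated to strata in proportion to their size
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for item in items:
        strata[stratum_of(item)].append(item)

    sample = []
    for members in strata.values():
        k = round(n * len(members) / len(items))  # proportional allocation (rounded)
        sample.extend(rng.sample(members, k))      # simple random sample within the stratum
    return sample

# Hypothetical bond universe: 100 AAA, 300 BBB, 100 high-yield issues
bonds = ([("AAA", i) for i in range(100)]
         + [("BBB", i) for i in range(300)]
         + [("HY", i) for i in range(100)])
sample = stratified_sample(bonds, stratum_of=lambda b: b[0], n=50, seed=1)
```

Here the sample holds 10 AAA, 30 BBB and 10 high-yield bonds, matching the universe's 20/60/20 mix.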
Data consisting of observations taken over a period of time at specific, equally spaced intervals.
e.g., a set of monthly returns on Microsoft stock from Jan. 1994 to Dec. 2004.
Time Series Data
Data consisting of a sample of observations taken at a single point in time.
e.g., reported earnings per share of all NASDAQ companies as of Dec. 31, 2014.
Cross Sectional Data
Observations over time of multiple characteristics of the same entity, such as unemployment, inflation, and GDP growth rates for a country over 10 years.
Longitudinal Data
Data that contains observations over time of the same characteristic for multiple entities, such as debt/equity ratios for 20 companies over 24 quarters.
Panel Data
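One way to see how these data types relate: store panel data keyed by (entity, period), then fix one key to recover the other shapes. Fixing the company gives a time series; fixing the quarter gives a cross-section. The companies, quarters and debt/equity values below are hypothetical.

```python
# Hypothetical debt/equity ratios keyed by (company, quarter): this is panel data
panel = {
    ("A", "2014Q1"): 0.40, ("A", "2014Q2"): 0.42, ("A", "2014Q3"): 0.45,
    ("B", "2014Q1"): 1.10, ("B", "2014Q2"): 1.05, ("B", "2014Q3"): 0.98,
}

def time_series(panel, company):
    """Fix the entity, vary time: one company's ratio across quarters (in order)."""
    return [v for (c, q), v in sorted(panel.items()) if c == company]

def cross_section(panel, quarter):
    """Fix the time, vary the entity: every company's ratio in one quarter."""
    return {c: v for (c, q), v in panel.items() if q == quarter}

print(time_series(panel, "A"))         # [0.4, 0.42, 0.45]
print(cross_section(panel, "2014Q2"))  # {'A': 0.42, 'B': 1.05}
```

Longitudinal data would instead track several characteristics (not several entities) for one entity over time.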
Theorem that states that for simple random samples of size n from a population with mean mu and finite variance sigma^2, the sampling distribution of the sample mean, Xbar, approaches a normal probability distribution with mean mu and variance sigma^2 / n as the sample size becomes large.
Central Limit Theorem
Properties of the CLT
- If the sample size, n, is sufficiently large (n >= 30), the sampling distribution of the sample means will be approximately normal.
- The mean of the population, mu, and the mean of the distribution of all possible sample means are equal.
- The variance of the distribution of sample means is sigma^2 / n, the population variance divided by the sample size.
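The properties above can be checked by simulation. This sketch assumes a hypothetical, deliberately skewed (exponential) population and draws samples with replacement so the observations are independent; sampling without replacement from a small finite population would call for a finite population correction, which is left out here.

```python
import random
import statistics

def sample_means(population, n, trials, seed=0):
    """Means of `trials` random samples of size n, drawn with replacement."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.choices(population, k=n)) for _ in range(trials)]

rng = random.Random(123)
# Hypothetical, clearly non-normal population: right-skewed exponential draws
population = [rng.expovariate(1.0) for _ in range(10_000)]
mu = statistics.fmean(population)
sigma2 = statistics.pvariance(population)

n = 30
means = sample_means(population, n, trials=5_000)
print(f"mean of sample means {statistics.fmean(means):.3f} vs mu        {mu:.3f}")
print(f"variance of means    {statistics.pvariance(means):.4f} vs sigma^2/n {sigma2 / n:.4f}")
```

Even though the population is skewed, a histogram of `means` would look roughly bell-shaped, centered on mu with variance close to sigma^2 / n.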
Standard Error of the Sample Mean Calculation
sigmaXbar = sigma / n^(1/2)
* the standard deviation of the distribution of the sample means.
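A direct translation of the formula, using hypothetical inputs (a population standard deviation of 20 and two sample sizes):

```python
import math

def standard_error(sigma, n):
    """Standard error of the sample mean: sigma / n^(1/2)."""
    return sigma / math.sqrt(n)

# Hypothetical inputs: sigma = 20, samples of 25 and 100 observations
print(standard_error(20.0, 25))   # 4.0
print(standard_error(20.0, 100))  # 2.0 (quadrupling n only halves the standard error)
```

When sigma is unknown, the sample standard deviation s is commonly substituted, giving s / n^(1/2).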
Desired Properties of an Estimator
- Unbiased - the expected value of the estimator equals the parameter you are trying to estimate.
- Efficient - the variance of its sampling distribution is smaller than that of all other unbiased estimators of the parameter you are trying to estimate.
- Consistent - the accuracy of the parameter estimate increases as the sample size increases.
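A common illustration of unbiasedness is the sample variance. The sketch below assumes normally distributed data with a known variance of 4 and averages many estimates: dividing by n - 1 (`statistics.variance`) centers on 4, while dividing by n (`statistics.pvariance` applied to a sample) centers on 4 * (n - 1) / n.

```python
import random
import statistics

def average_estimates(true_sigma2, n, trials, seed=0):
    """Average the n-1 and n-divisor variance estimates over many simulated samples."""
    rng = random.Random(seed)
    unbiased, biased = [], []
    for _ in range(trials):
        xs = [rng.gauss(0.0, true_sigma2 ** 0.5) for _ in range(n)]
        unbiased.append(statistics.variance(xs))  # divides by n - 1
        biased.append(statistics.pvariance(xs))   # divides by n
    return statistics.fmean(unbiased), statistics.fmean(biased)

u, b = average_estimates(4.0, n=5, trials=10_000)
print(f"n-1 divisor: {u:.3f}   n divisor: {b:.3f}   true value: 4.0")
```

With n = 5 the n-divisor version averages about 3.2. It is biased but still consistent, because its bias factor (n - 1) / n approaches 1 as the sample size grows.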
Single (sample) values used to estimate a population parameter.
Point estimates
A bell-shaped probability distribution that is symmetrical about its mean.
t-distribution
*Use the t-distribution when constructing confidence intervals based on small samples (n < 30) from populations with unknown variance and a normal (or approximately normal) distribution
t-distributions have the following properties:
- Symmetrical
- Defined by a single parameter: degrees of freedom (df)
- Has more probability in the tails than the normal distribution
- As df increases, the shape of the t-distribution more closely approaches a standard normal distribution
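The "more probability in the tails" and "approaches the normal" properties can be checked numerically. Python's standard library has no t-distribution, so this sketch codes the Student's t density directly and integrates it with Simpson's rule, using symmetry (P(T > c) = 0.5 minus the area from 0 to c). The cutoff of 2 and the df values are arbitrary choices for illustration.

```python
import math
from statistics import NormalDist

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_coef = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)  # lgamma avoids overflow for large df
    return math.exp(log_coef) / math.sqrt(df * math.pi) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(c, df, steps=2_000):
    """P(T > c) for c >= 0: 0.5 minus the integral of the pdf from 0 to c (Simpson's rule, even steps)."""
    h = c / steps
    total = t_pdf(0.0, df) + t_pdf(c, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

normal_tail = 1 - NormalDist().cdf(2.0)
tails = [t_upper_tail(2.0, df) for df in (1, 5, 30, 1000)]
for df, p in zip((1, 5, 30, 1000), tails):
    print(f"df={df:>4}: P(T > 2) = {p:.4f}")
print(f"normal : P(Z > 2) = {normal_tail:.4f}")
```

Every t tail exceeds the normal tail, and the gap shrinks as df increases.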
For sample means, the number of sample observations minus 1 (n - 1)
Degrees of Freedom (df)
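Characteristics of the t-Distribution
- Centered at zero
- Flatter than a normal distribution, with fatter tails
- As df increases, the shape becomes more peaked and the tails become thinner.
- t-tables are usually organized by one-tailed probabilities; for a two-tailed test or confidence interval at significance level alpha, use the alpha/2 column.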
Estimates that result in a range of values within which the actual value of a parameter is expected to lie, with probability 1 - alpha
Confidence Intervals
The probability, denoted by alpha, that the true parameter lies outside the confidence interval; the corresponding confidence level is 1 - alpha
Level of Significance
Confidence Interval Calculation
C.I. = Xbar ± z_(alpha/2) * (sigma / n^(1/2))
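A sketch of the known-variance (z) interval with hypothetical inputs. `statistics.NormalDist().inv_cdf` supplies the reliability factor z_(alpha/2); the unknown-variance case would swap in a t critical value and the sample standard deviation, which the standard library does not provide directly.

```python
import math
from statistics import NormalDist

def z_confidence_interval(xbar, sigma, n, confidence=0.95):
    """Confidence interval for the mean when the population variance is known."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # reliability factor z_(alpha/2), about 1.96 for 95%
    half_width = z * sigma / math.sqrt(n)    # z times the standard error
    return xbar - half_width, xbar + half_width

# Hypothetical inputs: Xbar = 10, sigma = 2, n = 25
lo, hi = z_confidence_interval(xbar=10.0, sigma=2.0, n=25)
print(f"95% CI: ({lo:.3f}, {hi:.3f})")  # 95% CI: (9.216, 10.784)
```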
Distribution with known variance
Use z score
Distribution with unknown variance
Use t score
Two limitations of “larger is better”
1. A larger sample may contain observations from a different population.
2. Cost: the added cost of a larger sample may outweigh the gain in precision.
Bias in which the statistical significance of a pattern is overstated because the pattern was found through data mining (repeatedly searching the same data set).
Data-Mining Bias
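A small simulation of how data mining manufactures "significant" results. It assumes 200 hypothetical trading strategies whose monthly returns are pure noise with a true mean of zero; testing each at roughly the 5% level (normal approximation to the critical value) still flags several of them.

```python
import math
import random
import statistics

def count_false_discoveries(n_strategies=200, n_months=60, seed=2024):
    """Backtest pure-noise strategies and count how many look significant at about 5%."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_strategies):
        # Hypothetical monthly excess returns with a true mean of zero (no real edge)
        returns = [rng.gauss(0.0, 0.04) for _ in range(n_months)]
        t_stat = statistics.fmean(returns) / (statistics.stdev(returns) / math.sqrt(n_months))
        if abs(t_stat) > 1.96:  # approximate two-tailed 5% critical value
            hits += 1
    return hits

hits = count_false_discoveries()
print(f"{hits} of 200 noise strategies look 'significant'")
```

Roughly 5% of the strategies pass by chance alone, which is why a pattern found by searching many candidates should be re-tested on data that was not used to find it.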
Bias that occurs when some data is systematically excluded from the analysis, usually because of a lack of availability
Sample Selection Bias
*e.g., survivorship bias in mutual fund databases
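A sketch of survivorship bias, assuming hypothetical fund returns and a simple rule that funds with negative returns closed and disappeared from the database. Averaging only the survivors overstates typical performance.

```python
import random
import statistics

rng = random.Random(99)
# Hypothetical 5-year annualized returns (%) for 1,000 funds
all_funds = [rng.gauss(6.0, 5.0) for _ in range(1000)]

# Assumption: funds with negative returns were closed and dropped from the database
survivors = [r for r in all_funds if r >= 0.0]

print(f"all funds: {statistics.fmean(all_funds):.2f}%  "
      f"survivors only: {statistics.fmean(survivors):.2f}%")
```

Because every excluded fund underperformed every survivor, the survivors-only average is necessarily higher than the full-population average.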
Occurs when a study tests a relationship using sample data that was not available on the test date.
Look-ahead bias
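One way to guard against look-ahead bias is to filter data by when it was published, not by the period it describes. The records, ticker, and dates below are hypothetical; the FY2014 figure describes Dec. 31, 2014 but is not known until its release in February 2015.

```python
from datetime import date

# Hypothetical records: fiscal-year EPS and the date each figure was actually published
eps_reports = [
    {"ticker": "XYZ", "fiscal_year_end": date(2014, 12, 31), "published": date(2015, 2, 20), "eps": 2.10},
    {"ticker": "XYZ", "fiscal_year_end": date(2013, 12, 31), "published": date(2014, 2, 18), "eps": 1.85},
]

def latest_known_eps(reports, ticker, as_of):
    """Most recent EPS that had been published by `as_of`, so no future data leaks in."""
    known = [r for r in reports if r["ticker"] == ticker and r["published"] <= as_of]
    if not known:
        return None
    return max(known, key=lambda r: r["published"])["eps"]

print(latest_known_eps(eps_reports, "XYZ", date(2015, 1, 2)))  # 1.85
print(latest_known_eps(eps_reports, "XYZ", date(2015, 3, 1)))  # 2.1
```

A backtest dated Jan. 2, 2015 that used the 2.10 figure would be relying on data that was not yet available, which is exactly the look-ahead problem.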
Results if the time period over which the data is gathered is either too short or too long.
Time-period Bias