(R10) Sampling and Estimation Flashcards

1
Q

Define Simple Random Sampling and provide two methods

A

Each element of the population has an equal probability of being chosen; 1) use a random number generator or 2) systematic sampling: select every kth element
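
The two methods above can be sketched in a few lines of Python. This is only an illustration: the function names, the seed, and the population of 100 labelled items are assumptions made for the example, not part of the original card.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n elements without replacement; every element is equally likely."""
    rng = random.Random(seed)
    return rng.sample(population, n)

def systematic_sample(population, k, seed=None):
    """Pick a random start among the first k elements, then take every kth element."""
    rng = random.Random(seed)
    start = rng.randrange(k)
    return population[start::k]

# Hypothetical population: items labelled 1..100.
population = list(range(1, 101))
srs = simple_random_sample(population, 10, seed=42)
sys_sample = systematic_sample(population, 10, seed=42)
```

Systematic sampling is typically used when the population cannot easily be coded or numbered for a random number generator.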

2
Q

Sampling Distribution

A

The distribution of all distinct possible values that a statistic can assume when computed from samples of the same size randomly drawn from the same population

3
Q

Sampling error

A

The difference between the observed value of a statistic and the quantity it is intended to estimate (Sample mean - population mean)
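
A small sketch of the definition, assuming a hypothetical population of 10,000 simulated returns (mean 8%, standard deviation 20%) and a sample of 50. All numbers here are invented for illustration.

```python
import random
import statistics

rng = random.Random(0)

# Hypothetical population of returns; in practice the population mean is unknown.
population = [rng.gauss(0.08, 0.20) for _ in range(10_000)]
population_mean = statistics.fmean(population)

# One random sample of 50 observations and its mean.
sample = rng.sample(population, 50)
sample_mean = statistics.fmean(sample)

# Sampling error = sample mean - population mean.
sampling_error = sample_mean - population_mean
```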

4
Q

Stratified Random Sampling

A
  • Separate the population into smaller groups (strata) based on one or more distinguishing characteristics; then use simple random sampling within each group
  • Provides more precise estimates of the mean and variance than simple random sampling
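
A minimal sketch of the procedure, assuming proportional allocation (each group contributes the same fraction of its members). The bond universe, the rating labels, and the function name are hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(items, key, fraction, seed=None):
    """Group items into strata by key, then draw a simple random sample from each."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for item in items:
        strata[key(item)].append(item)
    sample = []
    for label in sorted(strata):
        members = strata[label]
        n = max(1, round(len(members) * fraction))  # at least one per stratum
        sample.extend(rng.sample(members, n))
    return sample

# Hypothetical bond universe: (id, rating) pairs, 60 rated AAA and 40 rated BBB.
bonds = [(i, "AAA") for i in range(60)] + [(i, "BBB") for i in range(60, 100)]
picked = stratified_sample(bonds, key=lambda b: b[1], fraction=0.1, seed=1)
```

With a 10% fraction, the sample keeps the 60/40 split of the population, which is what makes the resulting estimates more precise than an unconstrained draw.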
5
Q

Three Data Types

A
  1. Time Series
  2. Cross-Sectional
  3. Panel
6
Q

Time Series Data

A

Observe how one or more variables change over a period of time
e.g. Monthly returns on Microsoft stock from Jan 1994 to Dec 2004.

7
Q

Cross-Sectional Data

A

Multiple observational units at a point in time

e.g. Sales for 30 different companies for a particular quarter

8
Q

Longitudinal Data

A

Observations over time of multiple characteristics of the same entity, such as unemployment, inflation, and GDP growth rates for a country over 10 years.

9
Q

Panel Data

A

Time series + cross sectional. Data that contains observations over time of the same characteristic for multiple entities, such as debt/equity ratios for 20 companies over 24 quarters.

10
Q

Standard Error Formula

A

Standard deviation divided by the square root of n: sigma / n^(1/2), or s / n^(1/2) when the population standard deviation is unknown; the standard deviation of the distribution of the sample means.
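
The formula can be sketched as follows. The function names and the return series are hypothetical; the first function assumes the population standard deviation is known, and the second substitutes the sample standard deviation.

```python
import math
import statistics

def standard_error_known_sigma(sigma, n):
    """Standard error of the sample mean when the population sigma is known."""
    return sigma / math.sqrt(n)

def standard_error(sample):
    """Standard error estimated from the sample standard deviation s."""
    return statistics.stdev(sample) / math.sqrt(len(sample))

# Hypothetical sample of nine periodic returns.
returns = [0.02, -0.01, 0.03, 0.01, 0.00, 0.04, -0.02, 0.01, 0.02]
se_known = standard_error_known_sigma(0.20, 100)  # 0.20 / 10
se_sample = standard_error(returns)               # s / 3
```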

11
Q

Central Limit Theorem

A

Theorem stating that for simple random samples of size n from a population with mean u and finite variance sigma^2, the sampling distribution of the sample mean, Xbar, approaches a normal probability distribution with mean u and variance sigma^2 / n as the sample size becomes large.

12
Q

Properties of CLT

A
  • If the sample size, n, is sufficiently large (n >= 30), the sampling distribution of the sample means will be approximately normal.
  • The mean of the population, u, and the mean of the distribution of all possible sample means are equal.
  • The variance of the distribution of sample means is sigma^2 / n, the population variance divided by the sample size.
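
These properties can be checked with a small simulation. As an assumption for the example, the population is exponential with mean 1 and variance 1 (clearly non-normal), the sample size is 30, and 5,000 samples are drawn; the seed is arbitrary.

```python
import random
import statistics

rng = random.Random(7)
n, trials = 30, 5_000

# Each trial: draw n observations from a skewed population and record the sample mean.
sample_means = [
    statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
    for _ in range(trials)
]

mean_of_means = statistics.fmean(sample_means)    # should be close to u = 1
var_of_means = statistics.pvariance(sample_means) # should be close to sigma^2 / n = 1/30
```

Even though the underlying population is strongly skewed, the sample means cluster around the population mean with variance near sigma^2 / n.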
13
Q

Desired Properties of an Estimator

A
  1. Unbiased - the expected value of the estimator is equal to the parameter you are trying to estimate.
  2. Efficient - the variance of its sampling distribution is smaller than that of all other unbiased estimators of the parameter you are trying to estimate.
  3. Consistent - the accuracy of the parameter estimate increases as the sample size increases.
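
Unbiasedness can be illustrated by simulation. As an assumed example, the sketch compares two estimators of the variance of a normal population with true variance 4: the sample variance with divisor n - 1 (unbiased) and the version with divisor n (biased downward). The sample size, trial count, and seed are arbitrary choices.

```python
import random
import statistics

rng = random.Random(3)
n, trials = 5, 10_000
true_variance = 4.0  # population is normal with standard deviation 2

unbiased, biased = [], []
for _ in range(trials):
    sample = [rng.gauss(0.0, 2.0) for _ in range(n)]
    unbiased.append(statistics.variance(sample))  # divisor n - 1
    biased.append(statistics.pvariance(sample))   # divisor n

avg_unbiased = statistics.fmean(unbiased)  # close to 4.0
avg_biased = statistics.fmean(biased)      # close to 4.0 * (n - 1) / n = 3.2
```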
14
Q

Point Estimates

A

Single values computed from a sample and used to estimate population parameters; the sample mean and sample variance are point estimates

15
Q

Confidence Interval Formula

A

Point estimate +/- reliability factor * standard error

C.I. = Xbar +/- z_(alpha/2) * (sigma / n^(1/2))
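
A worked sketch of the formula for the known-variance case, using the standard normal quantile from Python's `statistics.NormalDist`. The function name and the inputs (sample mean 10%, sigma 20%, n = 100) are hypothetical.

```python
import math
from statistics import NormalDist

def z_confidence_interval(sample_mean, sigma, n, confidence=0.95):
    """Point estimate +/- reliability factor * standard error (known sigma)."""
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # reliability factor z_(alpha/2)
    half_width = z * sigma / math.sqrt(n)        # reliability factor * standard error
    return sample_mean - half_width, sample_mean + half_width

# 95% interval: roughly 0.10 +/- 1.96 * 0.02
low, high = z_confidence_interval(sample_mean=0.10, sigma=0.20, n=100)
```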

16
Q

Distribution with known variance, which table should be used to create confidence interval?

A

Use Z score

17
Q

Distribution with unknown variance, which table should be used to create confidence interval?

A

Use the t score if the sample size is less than 30 (this assumes a normally distributed population); use the t or z score if the sample size is 30 or greater

18
Q

Level of Significance

A

The probability, denoted by alpha, that the true parameter lies outside the confidence interval; the degree of confidence is 1 - alpha

19
Q

Characteristics of T-Distribution

A
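  1. Symmetrical and centered at zero
  2. Flatter than a normal distribution: lower peak and fatter tails
  3. As df increases, the shape becomes more peaked, the tails become thinner, and the distribution approaches the standard normal.
  4. In the standard t-table, the listed levels of significance are one-tail probabilities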
20
Q

Confidence intervals are affected by:

A
  • z score or t score
  • alpha - level of significance (degree of confidence = 1 - alpha)
  • n - sample size (number of observations)
21
Q

Data mining bias

A

Bias that refers to results where the statistical significance of the pattern is overestimated because the results were found through data-mining (the practice of hitting a data set over and over again until you hit gold)
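
A rough simulation of why repeated searching overstates significance. As an assumption for the example, 1,000 hypothetical trading strategies each have a true mean return of zero over 60 months, yet a cutoff of |t| > 2 still flags some of them as "significant" purely by chance.

```python
import math
import random
import statistics

rng = random.Random(11)
n_strategies, n_months = 1_000, 60

false_positives = 0
for _ in range(n_strategies):
    # Returns with a true mean of zero: any "pattern" found is noise.
    returns = [rng.gauss(0.0, 0.05) for _ in range(n_months)]
    std_err = statistics.stdev(returns) / math.sqrt(n_months)
    t_stat = statistics.fmean(returns) / std_err
    if abs(t_stat) > 2.0:
        false_positives += 1

share_flagged = false_positives / n_strategies  # expected to be near 5%
```

Reporting only the flagged strategies, without disclosing how many were tried, is what makes the apparent significance misleading.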

22
Q

Sample selection bias

A

Bias which occurs when some data is systematically excluded from the analysis, usually because of the lack of availability (survivorship bias in mutual funds)

23
Q

Look-ahead bias

A

Occurs when a study tests a relationship using sample data that was not available on the test date (e.g. stock price/returns vs. accounting data)

24
Q

Time-period bias

A

Occurs when the results hold only for the specific time period studied

25
Q

Unbiased estimator

A

When the expected value of the estimator is equal to the parameter you are trying to estimate.

26
Q

Efficient Estimator

A

If the variance of its sampling distribution is smaller than that of all other unbiased estimators of the parameter you are trying to estimate.

27
Q

Consistent Estimator

A

The accuracy of the parameter estimate increases as the sample size increases.