Reading 11: Sampling and Estimation Flashcards

1
Q

Simple Random Sampling

A

Method of selecting a sample in which each member of the population has the same likelihood of being included, e.g. drawing names out of a hat

2
Q

Sampling Error

A

Sampling Error (of the mean) = sample mean - population mean

3
Q

Sampling Distribution

A

Distribution of all values that a sample statistic can take on when computed from samples of identical size randomly drawn from the same population

4
Q

Stratified Random Sampling uses a…

A

classification system to separate the population into smaller groups based on one or more distinguishing characteristics
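A minimal Python sketch of the idea, assuming (as in the usual description of the method) that a simple random sample is then drawn from each group in proportion to its size. The function name `stratified_sample`, the `stratum_of` callback and the bond universe are hypothetical, chosen only for illustration.

```python
import random
from collections import defaultdict

def stratified_sample(population, stratum_of, fraction, seed=None):
    """Draw a simple random sample of the same fraction from every stratum."""
    rng = random.Random(seed)
    # Step 1: classify each member of the population into its stratum.
    strata = defaultdict(list)
    for item in population:
        strata[stratum_of(item)].append(item)
    # Step 2: sample each stratum separately, in proportion to its size.
    sample = []
    for members in strata.values():
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical universe of 100 bonds: 60 rated AAA, 40 rated BB.
bonds = [(i, "AAA" if i < 60 else "BB") for i in range(100)]
picked = stratified_sample(bonds, stratum_of=lambda b: b[1], fraction=0.1, seed=42)
```

With a 10% fraction the sample keeps the 60/40 split of the population, which a simple random sample of 10 bonds would not guarantee.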

5
Q

Time-Series Data

A

Observations taken over a period of time at specific and equally spaced intervals

6
Q

Cross Sectional Data

A

Sample of observations taken at a single point in time

7
Q

Longitudinal Data

A

Observations over time of multiple characteristics of the same entity (e.g. a single country)

8
Q

Panel Data

A

Observations over time of the same characteristic for multiple entities

9
Q

Central Limit Theorem (Definition)

A

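For a population with a finite variance, the sampling distribution of the sample mean approaches a normal distribution as the sample size grows (n >= 30 is the usual rule of thumb), whatever the shape of the population distribution. The mean of that sampling distribution equals the population mean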

10
Q

Central Limit Theorem (Variance Formula)

A

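Variance of the sampling distribution of the sample mean = SD^2 / n

SD = standard deviation of the population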

n = sample size
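The variance formula can be checked with a small simulation. This is only a sketch under stated assumptions: the exponential population (mean 1, variance 1), the sample size of 30, the 10,000 trials and the helper name `sample_means` are all illustrative choices, not part of the original card.

```python
import random
import statistics

def sample_means(draw, n, trials, seed=0):
    """Return the means of `trials` independent samples of size `n`."""
    rng = random.Random(seed)
    return [statistics.fmean(draw(rng) for _ in range(n)) for _ in range(trials)]

# Skewed, clearly non-normal population: exponential with mean 1 and variance 1.
means = sample_means(lambda rng: rng.expovariate(1.0), n=30, trials=10_000)

observed_var = statistics.pvariance(means)  # variance of the simulated sample means
theoretical_var = 1.0 / 30                  # SD^2 / n with SD^2 = 1 and n = 30
```

The two variances should agree closely, and the average of `means` should sit near the population mean of 1.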

11
Q

Standard error of the sample mean (definition)

A

Standard deviation of the sampling distribution of the sample mean, i.e. how far sample means typically fall from the population mean

12
Q

Standard error of the sample mean (if population SD known) (formula)

A

Standard Error = SD of population / Square Root of n

n = size of sample
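As a quick worked example of the formula, here is a minimal sketch; the function name `standard_error` and the input values (population SD of 20, sample of 100) are assumptions made for illustration.

```python
import math

def standard_error(population_sd, n):
    """Standard error of the sample mean when the population SD is known."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    return population_sd / math.sqrt(n)

se = standard_error(20.0, 100)  # 20 / sqrt(100) = 2.0
```

Because n sits under a square root, quadrupling the sample size only halves the standard error.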

13
Q

Standard error of the sample mean (if population SD unknown) (formula)

A

Standard Error = S / Square Root of n
Where:
S = SD of the sample
n = size of sample
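When the population SD is unknown, the sample SD stands in for it. The sketch below assumes the sample SD is computed with the n - 1 divisor, which is what Python's `statistics.stdev` does; the function name and the data are made up for illustration.

```python
import math
import statistics

def standard_error_from_sample(sample):
    """Estimate the standard error of the mean from the sample itself."""
    s = statistics.stdev(sample)  # sample SD, uses the n - 1 divisor
    return s / math.sqrt(len(sample))

# Hypothetical sample of eight observations.
observations = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
se = standard_error_from_sample(observations)  # about 0.756
```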

14
Q

Desirable Properties of an Estimator

A
  1. Unbiasedness
  2. Efficiency
  3. Consistency
15
Q

Unbiasedness

A

Expected value of the estimator = value of the parameter being estimated

16
Q

Efficiency

A

The variance of its sampling distribution is smaller than that of any other unbiased estimator of the same parameter

17
Q

Consistency

A

The estimator becomes more accurate as sample size increases, i.e. its standard error falls toward zero as the sample grows

18
Q

Point Estimate

A

A single sample value used to estimate a population parameter (e.g. the sample mean as an estimate of the population mean)

19
Q

Confidence Interval

A

Range of values in which the population parameter is expected to lie with a stated probability (the degree of confidence, 1 - alpha)

20
Q

Confidence interval (Formula)

A

Point Estimate +/- (reliability factor x standard error)

21
Q

Student’s t distribution

A

Distribution to use when constructing confidence intervals based on:

  1. Small sample sizes (n < 30)
  2. Unknown population variance
  3. A normal or approximately normal population distribution
22
Q

t-distribution (properties)

A
  1. Symmetrical
  2. Defined by a single parameter, degrees of freedom (df = n - 1)
  3. Fatter tails than the normal distribution
23
Q

Degrees of Freedom (df) (Effect on distribution)

A

More df results in a greater percentage of observations near the center of the distribution and thinner tails, so the t-distribution approaches the standard normal distribution

Confidence Intervals will be narrower

24
Q

Confidence Interval Formula (Known Variance)

A

Point Estimate +/- (z-reliability factor x standard error)
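A minimal sketch of the two-sided interval for the known-variance case, assuming the reliability factor is the standard normal quantile at 1 - alpha/2 and the standard error is the population SD divided by the square root of n. The function name and the inputs (sample mean 80, population SD 15, n = 36) are hypothetical.

```python
import math
from statistics import NormalDist

def z_confidence_interval(sample_mean, population_sd, n, confidence=0.95):
    """Two-sided confidence interval for the mean, population SD known."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for 95% confidence
    margin = z * population_sd / math.sqrt(n)
    return sample_mean - margin, sample_mean + margin

# Standard error is 15 / sqrt(36) = 2.5, so the margin is about 1.96 x 2.5.
low, high = z_confidence_interval(80.0, 15.0, 36)  # roughly (75.10, 84.90)
```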

25
Q

Confidence Interval Formula (Unknown Variance)

A

Point Estimate +/- (t-reliability factor x standard error of sample mean)
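A sketch of the same calculation with a t-reliability factor. The Python standard library has no t-distribution quantile function, so this example assumes the factor is looked up in a t-table and passed in; 2.064 is the table value for 95% confidence with 24 degrees of freedom. The function name and the sample figures are hypothetical.

```python
import math

def t_confidence_interval(sample_mean, sample_sd, n, t_reliability):
    """Two-sided confidence interval for the mean, population SD unknown.

    t_reliability is the two-sided t-table value for df = n - 1.
    """
    margin = t_reliability * sample_sd / math.sqrt(n)
    return sample_mean - margin, sample_mean + margin

T_95_DF24 = 2.064  # t-table value: 95% confidence, df = 25 - 1 = 24

# Standard error is 20 / sqrt(25) = 4, so the margin is 2.064 x 4 = 8.256.
low, high = t_confidence_interval(2.0, 20.0, 25, T_95_DF24)
```

The interval runs from about -6.256 to 10.256. It is wider than the one a z-factor of 1.96 would give, which reflects the fatter tails of the t-distribution.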

26
Q

Confidence Interval Formula (large sample, unknown variance)

A

Point Estimate +/- (z-statistic x standard error)

27
Q

Limitations of large sample size

A
  1. A larger sample may include observations from a different population/distribution
  2. Cost (must be weighed against the increase in precision)

28
Q

Data-Mining Limitation

A

Repeated use of the same database to search for patterns

29
Q

Data-Mining Bias

A

Overstating the statistical significance of a pattern because it was found through data mining

30
Q

Sample Selection Bias

A

Systematic exclusion of some data which renders the sample non-random

31
Q

Survivorship Bias

A

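Using only the funds that survived to the end of the period, which excludes badly performing funds that ceased to exist and so overstates performance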

32
Q

Look-ahead bias

A

Testing a relationship using data that was not available on the test date

33
Q

Time-period bias

A

The time period over which the data is gathered is either too short or too long