Quant: Sampling and Estimation Flashcards

1
Q

Simple Random Sampling =

A

Randomly choosing items from a population

2
Q

Systematic Sampling =

A

Drawing every nth member of the population
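
A minimal sketch in Python (the numbered population of 100 members is hypothetical) contrasting the two approaches:

    import random

    population = list(range(1, 101))  # hypothetical population of 100 labelled members

    # Simple random sampling: every member has an equal chance of being chosen
    simple_sample = random.sample(population, k=10)

    # Systematic sampling: take every nth member after a random starting point
    n = 10
    start = random.randrange(n)
    systematic_sample = population[start::n]

    print(simple_sample)
    print(systematic_sample)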

3
Q

Sampling Error =

A

Difference between a sample statistic and a population parameter

4
Q

Sampling Distribution =

A

Distribution of the statistics computed from the samples: if we repeat the sampling process and come up with a number of different sample means, those sample means will themselves have a distribution.
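
A short simulation in Python (the population parameters are made up for illustration) showing that repeated sample means have a distribution of their own:

    import random
    import statistics

    random.seed(1)
    population = [random.gauss(100, 15) for _ in range(100_000)]  # hypothetical population

    # Repeat the sampling process many times and collect the sample means
    sample_means = [statistics.mean(random.sample(population, 50)) for _ in range(2_000)]

    # The sample means themselves have a distribution (the sampling distribution)
    print(statistics.mean(sample_means))   # close to the population mean of ~100
    print(statistics.stdev(sample_means))  # close to 15 / sqrt(50), the standard error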

5
Q

Stratified Random Sampling =

A

The population is split into groups (strata) and samples are drawn from each group in proportion to its relative weighting within the entire population.

AS OPPOSED TO SIMPLE RANDOM SAMPLING
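
A rough sketch in Python (the strata and their weights are hypothetical) of sampling each stratum in proportion to its weight in the population:

    import random

    # Hypothetical population split into strata with their members
    strata = {
        "government": list(range(0, 600)),    # 60% of the population
        "corporate":  list(range(600, 900)),  # 30%
        "municipal":  list(range(900, 1000)), # 10%
    }

    total = sum(len(members) for members in strata.values())
    sample_size = 50

    stratified_sample = []
    for name, members in strata.items():
        # Draw from each stratum in proportion to its relative weighting
        k = round(sample_size * len(members) / total)
        stratified_sample.extend(random.sample(members, k))

    print(len(stratified_sample))  # 50 in total: 30 / 15 / 5 per stratum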

6
Q

Cross-sectional / time series / panel data / longitudinal data =

A

Time series data are observations taken over a period of time at regular intervals.

Cross-sectional data are observations taken at a single point in time.

Longitudinal data are observations over time of multiple characteristics of the same entity.

Panel data are observations over time of the same characteristic for multiple entities.

7
Q

Central Limit Theorem =

A

The central limit theorem states that for simple random samples of size n from a population with mean µ and finite variance σ², the sampling distribution of the sample mean x̄ approaches a normal probability distribution with mean µ and variance σ²/n as the sample size becomes large.
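
A quick simulation in Python (the skewed population is hypothetical) illustrating the theorem: the population is clearly non-normal, yet the sample means centre on µ with variance close to σ²/n:

    import random
    import statistics

    random.seed(42)
    # Right-skewed (non-normal) population: exponential with mean 2 and variance 4
    population = [random.expovariate(0.5) for _ in range(200_000)]

    n = 40  # large sample size (> 30)
    sample_means = [statistics.mean(random.sample(population, n)) for _ in range(3_000)]

    print(statistics.mean(sample_means))      # ~2, the population mean µ
    print(statistics.variance(sample_means))  # ~4 / 40 = 0.1, i.e. σ²/n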

8
Q

Central Limit Theorem (2) Important Points =

A

We can make inferences about the population mean from the sample mean, regardless of the population's distribution, if the sample size n ≥ 30.

i.e. n ≥ 30 MEANS THE SAMPLING DISTRIBUTION WILL BE APPROXIMATELY NORMAL, which means we can do hypothesis testing and construct confidence intervals.

The mean of the population, µ, is equal to the mean of the distribution of all possible sample means.

The variance of the distribution of sample means is σ²/n.

9
Q

Standard Error (of the sample mean) =

A

Standard deviation of the distribution of the sample means.

When the SD of the population is known: σx̄ = σ/√n.

However, the population SD is not normally known, in which case we use the sample standard deviation s instead of σ: sx̄ = s/√n.

AS σ OR s INCREASES, STANDARD ERROR INCREASES.

AS n, THE SAMPLE SIZE, INCREASES, STANDARD ERROR DECREASES.
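
A small numeric check in Python (the sample values are made up) of the s/√n formula:

    import math
    import statistics

    sample = [4.2, 5.1, 3.8, 4.9, 5.5, 4.4, 5.0, 4.7]  # hypothetical sample returns (%)

    n = len(sample)
    s = statistics.stdev(sample)       # sample standard deviation (σ unknown)
    standard_error = s / math.sqrt(n)  # estimated standard error of the sample mean

    print(round(standard_error, 4))
    # A larger s gives a larger standard error; a larger n gives a smaller one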

10
Q

Desirable properties of an estimator =

A

Unbiased: the expected value of the estimator is equal to the parameter you're trying to estimate.

Efficient: the variance of its sampling distribution is the smallest of all unbiased estimators (i.e. it has the lowest sampling error).

Consistent: the accuracy of the parameter estimate increases as the sample size increases (as n increases, the standard error decreases and the sampling distribution bunches around the population mean).
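
A rough illustration in Python (population values are hypothetical) of consistency: the spread of the sample means shrinks as n grows, so the sampling distribution bunches around the population mean:

    import random
    import statistics

    random.seed(0)
    population = [random.gauss(50, 10) for _ in range(100_000)]  # hypothetical population

    for n in (10, 100, 1000):
        means = [statistics.mean(random.sample(population, n)) for _ in range(500)]
        # The standard deviation of the sample means falls roughly with 1/sqrt(n)
        print(n, round(statistics.stdev(means), 3))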

11
Q

Point estimate vs confidence interval estimate =

A

A point estimate is a single value used to estimate a parameter. The sample mean is a point estimate for the population mean.

A confidence interval is a range of values in which the parameter is expected to lie.

12
Q

When to use a t-distribution =

A

SMALL SAMPLE (n < 30) from a population with unknown variance and a normal or approximately normal distribution.

It may also be appropriate when the variance is unknown and the sample size is large enough that the central limit theorem assures that the sampling distribution is approximately normal.

13
Q

t-distribution characteristics =

A

Symmetrical.

Defined by a single parameter, degrees of freedom, where the degrees of freedom are equal to the number of sample observations minus 1 for sample means.

Fatter tails than a normal distribution.

As the df gets larger the shape becomes closer to a normal distribution.

THE SHAPE CHANGES AS YOU HAVE MORE OBSERVATIONS AND DF CHANGES

14
Q

Degrees of confidence/level of significance =

A

Degree of confidence = 1 - ALPHA, where alpha is the level of significance

15
Q

Confidence interval (normally distributed and has a known variance) =

A

point estimate ± (reliability factor × standard error)

For a population that is normally distributed and has a known variance: x̄ ± zα/2 × σ/√n, where zα/2 is the z-score with α/2 probability in the upper tail (the reliability factor).
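
A minimal sketch in Python (the sample mean, σ and n are hypothetical) of a 95% confidence interval when the population is normal with known variance:

    import math

    x_bar = 80.0   # sample mean (hypothetical)
    sigma = 15.0   # known population standard deviation (hypothetical)
    n = 36         # sample size
    z = 1.960      # reliability factor for a 95% confidence interval

    standard_error = sigma / math.sqrt(n)
    lower = x_bar - z * standard_error
    upper = x_bar + z * standard_error

    print((round(lower, 2), round(upper, 2)))  # (75.1, 84.9)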

16
Q

Reliability factors (constructing confidence intervals) =

A

Defined by α, the level of significance: the reliability factor is the z-score for which α/2 probability is in the upper tail of the distribution.

The most common reliability factors are:

  • 1.645, for a 90% confidence interval (α = 10%, 5% in each tail)
  • 1.960, for a 95% confidence interval (α = 5%, 2.5% in each tail)
  • 2.575, for a 99% confidence interval (α = 1%, 0.5% in each tail)
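
These z-scores can be reproduced from the standard normal inverse CDF; a quick check in Python (assuming scipy is available):

    from scipy.stats import norm

    for confidence in (0.90, 0.95, 0.99):
        alpha = 1 - confidence
        z = norm.ppf(1 - alpha / 2)  # z-score with alpha/2 probability in the upper tail
        print(f"{confidence:.0%}: {z:.3f}")
    # 90%: 1.645, 95%: 1.960, 99%: 2.576 (commonly rounded to 2.575 or 2.58)
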
17
Q

Interpreting confidence intervals =

A

Probabilistic: i.e. having taken multiple samples of the population and constructed confidence intervals for each sample's mean, 99% of the resulting confidence intervals will include the population mean.

Practical: we are 99% confident that the population mean score is between x and y for candidates from this population (where x and y are the lower and upper limits of the confidence interval).

18
Q

Confidence intervals for the population mean (normal with unknown variance) =

A

Owing to the fatter tails of the t-distribution, confidence intervals constructed using the t-distribution will be WIDER than those constructed using the z-distribution.
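
A short comparison in Python (assuming scipy is available) showing that t reliability factors exceed z, so t-based intervals are wider, with the gap shrinking as degrees of freedom grow:

    from scipy.stats import norm, t

    alpha = 0.05  # 95% confidence
    print(f"z: {norm.ppf(1 - alpha / 2):.3f}")  # 1.960

    for df in (5, 29, 100, 1000):
        print(f"t (df={df}): {t.ppf(1 - alpha / 2, df):.3f}")
    # t exceeds z (wider intervals) and converges toward z as df increases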

19
Q

Confidence interval for a population mean (unknown variance, large sample, any distribution) =

A

The t-statistic can be used as long as the sample size is large (n > 30). The z-statistic can also be used, although t is more conservative.

IF THE SAMPLE SIZE FOR A NONNORMAL DISTRIBUTION IS LESS THAN 30, WE CANNOT CONSTRUCT A CONFIDENCE INTERVAL.

20
Q

Confidence interval for population mean (population variance known, non normal distr) =

A

Can use the z-statistic as long as n > 30 (large sample size). The central limit theorem assures us that the distribution of the sample mean is approximately normal when the sample is large.

21
Q

CHEAT CARD, CONFIDENCE INTERVALS =

A

Which statistic to use for a confidence interval for the population mean:

  • Normal distribution, known variance: z-statistic (any sample size)
  • Normal distribution, unknown variance: t-statistic (z is acceptable when n ≥ 30)
  • Nonnormal distribution, known variance: z-statistic, only when n ≥ 30
  • Nonnormal distribution, unknown variance: t-statistic, only when n ≥ 30
  • Nonnormal distribution with n < 30: no confidence interval can be constructed

22
Q

Does our sample need to be random?

A

YES. All of these metrics and analyses rely on the sample being random/unbiased.

23
Q

Benefits/limitations of sample size?

A

Larger samples reduce the sampling error and the standard deviation of the sample statistic around its population/true value. Confidence intervals are narrower when samples are larger, and the standard errors of point estimates of population parameters are smaller.

However, larger samples have these issues:

  • We may include points in our sample from another population (with different parameters), reducing the precision of the population estimates
  • Cost.

Both suggest that 'larger is better' is not necessarily the case.

24
Q

Data-mining =

A

The use of the same database repeatedly to search for patterns. It can be argued that this leads to overstating the significance of relationships when they are found (data-mining bias).

Points to look out for when reading about a profitable trading strategy:

  • Many variables tested, only those discovered to be significant reported
  • Lack of economic theory consistent with the results
25
Q

Sample selection bias =

A

Occurs when some data is systematically excluded from the analysis, usually because of a lack of availability.

This practice renders the observed sample nonrandom, and any conclusions drawn from this sample can't be applied to the population, because the observed sample and the portion of the population that was not observed are different.

26
Q

Survivorship bias =

A

Survivorship bias is the most common form of sample selection bias. A good example of the existence of survivorship bias in investments is the study of mutual fund performance. Most mutual fund databases, like Morningstar's, only include funds currently in existence (the "survivors"). They do not include funds that have ceased to exist due to closure or merger.

THE OBSERVED SAMPLE AND THE PART OF THE POPULATION NOT OBSERVED ARE DIFFERENT.

27
Q

Look-ahead bias =

A

Look-ahead bias occurs when a study tests a relationship using sample data that was not available on the test date. For example, consider the test of a trading rule that is based on the price-to-book ratio at the end of the fiscal year. Stock prices are available for all companies at the same point in time, while end-of-year book values may not be available until 30 to 60 days after the fiscal year ends.

In order to account for this bias, a study that uses price-to-book value ratios to test trading strategies might estimate the book value as reported at fiscal year end and use the market value two months later.

28
Q

Time-period bias =

A

Time-period bias can result if the time period over which the data is gathered is either too short or too long. If the time period is too short, research results may reflect phenomena specific to that time period, or perhaps even data mining. If the time period is too long, the fundamental economic relationships that underlie the results may have changed.