Reading 10 LOS's Flashcards

1
Q

LOS 10a: Define simple random sampling and a sampling distribution

LOS 10b: Explain sampling error

A

In a simple random sample, each member of the population has the same probability or likelihood of being included in the sample. In practice, random samples are generated using random number tables or computer random-number generators. Systematic sampling is often used to generate approximately random samples. In systematic sampling, every kth member in the population list is selected until the desired sample size is reached.
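
As a rough illustration of the two approaches, here is a minimal sketch using Python's standard `random` module. The population of 100 member IDs, the function names, and the seed are all hypothetical, and the systematic version assumes the sample size is no larger than the population.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n members without replacement; every member is equally likely."""
    rng = random.Random(seed)
    return rng.sample(population, n)

def systematic_sample(population, n, seed=None):
    """Pick a random start, then take every k-th member of the list."""
    k = len(population) // n      # sampling interval (assumes n <= len(population))
    rng = random.Random(seed)
    start = rng.randrange(k)      # random starting point within the first interval
    return [population[start + i * k] for i in range(n)]

population = list(range(1, 101))  # hypothetical population of 100 member IDs
print(simple_random_sample(population, 10, seed=1))
print(systematic_sample(population, 10, seed=1))
```

The systematic sample is only approximately random: once the starting point is chosen, the rest of the sample is fully determined by the order of the list.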

Sampling Error

Sampling error is the error that arises from observing a sample instead of the entire population to draw conclusions about population parameters.

  • Sampling error of the mean = Sample mean - Population mean

Sampling Distribution

This is the probability distribution of a given sample statistic under repeated sampling of the population. Each sample drawn from the population will generally produce a different sample mean. The distribution of these sample means is called the sampling distribution of the mean.
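
Both ideas can be seen in a small simulation. This is a sketch only: the population of 10,000 simulated returns, the sample size of 50, and the seed are made-up values chosen for illustration.

```python
import random
import statistics

rng = random.Random(42)

# Hypothetical population: 10,000 returns, generated once and then treated as fixed.
population = [rng.gauss(0.05, 0.20) for _ in range(10_000)]
population_mean = statistics.fmean(population)

def sample_mean(n):
    """Mean of one simple random sample of size n."""
    return statistics.fmean(rng.sample(population, n))

# One sample: sampling error of the mean = sample mean - population mean.
one_mean = sample_mean(50)
sampling_error = one_mean - population_mean

# Repeated sampling: the collection of sample means approximates
# the sampling distribution of the mean.
sample_means = [sample_mean(50) for _ in range(2_000)]

print(f"sampling error of one sample: {sampling_error:+.4f}")
print(f"mean of the sample means:     {statistics.fmean(sample_means):.4f}")
print(f"population mean:              {population_mean:.4f}")
```

Individual samples miss the population mean by varying amounts, while the average of many sample means sits very close to it.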

2
Q

LOS 10c: Distinguish between simple random and stratified random sampling

A

Stratification is the process of grouping members of the population into relatively homogeneous subgroups, or strata, before drawing samples. The strata should be mutually exclusive and collectively exhaustive. Once this is accomplished, random sampling is applied within each stratum and the number of observations drawn from each stratum is based on the size of the stratum relative to the population. This often improves the representativeness of the sample by reducing the sampling error.
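
A minimal sketch of proportional allocation follows. The strata names, their sizes, and the function name are hypothetical. Because of rounding, the allocated counts may not always sum exactly to the requested sample size.

```python
import random

def stratified_sample(strata, n, seed=None):
    """strata maps a stratum name to its list of members.

    Allocates n across strata in proportion to stratum size, then draws
    a simple random sample within each stratum.
    """
    rng = random.Random(seed)
    total = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        k = round(n * len(members) / total)   # proportional allocation
        sample[name] = rng.sample(members, k)
    return sample

# Hypothetical population of 1,000 stocks split into three strata.
strata = {
    "large_cap": [f"L{i}" for i in range(600)],
    "mid_cap": [f"M{i}" for i in range(300)],
    "small_cap": [f"S{i}" for i in range(100)],
}
sample = stratified_sample(strata, 50, seed=5)
print({name: len(members) for name, members in sample.items()})
```

Each stratum contributes to the sample in the same proportion that it contributes to the population, which is what keeps small strata from being missed by chance.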

3
Q

LOS 10d: Distinguish between time-series and cross-sectional data

A

Time-series data consists of observations measured over a period of time, spaced at uniform intervals.

Cross-Sectional data refers to data collected by observing many subjects at the same point in time.

A data set can combine time-series and cross-sectional data. Examples:

  • Longitudinal data is data collected over time about multiple characteristics of the same observational unit (used for various economic indicators).
  • Panel data is data collected over time about a single characteristic of multiple observational units.
4
Q

LOS 10e: Explain the central limit theorem and its importance

A

The central limit theorem allows us to make accurate statements about the population mean using the sample mean, regardless of the distribution of the population (provided it has a finite variance), as long as the sample size is adequate, normally defined as at least 30.

The important properties of the central limit theorem are:

  • Given a population with any probability distribution, the sampling distribution of the sample mean will be approximately normal, with mean equal to the population mean and variance equal to the population variance divided by the sample size
  • No matter what the distribution of the population, for a sample whose size is greater than or equal to 30, the sample mean will be approximately normally distributed
  • The mean of the population and the mean of the distribution of sample means are equal
  • The variance of the distribution of sample means is the population variance divided by sample size
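
These properties can be checked with a simulation. The sketch below assumes a strongly right-skewed population (exponential with mean 1 and variance 1), a sample size of 30, and 20,000 repeated samples; all of these are illustrative choices.

```python
import math
import random
import statistics

rng = random.Random(0)
n = 30            # sample size
trials = 20_000   # number of repeated samples

# Population: exponential with rate 1 (mean 1, variance 1), clearly non-normal.
sample_means = [
    statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
    for _ in range(trials)
]

mean_of_means = statistics.fmean(sample_means)
var_of_means = statistics.pvariance(sample_means)

# Share of sample means within 1.96 standard errors of the population mean;
# close to 95% if the sampling distribution is roughly normal.
se = math.sqrt(1.0 / n)
share_within = sum(abs(m - 1.0) <= 1.96 * se for m in sample_means) / trials

print(f"mean of sample means:     {mean_of_means:.4f} (population mean is 1)")
print(f"variance of sample means: {var_of_means:.4f} (population variance / n = {1 / n:.4f})")
print(f"share within 1.96 SE:     {share_within:.3f}")
```

Even though the population is skewed, the sample means center on the population mean and their variance is close to the population variance divided by n.
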
5
Q

LOS 10f: Calculate and interpret the standard error of the sample mean

A

The standard deviation of the distribution of sample means is known as the standard error of the sample mean.

When the population variance, $\sigma^2$, is known, the standard error of the sample mean is calculated as:

  • $\sigma_{\bar{x}} = \sigma / \sqrt{n}$

Practically speaking, population variances are almost never known, so we estimate the standard error of the sample mean using the sample's standard deviation, $s$:

  • $s_{\bar{x}} = s / \sqrt{n}$
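
A short sketch of both calculations. The function names and the return figures are hypothetical.

```python
import math
import statistics

def standard_error_known(sigma, n):
    """Standard error of the sample mean when the population sigma is known."""
    return sigma / math.sqrt(n)

def standard_error_estimated(sample):
    """Standard error estimated from the sample standard deviation."""
    s = statistics.stdev(sample)   # sample standard deviation (n - 1 denominator)
    return s / math.sqrt(len(sample))

# Known population standard deviation of 20 with samples of 100 and 400.
print(standard_error_known(20, 100))
print(standard_error_known(20, 400))

# Hypothetical sample of eight monthly returns, population sigma unknown.
returns = [0.02, -0.01, 0.03, 0.01, 0.00, 0.04, -0.02, 0.01]
print(standard_error_estimated(returns))
```

Because n enters through its square root, quadrupling the sample size only halves the standard error.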

6
Q

LOS 10h: Distinguish between a point estimate and a confidence interval estimate of a population parameter.

A

A point estimate involves the use of sample data to calculate a single value that serves as an approximation for an unknown population parameter. For example, the sample mean is a point estimate of the population mean. The formula used to calculate a point estimate is known as an estimator; for the sample mean it is:

  • $\bar{x} = \sum x_i / n$

A confidence interval uses sample data to calculate a range of possible values that an unknown population parameter can take, with a given probability of $(1 - \alpha)$, where $\alpha$ is called the level of significance and $(1 - \alpha)$ is the degree of confidence.

A confidence interval has the following structure:

  • Point estimate ± (reliability factor × standard error)
  • where the reliability factor is a number based on the assumed distribution of the point estimate and the degree of confidence for the interval, $(1 - \alpha)$. Under the standard normal distribution it is 1.96 for 95% confidence and 2.58 for 99% confidence
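
The reliability factors quoted above come from the standard normal distribution. A minimal sketch using `statistics.NormalDist` from the Python standard library (the function name `reliability_factor` is made up for this example):

```python
from statistics import NormalDist

def reliability_factor(confidence):
    """z-value that leaves (1 - confidence) / 2 in each tail of the standard normal."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

for confidence in (0.90, 0.95, 0.99):
    print(f"{confidence:.0%} confidence: z = {reliability_factor(confidence):.3f}")
```

A higher degree of confidence requires a larger reliability factor and therefore a wider interval.
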
7
Q

LOS 10g: Identify and describe desirable properties of an estimator

A

Unbiasedness: an unbiased estimator is one whose expected value is equal to the parameter being estimated. The expected value of the sample mean equals the population mean, so the sample mean is an unbiased estimator of the population mean.

Efficiency: an efficient unbiased estimator is one that has the lowest variance among all unbiased estimators of the same parameter.

Consistency: a consistent estimator is one for which the probability of estimates close to the value of the population parameter increases as the sample size increases.
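
Consistency can be illustrated by simulation. The sketch below assumes a normal population with mean 10 and standard deviation 2 and counts how often the sample mean lands within 0.25 of the population mean; the parameters, tolerance, and seed are arbitrary choices.

```python
import random
import statistics

rng = random.Random(7)
MU, SIGMA = 10.0, 2.0   # hypothetical population parameters
TOLERANCE = 0.25        # what counts as "close" to the population mean

def share_close(n, trials=1_000):
    """Share of samples of size n whose mean is within TOLERANCE of MU."""
    close = 0
    for _ in range(trials):
        mean = statistics.fmean([rng.gauss(MU, SIGMA) for _ in range(n)])
        if abs(mean - MU) <= TOLERANCE:
            close += 1
    return close / trials

shares = {n: share_close(n) for n in (10, 100, 1000)}
for n, share in shares.items():
    print(f"n = {n:4d}: {share:.1%} of sample means within {TOLERANCE} of the population mean")
```

The share of close estimates rises toward 100% as the sample size grows, which is the behavior that defines a consistent estimator.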

8
Q

LOS 10i: Describe the properties of Student’s t-distribution and calculate and interpret its degrees of freedom

A

Student’s t-distribution is a bell-shaped probability distribution with the following properties:

  • It is symmetrical
  • It is defined by a single parameter, the degrees of freedom (df), where the degrees of freedom equal the sample size minus 1 (df = n - 1)
  • It has a lower peak than the normal curve, but fatter tails
  • As the degrees of freedom increase, the shape of the t-distribution approaches the shape of the standard normal curve

The t-distribution is used in the following scenarios:

  • It is used to construct confidence intervals for a normally distributed population whose variance is unknown when the sample size is small
  • It may be used for a non-normally distributed population whose variance is unknown if the sample size is large
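
The fatter tails can be seen by simulating the t-statistic directly. This sketch assumes a standard normal population and compares samples of 5 (4 degrees of freedom) with samples of 60 (59 degrees of freedom); the sizes, trial count, and seed are illustrative only.

```python
import math
import random
import statistics

rng = random.Random(3)

def t_statistic(n):
    """t = (sample mean - 0) / (s / sqrt(n)) for a sample from a standard normal."""
    sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return statistics.fmean(sample) / (statistics.stdev(sample) / math.sqrt(n))

def tail_share(n, cutoff=1.96, trials=5_000):
    """Share of simulated t-statistics falling beyond +/- cutoff."""
    return sum(abs(t_statistic(n)) > cutoff for _ in range(trials)) / trials

small = tail_share(5)    # 4 degrees of freedom
large = tail_share(60)   # 59 degrees of freedom
print(f"n = 5:  {small:.1%} of t-statistics beyond +/-1.96")
print(f"n = 60: {large:.1%} of t-statistics beyond +/-1.96")
```

With few degrees of freedom, far more than 5% of the statistics fall beyond ±1.96, so the normal reliability factor would understate the interval width. With more degrees of freedom the share moves toward the 5% implied by the standard normal.
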
9
Q

LOS 10j: Calculate and interpret a confidence interval for a population mean, given a normal distribution with 1) a known population variance, 2) an unknown population variance, or 3) an unknown variance and a large sample size

A

The confidence interval for the population mean when the population follows a normal distribution and its variance is known is calculated as:

  • $\bar{x} \pm z_{\alpha/2} \, \sigma / \sqrt{n}$
  • where $\bar{x}$ = sample mean
  • $z_{\alpha/2}$ = standard normal value for which the probability of an observation lying in each tail is $\alpha/2$ (reliability factor)
  • $\sigma / \sqrt{n}$ = standard error of the sample mean

When the variance of a normally distributed population is not known, we use the t-distribution to construct confidence intervals:

  • $\bar{x} \pm t_{\alpha/2} \, s / \sqrt{n}$
  • where $t_{\alpha/2}$ = t-value with n - 1 degrees of freedom and $s$ = sample standard deviation

When the population is normally distributed, we:

  • use the z-stat when the population variance is known
  • use the t-stat when population variance is unknown

When the distribution of the population is nonnormal, the construction of an appropriate confidence interval depends on the sample size:

  • If the population variance is known and the sample size is large, we use the z-stat
  • If the population variance is not known and the sample size is large, we can use the z-stat or the t-stat; the t-stat is the more conservative choice and is encouraged
  • If the sample size is small, neither statistic gives a reliable confidence interval
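
A worked sketch of both intervals. The inputs (a sample mean of 80, standard deviation of 15, and sample sizes of 36 and 16) are invented for illustration. Python's standard library has no t-distribution, so the t critical value is passed in by hand; 2.131 is the standard table value for 95% confidence with 15 degrees of freedom.

```python
import math
from statistics import NormalDist

def z_interval(xbar, sigma, n, confidence=0.95):
    """Confidence interval for the mean with known population sigma."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * sigma / math.sqrt(n)
    return xbar - half_width, xbar + half_width

def t_interval(xbar, s, n, t_critical):
    """Confidence interval for the mean with unknown sigma.

    t_critical must be looked up in a t-table for n - 1 degrees of freedom.
    """
    half_width = t_critical * s / math.sqrt(n)
    return xbar - half_width, xbar + half_width

# Known sigma: 80 +/- 1.96 * 15 / sqrt(36)
z_low, z_high = z_interval(xbar=80, sigma=15, n=36)
print(f"z-interval: {z_low:.2f} to {z_high:.2f}")

# Unknown sigma, n = 16, so 15 degrees of freedom and t = 2.131 at 95%
t_low, t_high = t_interval(xbar=80, s=15, n=16, t_critical=2.131)
print(f"t-interval: {t_low:.2f} to {t_high:.2f}")
```

The t-interval is wider both because the sample is smaller and because the t reliability factor exceeds 1.96.
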
10
Q

LOS 10k: Describe the issues regarding selection of the appropriate sample size, data-mining bias, sample selection bias, survivorship bias, look-ahead bias, and time-period bias

A

As sample size increases, the standard error decreases, so estimates become more precise. However, in practice two considerations may work against increasing sample size:

  1. Increasing the size of the sample may result in drawing observations from a different population
  2. Increasing the sample size may involve additional expenses that outweigh the benefit of increased accuracy of estimates

Types of Biases

Data mining is the practice of developing a model by extensively searching through a data set for statistically significant relationships until a pattern that works is discovered. Because many hypotheses are tested, it is virtually certain that some of them will appear to be highly statistically significant, even on a data set with no real correlations at all.

This most commonly occurs when:

  • Researchers have not formed a hypothesis in advance, and are therefore open to any hypothesis suggested by the data
  • Researchers narrow the data used in order to reduce the probability of the sample refuting a specific hypothesis

Warning signs that data mining bias might exist are:

  • The "too much digging" warning sign: numerous variables are tested until one that appears to be significant is discovered
  • The "no story, no future" warning sign: no economic theory can explain the empirical results
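
The "too much digging" problem is easy to reproduce. In the sketch below, 200 candidate predictors and a target series are all pure random noise, so any "significant" correlation is spurious. The series length, number of candidates, and seed are arbitrary, and the cutoff of 2.0 is an approximation of the 5% two-tailed critical t-value for 58 degrees of freedom.

```python
import math
import random
import statistics

rng = random.Random(11)
N_OBS = 60           # observations per series (hypothetical)
N_CANDIDATES = 200   # number of candidate predictors tried (hypothetical)
T_CUTOFF = 2.0       # approximate 5% two-tailed critical t-value for 58 df

def correlation(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def t_stat(r, n):
    """t-statistic for testing whether a correlation differs from zero."""
    return r * math.sqrt((n - 2) / (1 - r * r))

target = [rng.gauss(0, 1) for _ in range(N_OBS)]   # pure noise

false_positives = 0
for _ in range(N_CANDIDATES):
    candidate = [rng.gauss(0, 1) for _ in range(N_OBS)]   # also pure noise
    if abs(t_stat(correlation(candidate, target), N_OBS)) > T_CUTOFF:
        false_positives += 1

print(f"{false_positives} of {N_CANDIDATES} unrelated variables look significant")
```

At a 5% significance level, roughly 1 in 20 unrelated variables is expected to pass the test by chance alone, which is why a result found by searching needs an economic rationale and out-of-sample confirmation.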

Sample Selection Bias results from the exclusion of certain assets from a study due to the unavailability of data.

Some databases suffer from survivorship bias, a form of sample selection bias in which only entities that have survived to the end of the period (for example, funds still in operation) are included.

Sample selection bias is most severe in studies of hedge fund returns: because hedge funds are not required to publicly report their returns, they tend to report only their best results.

Look-Ahead Bias arises when a study uses information that was not available on the test date, for example estimates of revenues.

Time-Period Bias arises if a test is based on a certain time period, which may make the results obtained from the study time-period specific.
