Sampling and Estimation Flashcards

1
Q

Simple Random Sampling

A

Selection of a sample such that each item of the population has the same likelihood of being included in the sample.
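
A minimal sketch of this definition, assuming Python's standard `random` module; the population of 100 item IDs and the function name are illustrative, not part of the card.

```python
import random

def simple_random_sample(population, k, seed=None):
    """Draw k items without replacement; each item is equally likely to be included."""
    rng = random.Random(seed)  # seeded only so the illustration is repeatable
    return rng.sample(list(population), k)

population = list(range(1, 101))  # hypothetical population of 100 item IDs
sample = simple_random_sample(population, k=10, seed=42)
print(sample)
```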

2
Q

Systematic Sampling

A

Selection of every nth member from a population.
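
As a rough illustration, the sketch below takes every n-th member of an ordered list. The random starting position is an assumption added here (a common convention); the card itself only says "every nth member".

```python
import random

def systematic_sample(population, n, seed=None):
    """Select every n-th member, starting at a random offset within the first n items."""
    rng = random.Random(seed)
    start = rng.randrange(n)      # assumed random start; not stated on the card
    return population[start::n]   # every n-th member from the start position

population = list(range(1, 101))  # hypothetical ordered population
sample = systematic_sample(population, n=10, seed=0)
print(sample)
```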

3
Q

Sampling Error

A

The difference between a sample statistic (mean, variance, standard deviation) and its corresponding population parameter.
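
For the mean, the definition reduces to sample mean minus population mean. A small sketch with made-up numbers (both lists are hypothetical):

```python
from statistics import mean

population = [2, 4, 6, 8, 10]  # hypothetical population, mean = 6
sample = [4, 8, 10]            # one possible sample of size 3

population_mean = mean(population)
sample_mean = mean(sample)

# Sampling error of the mean: sample statistic minus population parameter
sampling_error_of_mean = sample_mean - population_mean
print(sampling_error_of_mean)
```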

4
Q

Sampling Error of the mean

A

The difference between the sample mean and the population mean (sample mean − population mean).

5
Q

Sampling Distribution

A

Probability distribution of all possible sample statistics computed from a set of equal sized samples randomly selected from the same population.
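
For a tiny population the sampling distribution can be enumerated exactly. The sketch below assumes a hypothetical five-item population and samples of size 2 drawn without replacement, and tabulates the distribution of the sample mean.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

population = [1, 2, 3, 4, 5]  # hypothetical population
n = 2                         # every sample has the same size

# Sample mean for every possible sample of size n
sample_means = [Fraction(sum(s), n) for s in combinations(population, n)]

# Probability of each distinct sample mean (all samples equally likely)
counts = Counter(sample_means)
sampling_distribution = {
    m: Fraction(c, len(sample_means)) for m, c in sorted(counts.items())
}

for m, p in sampling_distribution.items():
    print(float(m), float(p))

# Expected value of the sample mean across all samples
expected_sample_mean = sum(m * p for m, p in sampling_distribution.items())
```

In this example the expected value of the sample mean equals the population mean, which is the unbiasedness property described later in the deck.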

6
Q

Stratified random sampling

A

Use of a classification system to separate the population into smaller groups (strata) based on one or more distinguishing characteristics. A random sample is then taken from each group and the results are pooled.

7
Q

Time-series data

A

Observations taken over a period of time at specific and equally spaced time intervals.

8
Q

Cross-sectional data

A

Sample of observations taken at a single point in time.

9
Q

Longitudinal Data

A

Observations over time of multiple characteristics of the same entity.

10
Q

Panel data

A

Observations over time of the same characteristic for multiple entities

11
Q

Central Limit Theorem

A

For simple random samples of size n from a population with mean μ and finite variance σ², the sampling distribution of the sample mean approaches a normal probability distribution with mean μ and variance σ²/n as n becomes large.
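
A quick simulation can make the theorem concrete. The sketch below assumes a skewed (exponential) population with mean 1 and variance 1; the sample size, number of samples, and seed are arbitrary choices for illustration.

```python
import random
from statistics import fmean, pvariance

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 40                  # size of each sample
num_samples = 2000      # number of samples drawn

# Draw many samples from an exponential population (mean 1, variance 1)
# and record the mean of each sample.
sample_means = [
    sum(rng.expovariate(1.0) for _ in range(n)) / n
    for _ in range(num_samples)
]

mean_of_means = fmean(sample_means)          # should be close to 1
variance_of_means = pvariance(sample_means)  # should be close to 1 / n
print(round(mean_of_means, 3), round(variance_of_means, 4))
```

Even though the population is far from normal, the sample means cluster around the population mean with a variance near the population variance divided by n.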

12
Q

Point estimates

A

Single (sample) values used to estimate population parameters.

13
Q

Confidence interval

A

Range of values within which the actual value of a parameter will lie, given the probability of 1 − α.

Confidence intervals are usually constructed by adding and subtracting an appropriate value from the point estimate:

Point estimate ± (reliability factor × standard error)

14
Q

Level of significance

A

α

15
Q

Degree of confidence

A

1 - α

16
Q

Confidence interval form

A

Point estimate ± (reliability factor × standard error)

17
Q

Desirable properties of an estimator

A

Unbiasedness, efficiency, and consistency.

18
Q

Desirable properties of an estimator - definitions

A
19
Q

Student’s t-distribution

A

Bell-shaped probability distribution that is symmetrical about its mean.

20
Q

Properties of student’s t-distribution

A
21
Q

Confidence interval for the population mean (normal distribution with a known variance)

A
22
Q

Commonly used standard normal distribution reliability factors

A
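
The usual two-tailed factors can be recovered from the inverse CDF of the standard normal distribution. A minimal sketch, assuming Python's `statistics.NormalDist`; the helper name is illustrative. It gives roughly 1.645, 1.960 and 2.576 for 90%, 95% and 99% confidence.

```python
from statistics import NormalDist

def z_reliability_factor(confidence):
    """Two-tailed standard normal reliability factor for a given degree of confidence."""
    alpha = 1 - confidence                       # level of significance
    return NormalDist().inv_cdf(1 - alpha / 2)   # alpha/2 probability in each tail

for confidence in (0.90, 0.95, 0.99):
    print(f"{confidence:.0%}: {z_reliability_factor(confidence):.3f}")
```
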
23
Q

Confidence intervals for a population mean that is normal with unknown variance

A
24
Q

Criteria for selecting the appropriate test statistic

A
25
Q

Data Mining

A

Occurs when analysts use the same database to search for patterns or trading rules until they discover one that “works.”

26
Q

Data-mining bias

A

Results where the statistical significance of the pattern is overestimated because the results were found through data mining.

27
Q

Sample selection bias

A

When some data is systematically excluded from the analysis because of lack of availability.

28
Q

Survivorship bias

A

The most common form of sample selection bias. For example, funds that have ceased to exist because of closure or merger are no longer included in the data.

29
Q

Look-ahead bias

A

When a relationship is tested using sample data that was not available on the test date.

For example, consider a test of a trading rule based on the price-to-book ratio:

Stock price: available for all companies at the same point in time.
Book value: year-end book values may not be available for all companies until 30 to 60 days after the fiscal year ends.

30
Q

Time-period bias

A

Occurs when the time period over which the data was gathered is too short or too long.

Too short
    Results may reflect phenomena specific to that time period, or perhaps data mining.
Too long
    The fundamental economic relationships that underlie the results may have changed.

31
Q

Desirable Properties of Estimator

A

1: - Unbiasedness
2: - Efficiency
3: - Consistency

32
Q

Unbiasedness

A

An unbiased estimator is one whose expected value equals the parameter it is intended to estimate.

For example, because the expected value of the sample mean equals the population mean, the sample mean is an unbiased estimator of the population mean.

33
Q

Efficiency

Enterprise Value

A

An unbiased estimator is efficient if the variance of its sampling distribution is smaller than that of every other unbiased estimator of the same parameter.

34
Q

Consistency

Sample Size
Company Secretary

A

A consistent estimator is one for which the probability of estimates close to the value of the population parameter increases as the sample size increases.

In other words, the accuracy of the parameter estimate increases as the sample size increases.

For the sample mean, the standard error decreases as the sample size increases, so the sampling distribution bunches more closely around the population mean.

As the sample size approaches infinity, the standard error approaches zero.
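
The shrinking standard error can be seen directly from the formula for the sample mean, standard deviation divided by the square root of n. A small sketch with a hypothetical standard deviation of 20:

```python
from math import sqrt

def standard_error(sd, n):
    """Standard error of the sample mean for standard deviation sd and sample size n."""
    return sd / sqrt(n)

sd = 20.0  # hypothetical standard deviation
errors = {n: standard_error(sd, n) for n in (25, 100, 400, 10_000)}

for n, se in errors.items():
    print(n, se)
```

Quadrupling the sample size halves the standard error, so precision improves more slowly than the sample grows.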

35
Q

Confidence Interval

A

A 100(1-α)% confidence interval:

Point estimate +/- Reliability factor x Standard error.
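
As a sketch of the formula, the function below builds an interval for a population mean when the population standard deviation is known, so the reliability factor comes from the standard normal distribution. The function name and the inputs (mean 80, standard deviation 15, n = 36) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def z_confidence_interval(sample_mean, population_sd, n, confidence=0.95):
    """Point estimate +/- reliability factor x standard error, using a z factor."""
    alpha = 1 - confidence
    reliability_factor = NormalDist().inv_cdf(1 - alpha / 2)
    standard_error = population_sd / sqrt(n)
    margin = reliability_factor * standard_error
    return sample_mean - margin, sample_mean + margin

low, high = z_confidence_interval(sample_mean=80, population_sd=15, n=36, confidence=0.99)
print(round(low, 2), round(high, 2))
```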

36
Q

Issues Regarding Selection of the Appropriate Sample Size

A

Limitations
1: - Larger samples may contain observations from a different population (distribution).

2: - The cost of using a larger sample must be weighed against the value of the increase in precision from the increase in sample size.

37
Q

Larger Sample Size Advantages

A

1: - Reduces sampling error and the standard deviation of the sample statistic around its population value.
2: - Confidence intervals are narrower when samples are larger, because the standard errors of the point estimates of population parameters are smaller.

38
Q

Sampling Error

A
39
Q

Size of Samples in Stratified Random Sampling

A

Classify the population into smaller groups (strata) based on one or more distinguishing characteristics.

Take a random sample from each stratum and pool the results.

The size of the sample from each stratum is based on the size of the stratum relative to the population, so sample sizes are not necessarily the same across strata.
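
The steps above can be sketched as follows. The strata names, their sizes, and the rounding rule are assumptions for illustration; with other sizes, rounding may make the pooled total differ slightly from the target.

```python
import random

def stratified_sample(strata, total_n, seed=None):
    """Sample each stratum in proportion to its share of the population, then pool."""
    rng = random.Random(seed)
    population_size = sum(len(members) for members in strata.values())
    pooled = []
    for members in strata.values():
        # Sample size for this stratum is proportional to its relative size
        k = round(total_n * len(members) / population_size)
        pooled.extend(rng.sample(members, k))
    return pooled

# Hypothetical population of 1,000 bonds classified by maturity
strata = {
    "short": [f"S{i}" for i in range(500)],
    "medium": [f"M{i}" for i in range(300)],
    "long": [f"L{i}" for i in range(200)],
}
sample = stratified_sample(strata, total_n=50, seed=1)
print(len(sample))
```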

41
Q

What is the probability associated with a confidence interval?

A

1 − α

42
Q

Alpha and 1 - Alpha

A

Confidence interval estimates result in a range of values within which the actual value of a parameter will lie, given the probability of 1 - alpha.

Here alpha is called the level of significance for the confidence interval.

The probability 1 - alpha is referred to as the degree of confidence.

43
Q

Normal Distribution Classification

A
44
Q

Non Normal Distribution Classification

A
45
Q

Normal Distribution

Known Variance

Small Sample Size

A

z-statistic

Interpretation

Probabilistic
After repeatedly taking samples of CFA candidates, administering the practice exam, and constructing a confidence interval for each sample’s mean, 99% of the resulting confidence intervals will, in the long run, include the population mean.

Practical
We are 99% confident that the population mean score is between 73.5 and 86.45 for candidates from this population.

46
Q

Normal Distribution

Known Variance

Large Sample Size

A

Z Statistics


47
Q

Normal Distribution

UnKnown Variance

Small Sample Size

A

t-statistic

Owing to the relatively fatter tails of the t-distribution, confidence intervals constructed using t-reliability factors will be more conservative (wider) than those constructed using z-reliability factors.

Unlike the standard normal distribution, the reliability factors for the t-distribution depend on the sample size, so we can’t rely on a commonly used set of reliability factors.
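
A rough comparison of the two interval widths, assuming a hypothetical sample of 10 observations with mean 5.0 and sample standard deviation 2.0. The t factor is entered by hand from a t-table (two-tailed 95%, 9 degrees of freedom) because the Python standard library has no t-distribution.

```python
from math import sqrt
from statistics import NormalDist

def interval(sample_mean, sd, n, reliability_factor):
    """Point estimate +/- reliability factor x standard error."""
    margin = reliability_factor * sd / sqrt(n)
    return sample_mean - margin, sample_mean + margin

n = 10
t_factor = 2.262                        # t-table value: 95% two-tailed, n - 1 = 9 df
z_factor = NormalDist().inv_cdf(0.975)  # about 1.96

t_low, t_high = interval(5.0, 2.0, n, t_factor)
z_low, z_high = interval(5.0, 2.0, n, z_factor)

print("t interval width:", round(t_high - t_low, 3))
print("z interval width:", round(z_high - z_low, 3))
```

With the same data, the t-based interval comes out wider, which is the "more conservative" behaviour the card describes.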

48
Q

Normal Distribution

Unknown Variance

Large Sample Size

A

t-statistics


49
Q

Non Normal Distribution

Known Variance

Small Sample Size

A

NA

50
Q

Non Normal Distribution

Known Variance

Large Sample Size

A

z-statistic

If the distribution is non-normal but the population variance is known, the z-statistic can be used as long as the sample size is large (n greater than 30).
The central limit theorem assures us that the distribution of the sample mean is approximately normal when the sample is large.

51
Q

Non Normal Distribution

UnKnown Variance

Small Sample Size

A

NA

52
Q

Non Normal Distribution

UnKnown Variance

Large Sample Size

A

t-statistic

It is also acceptable to use the z-statistic, although use of the t-statistic is more conservative.

If the distribution is non-normal and the population variance is unknown, the t-statistic can be used as long as the sample size is large (n greater than 30).

53
Q

Student’s t-distribution

A

It is a bell-shaped probability distribution that is symmetrical about its mean.

It is the appropriate distribution to use when constructing confidence intervals based on small samples (n < 30).

It may also be appropriate to use the t-distribution when the population variance is unknown and the sample size is large enough that the central limit theorem assures that the sampling distribution is approximately normal.

54
Q

Student’s t-distribution

Properties

A

1: - Symmetrical.
2: - Defined by a single parameter, the degrees of freedom, which equal the number of sample observations minus one for sample means.
3: - It has more probability in the tails (fatter tails) than the normal distribution.
4: - As the degrees of freedom (the sample size) increase, the shape of the t-distribution more closely approaches a standard normal distribution.

55
Q

Standard Error

A
56
Q

Standard Deviation is Equal to

A
89
Q

Standard Error of the Sample Mean

A

The positive square root of the variance of the sample statistic.

90
Q

What does Standard Error Truly Signify?

A
91
Q

Standard Deviation Formula for Square Root of Sum of Squared Deviation to the Mean

A
92
Q

Time Period Bias

A

Can result if the time period over which the data is gathered is either too short or too long.

Too short: Results may reflect phenomena specific to that time period, or perhaps data mining.

93
Q

When to use the sample standard deviation and when to use the population standard deviation in calculating the standard error

A

When the population standard deviation is given, use it to calculate the standard error. When it is not given, use the sample standard deviation.
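
This rule can be sketched as a small helper that falls back to the sample standard deviation when no population value is supplied. The function name and the data are hypothetical.

```python
from math import sqrt
from statistics import stdev

def standard_error_of_mean(sample, population_sd=None):
    """Use the population standard deviation if given, else the sample standard deviation."""
    n = len(sample)
    sd = population_sd if population_sd is not None else stdev(sample)
    return sd / sqrt(n)

sample = [10, 12, 14, 16]  # hypothetical observations

se_known = standard_error_of_mean(sample, population_sd=4.0)  # population value given
se_unknown = standard_error_of_mean(sample)                   # falls back to sample value
print(se_known, round(se_unknown, 3))
```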

94
Q

How to identify whether to use the t-distribution or the z-distribution

A

Check the sample size.
Check which standard deviation is given: sample or population.
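
The case-by-case cards earlier in this deck (normal or non-normal population, known or unknown variance, small or large sample) can be condensed into a short decision function. This is only a sketch of those cards: the function name is made up, and treating a sample as large when n is greater than 30 follows the cards' wording, so the boundary case is a convention choice.

```python
def reliability_statistic(normal_population, variance_known, n, large_threshold=30):
    """Return 'z', 't', or 'not available' following the deck's case-by-case cards."""
    large_sample = n > large_threshold  # the cards describe large as n greater than 30
    if not normal_population and not large_sample:
        return "not available"          # non-normal population with a small sample
    return "z" if variance_known else "t"

print(reliability_statistic(normal_population=True, variance_known=True, n=10))
print(reliability_statistic(normal_population=True, variance_known=False, n=10))
print(reliability_statistic(normal_population=False, variance_known=True, n=100))
print(reliability_statistic(normal_population=False, variance_known=False, n=10))
```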

95
Q

Formula Confidence Interval for Bernoulli Trial

A
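
One common form is the normal-approximation interval, p̂ ± z × √(p̂(1 − p̂)/n), which follows the same pattern of point estimate ± reliability factor × standard error. A minimal sketch, assuming a large sample; the function name and the counts (60 successes in 100 trials) are illustrative.

```python
from math import sqrt
from statistics import NormalDist

def proportion_confidence_interval(successes, n, confidence=0.95):
    """Normal-approximation interval for the success probability of Bernoulli trials."""
    p_hat = successes / n                            # point estimate
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    standard_error = sqrt(p_hat * (1 - p_hat) / n)   # standard error of p_hat
    margin = z * standard_error
    return p_hat - margin, p_hat + margin

low, high = proportion_confidence_interval(successes=60, n=100)
print(round(low, 3), round(high, 3))
```
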
96
Q

Formula for Width of Confidence Interval

A
97
Q

Properties of Central Limit Theorem

A