Sampling and Estimation Flashcards

1
Q

Simple Random Sampling

A

Selection of a sample such that each item of the population has the same likelihood of being included in the sample.

2
Q

Systematic Sampling

A

Selection of every nth member from a population.
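
As a minimal sketch of the definition above, the selection can be written as a slice that steps through the population. The function name `systematic_sample`, the random starting point, and the 20-item population are assumptions made for illustration only.

```python
import random

def systematic_sample(population, n, start=None):
    """Return every n-th member of `population`, beginning at index `start`."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    if start is None:
        # Assumption of this sketch: pick a random start within the first interval.
        start = random.randrange(n)
    return population[start::n]

members = list(range(1, 21))                    # hypothetical population of 20 items
print(systematic_sample(members, 5, start=2))   # [3, 8, 13, 18]
```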

3
Q

Sampling Error

A

The difference between a sample statistic (mean, variance, s-dev) and its corresponding population parameter.
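
A small worked example may help here. The population values and the choice of sample below are hypothetical; the sketch only shows the subtraction that defines the sampling error of the mean.

```python
import statistics

population = [4, 8, 6, 10, 12, 2, 14, 8]   # hypothetical population values
sample = population[:4]                     # hypothetical sample: 4, 8, 6, 10

population_mean = statistics.mean(population)   # population parameter
sample_mean = statistics.mean(sample)           # sample statistic

# Sampling error of the mean = sample mean - population mean
sampling_error_of_mean = sample_mean - population_mean
print(sampling_error_of_mean)   # -1
```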

4
Q

Sampling Error of the mean

A
5
Q

Sampling Distribution

A

Probability distribution of all possible sample statistics computed from a set of equal sized samples randomly selected from the same population.

6
Q

Stratified random sampling

A

Use of a classification system to separate the population into smaller groups based on one or more distinguishing characteristics.

7
Q

Time-series data

A

Observations taken over a period of time at specific and equally spaced time intervals.

8
Q

Cross-sectional data

A

Sample of observations taken at a single point in time.

9
Q

Longitudinal Data

A

Observations over time of multiple characteristics of the same entity.

10
Q

Panel data

A

Observations over time of the same characteristic for multiple entities


11
Q

Central Limit Theorem

A

For simple random samples of size n from a population with mean μ and a finite variance σ², the sampling distribution of the sample mean approaches a normal probability distribution with mean μ and variance σ²/n as the sample size becomes large.
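
The theorem can be checked informally with a simulation. The sketch below assumes a skewed (exponential) population with mean 1 and variance 1; the seed, the sample size of 40, and the 5,000 repetitions are arbitrary choices for illustration.

```python
import random
import statistics

rng = random.Random(42)      # fixed seed so the run is repeatable
n = 40                       # size of each sample
num_samples = 5000           # number of samples drawn

# Hypothetical skewed population: exponential with mean 1 and variance 1.
sample_means = [
    statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
    for _ in range(num_samples)
]

mean_of_means = statistics.fmean(sample_means)
variance_of_means = statistics.pvariance(sample_means)

# The theorem predicts a mean near 1 and a variance near 1/n.
print(f"mean of sample means:     {mean_of_means:.3f} (theory: 1.000)")
print(f"variance of sample means: {variance_of_means:.4f} (theory: {1 / n:.4f})")
```

Even though the population is far from normal, the sample means should cluster around the population mean with a variance close to variance/n.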

12
Q

Point estimates

A

Single (sample) values used to estimate population parameters.

13
Q

Confidence interval

A

Confidence intervals are usually constructed by adding and subtracting an appropriate value from the point estimate:

Point estimate ± (Reliability factor × Standard error)

Range of values within which the actual value of a parameter will lie, given the probability of 1 - α.
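
A minimal sketch of this construction for a population mean with known standard deviation, using the standard library's `NormalDist` to obtain the reliability factor. The function name and the inputs (sample mean 80, population standard deviation 15, n = 36, α = 0.01) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def z_confidence_interval(sample_mean, population_sd, n, alpha):
    """Point estimate +/- reliability factor x standard error (known variance)."""
    reliability_factor = NormalDist().inv_cdf(1 - alpha / 2)   # z for alpha/2 in each tail
    standard_error = population_sd / sqrt(n)
    margin = reliability_factor * standard_error
    return sample_mean - margin, sample_mean + margin

# Hypothetical inputs: a 99% interval, so alpha = 0.01.
low, high = z_confidence_interval(sample_mean=80, population_sd=15, n=36, alpha=0.01)
print(f"{low:.2f} to {high:.2f}")   # roughly 73.56 to 86.44
```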

14
Q

Level of significance

A

α

15
Q

Degree of confidence

A

1 - α

16
Q

Confidence interval form

A
17
Q

Desirable properties of an estimator

A

Unbiasedness, efficiency, and consistency.

18
Q

Desirable properties of an estimator - definitions

A
19
Q

Student’s t-distribution

A

Bell-shaped probability distribution that is symmetrical about its mean.

20
Q

Properties of student’s t-distribution

A
21
Q

Confidence interval for the population mean (normal distribution with a known variance)

A
22
Q

Commonly used standard normal distribution reliability factors

A
23
Q

Confidence intervals for a population mean that is normal with unknown variance

A
24
Q

Criteria for selecting the appropriate test statistic

A
25
Data Mining
Occurs when analysts repeatedly use the same database to search for patterns or trading rules until they discover one that "works."
26
Data-mining bias
Results where the statistical significance of the pattern is overestimated because the results were found through data mining.
27
Sample selection bias
When some data is systematically excluded from the analysis because of lack of availability.
28
Survivorship bias
The most common form of sample selection bias. For example, funds that have ceased to exist because of closure or merger are no longer included in the data.
29
Look-ahead bias
When a relationship is tested using sample data that was not available on the test date. For example, consider a test of a trading rule based on the price-to-book ratio: stock prices are available for all companies at the same point in time, while year-end book values may not be available for all companies until 30 to 60 days after the fiscal year ends.
30
Time-period bias
Occurs when the time period over which the data was gathered is too short or too long. Too short: results may reflect phenomena specific to that time period, or perhaps data mining. Too long: the fundamental economic relationships that underlie the results may have changed.
31
Desirable Properties of Estimator
1. Unbiasedness 2. Efficiency 3. Consistency
32
Unbiasedness
An unbiased estimator is one whose expected value equals the parameter it is intended to estimate. For example, because the expected value of the sample mean equals the population mean, the sample mean is an unbiased estimator of the population mean.
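
The sample-mean example can be verified by brute force on a tiny population: list every possible sample, compute each sample mean, and average them. The five-value population and the sample size of 2 are hypothetical choices that keep the enumeration small.

```python
from itertools import combinations
from statistics import mean

population = [2, 4, 6, 8, 10]   # hypothetical population

# Every possible sample of size 2 (drawn without replacement) and its mean.
sample_means = [mean(sample) for sample in combinations(population, 2)]

# The expected value of the sample mean is the average over all possible samples.
expected_sample_mean = mean(sample_means)
print(expected_sample_mean, mean(population))
```

Individual sample means range from 3 to 9, yet their average matches the population mean, which is what unbiasedness requires.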
33
Efficiency
An unbiased estimator is efficient if no other unbiased estimator of the same parameter has a sampling distribution with smaller variance.
34
Consistency
A consistent estimator is one for which the probability of estimates close to the value of the population parameter increases as the sample size increases; in other words, the accuracy of the estimate improves as the sample grows. For the sample mean, the standard error decreases as the sample size increases, so the sampling distribution bunches more closely around the population mean. As the sample size approaches infinity, the standard error approaches zero.
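
To make the shrinking standard error concrete, the sketch below evaluates standard deviation divided by the square root of n for increasing sample sizes. The population standard deviation of 20 and the list of sample sizes are hypothetical.

```python
from math import sqrt

def standard_error_of_mean(sd, n):
    """Standard error of the sample mean: standard deviation / sqrt(n)."""
    return sd / sqrt(n)

population_sd = 20.0   # hypothetical population standard deviation
for n in (25, 100, 400, 10_000):
    print(n, standard_error_of_mean(population_sd, n))
```

Quadrupling the sample size halves the standard error, so the estimate tightens steadily but with diminishing returns.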
35
Confidence Interval
A 100(1 - α)% confidence interval: Point estimate ± (Reliability factor × Standard error).
36
Issues Regarding Selection of the Appropriate Sample Size
Limitations: 1. Larger samples may contain observations from a different population (distribution). 2. The cost of using a larger sample must be weighed against the value of the increase in precision from the larger sample size.
37
Larger Sample Size Advantages
1. Reduces sampling error and the standard deviation of the sample statistic around its population value. 2. Confidence intervals are narrower when samples are larger, because the standard errors of the point estimates of population parameters are smaller.
38
Sampling Error
39
Size of Samples in Stratified Random Sampling
Classify the population into smaller groups (strata) based on one or more distinguishing characteristics. Take a random sample from each stratum and pool the results. The size of the sample from each stratum is based on the size of that stratum relative to the population, so sample sizes are not necessarily the same across strata.
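
A minimal sketch of this procedure, assuming proportional allocation with simple rounding. The function names, the three strata, and the bond labels are hypothetical and only illustrate the steps.

```python
import random

def proportional_allocation(strata_sizes, total_sample_size):
    """Split `total_sample_size` across strata in proportion to stratum size."""
    population_size = sum(strata_sizes.values())
    # Rounding can make the allocations sum to slightly more or less than the target.
    return {
        name: round(total_sample_size * size / population_size)
        for name, size in strata_sizes.items()
    }

def stratified_sample(strata, total_sample_size, seed=None):
    """Randomly sample each stratum by its allocation, then pool the results."""
    rng = random.Random(seed)
    sizes = {name: len(members) for name, members in strata.items()}
    allocation = proportional_allocation(sizes, total_sample_size)
    pooled = []
    for name, members in strata.items():
        pooled.extend(rng.sample(members, allocation[name]))
    return pooled

# Hypothetical population of 1,000 bonds split into three maturity strata.
strata = {
    "short": [f"S{i}" for i in range(500)],
    "medium": [f"M{i}" for i in range(300)],
    "long": [f"L{i}" for i in range(200)],
}
sample = stratified_sample(strata, 50, seed=1)
print(proportional_allocation({k: len(v) for k, v in strata.items()}, 50))
```

With these sizes the allocation works out to 25, 15, and 10 items, mirroring the 50/30/20 split of the population.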
40
41
What is the probability associated with a confidence interval?
1 - alpha
42
Alpha and 1 - Alpha
Confidence interval estimates result in a range of values within which the actual value of a parameter will lie, given the probability of 1 - alpha. Here alpha is called the level of significance for the confidence interval, and the probability 1 - alpha is referred to as the degree of confidence.
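
The link between alpha and the z reliability factor can be sketched with the standard library's `NormalDist`. The helper name `z_reliability_factor` is an assumption; the split of alpha/2 into each tail reflects a two-sided interval.

```python
from statistics import NormalDist

def z_reliability_factor(alpha):
    """Two-sided standard normal reliability factor for significance level alpha."""
    return NormalDist().inv_cdf(1 - alpha / 2)

for alpha in (0.10, 0.05, 0.01):
    degree_of_confidence = 1 - alpha
    print(f"{degree_of_confidence:.0%} confidence: z = {z_reliability_factor(alpha):.3f}")
```

This gives approximately 1.645, 1.960, and 2.576 for 90%, 95%, and 99% degrees of confidence.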
43
Normal Distribution Classification
44
Non Normal Distribution Classification
45
Normal Distribution Known Variance Small Sample Size
Z-statistic. Interpretation: **Probabilistic:** After repeatedly taking samples of CFA candidates, administering the practice exam, and constructing confidence intervals for each sample's mean, 99% of the resulting confidence intervals will, in the long run, include the population mean. **Practical:** We are 99% confident that the population mean score is between 73.5 and 86.45 for candidates from this population.
46
Normal Distribution Known Variance Large Sample Size
Z-statistic. Interpretation: ***Probabilistic:*** After repeatedly taking samples of CFA candidates, administering the practice exam, and constructing confidence intervals for each sample's mean, 99% of the resulting confidence intervals will, in the long run, include the population mean. ***Practical:*** We are 99% confident that the population mean score is between 73.5 and 86.45 for candidates from this population.
47
Normal Distribution Unknown Variance Small Sample Size
***t-statistic*** Owing to the relatively fatter tails of the t-distribution, confidence intervals constructed using t-reliability factors will be more conservative (wider) than those constructed using z-reliability factors. Unlike the standard normal distribution, the reliability factors for the t-distribution depend on the sample size, so we can't rely on a commonly used set of reliability factors.
48
Normal Distribution Unknown Variance Large Sample Size
***t-statistic*** Owing to the relatively fatter tails of the t-distribution, confidence intervals constructed using t-reliability factors will be more conservative (wider) than those constructed using z-reliability factors. Unlike the standard normal distribution, the reliability factors for the t-distribution depend on the sample size, so we can't rely on a commonly used set of reliability factors.
49
Non Normal Distribution Known Variance Small Sample Size
**NA**
50
Non Normal Distribution Known Variance Large Sample Size
***Z-statistic*** If the distribution is non-normal but the population variance is known, the z-statistic can be used as long as the sample size is large (n greater than 30), because the central limit theorem assures us that the distribution of the sample mean is approximately normal when the sample is large.
51
Non Normal Distribution Unknown Variance Small Sample Size
***NA***
52
Non Normal Distribution Unknown Variance Large Sample Size
***t-statistic*** If the distribution is non-normal and the population variance is unknown, the t-statistic can be used as long as the sample size is large (n greater than 30). It is also acceptable to use the z-statistic, although use of the t-statistic is more conservative.
53
Student's t-distribution
It is a bell-shaped probability distribution that is symmetrical about its mean. It is the appropriate distribution to use when constructing confidence intervals based on small samples (n < 30). It may also be appropriate to use the t-distribution when the population variance is unknown and the sample size is large enough that the central limit theorem assures that the sampling distribution is approximately normal.
54
Student's t-distribution Properties
***Properties*** 1. Symmetrical. 2. Defined by a single parameter, the degrees of freedom, equal to the number of sample observations minus one (n - 1) for sample means. 3. It has more probability in the tails (fatter tails) than the normal distribution. 4. As the degrees of freedom (the sample size) increase, the shape of the t-distribution more closely approaches a standard normal distribution.
55
Standard Error
56
Standard Deviation is Equal to
89
Standard Error of the Sample Mean
The **positive square root** of the **variance** of the sample statistic (here, the sample mean).
90
What Does Standard Error Truly Signify?
91
Standard Deviation Formula for Square Root of Sum of Squared Deviation to the Mean
92
Time Period Bias
Can result if the time period over which the data is gathered is either too short or too long. Too short: results may reflect phenomena specific to that time period, or perhaps data mining.
93
When to use sample standard deviation and when to use population standard deviation in calculating standard error
When the population standard deviation is given, use it to calculate the standard error. When it is not given, use the sample standard deviation instead.
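
The rule above can be sketched as a single function with an optional argument. The name `standard_error`, the `population_sd` parameter, and the four-observation sample are hypothetical.

```python
from math import sqrt
from statistics import stdev

def standard_error(sample, population_sd=None):
    """Standard error of the sample mean.

    Uses the population standard deviation when it is supplied;
    otherwise falls back to the sample standard deviation.
    """
    sd = population_sd if population_sd is not None else stdev(sample)
    return sd / sqrt(len(sample))

observations = [10, 12, 14, 16]                      # hypothetical sample
print(standard_error(observations, population_sd=4)) # population value given: 2.0
print(standard_error(observations))                  # not given: uses sample value
```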
94
How to identify whether to use the t-distribution or the z-distribution
Check the sample size, and check which standard deviation is given: sample or population.
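
The selection criteria in this deck can be summarized as a small decision function. This is only a sketch: the function name and return labels are hypothetical, and treating n = 30 itself as a large sample is an assumption about the boundary case.

```python
def choose_test_statistic(population_is_normal, variance_known, n):
    """Return "z", "t", or None (not available) for a confidence interval on the mean."""
    large_sample = n >= 30   # assumption: n = 30 counts as large
    if not population_is_normal and not large_sample:
        return None          # non-normal population with a small sample: not available
    if variance_known:
        return "z"           # known population variance: z-statistic
    return "t"               # unknown variance: t-statistic (z is acceptable when n is large)

print(choose_test_statistic(True, True, 10))     # z
print(choose_test_statistic(True, False, 10))    # t
print(choose_test_statistic(False, True, 100))   # z
print(choose_test_statistic(False, False, 10))   # None
```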
95
Formula Confidence Interval for Bernoulli Trial
96
Formula for Width of Confidence Interval
97
Properties of Central Limit Theorem