Lecture 4 (SAMPLING AND SAMPLING DISTRIBUTIONS) Flashcards

1
Q

SAMPLING

A

A means of gathering useful information about a population: information is gathered from a sample, and conclusions are drawn about the population.

2
Q

What is the advantage of sampling over census?

A

Sampling saves money and time
The research process is sometimes destructive, so sampling can save product.
It is the only option when accessing the whole population is impossible.

3
Q

What are the reasons for taking a census over a sample?

A

Eliminates the possibility that a random sample is not representative of the population.
The person authorising the study is uncomfortable with sample information.
For the safety of the consumer.

4
Q

POPULATION FRAME

A

A list, map, directory, or other source used to represent the population.

5
Q

OVER REGISTRATION (population frame)

A

The frame contains all members of the target population and some additional elements.

6
Q

UNDER REGISTRATION (population frame)

A

The frame does not contain all members of the target population.
The goal is to minimise differences between target population and frame.

7
Q

RANDOM SAMPLING

A

Every unit of the population has the same probability of being included in the sample.
A chance mechanism is used in the selection process.
Eliminates bias in the selection process.
Also known as probability sampling.

8
Q

NON-RANDOM SAMPLING

A

Not every unit of the population has the same probability of being included in the sample.
Open to selection bias.
Not an appropriate data collection method for most statistical methods.
Also known as non-probability sampling.

9
Q

What are the 4 random sampling techniques?

A

Simple Random Sampling
Stratified Random Sampling
Systematic Random Sampling
Cluster (or Area) Sampling

10
Q

SIMPLE RANDOM SAMPLE

A

Basis for other random sampling techniques
Each unit in the frame is numbered 1 to N (the population size).
A random number generator can be used to select n items from the population.
Easier to perform for small populations
Cumbersome for large populations.
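
The numbering-and-drawing procedure on this card can be sketched in Python. This is a minimal illustration, not a prescribed method: the function name `simple_random_sample`, the population size of 500, and the seed are all assumptions made for the example.

```python
import random

def simple_random_sample(population_size, sample_size, seed=None):
    """Pick sample_size distinct unit numbers from 1..population_size."""
    rng = random.Random(seed)
    # random.sample draws without replacement, so no unit is chosen twice
    # and every unit has the same chance of being included.
    return rng.sample(range(1, population_size + 1), sample_size)

# Hypothetical frame of N = 500 numbered units, sample of n = 10.
chosen = simple_random_sample(population_size=500, sample_size=10, seed=1)
print(sorted(chosen))
```

Fixing the seed only makes the draw repeatable for checking; leave it as `None` for a fresh draw each time.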

11
Q

STRATIFIED RANDOM SAMPLE

A

Population is divided into non-overlapping sub populations called strata
A random sample is selected from each stratum
Proportionate: the % of the sample taken from each stratum is proportionate to the % that each stratum makes up of the whole population.
Disproportionate: the % of the sample taken from each stratum is not proportionate to the % that each stratum makes up of the whole population.
Has the potential to match the sample closely to the population.
Stratified sampling is more costly.
Each stratum should be relatively homogeneous (e.g. by race, gender, religion).
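
A proportionate stratified sample can be sketched as follows. The strata names, their sizes, and the function name `proportionate_stratified_sample` are hypothetical, chosen so the shares divide evenly. With other sizes the rounding step may make the total differ slightly from the requested sample size.

```python
import random

def proportionate_stratified_sample(strata, sample_size, seed=None):
    """Draw a simple random sample from each stratum, sized by its population share."""
    rng = random.Random(seed)
    total = sum(len(units) for units in strata.values())
    sample = {}
    for name, units in strata.items():
        # The stratum's share of the sample equals its share of the population.
        k = round(sample_size * len(units) / total)
        sample[name] = rng.sample(units, k)
    return sample

# Hypothetical population of 1000 units split into three non-overlapping strata.
strata = {
    "north": list(range(0, 600)),     # 60% of the population
    "south": list(range(600, 900)),   # 30%
    "west": list(range(900, 1000)),   # 10%
}
picked = proportionate_stratified_sample(strata, sample_size=50, seed=7)
sizes = {name: len(units) for name, units in picked.items()}
print(sizes)
```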

12
Q

SAMPLING ERROR

A

Occurs when the sample does not represent the population.

13
Q

SYSTEMATIC RANDOM SAMPLING

A

Convenient and relatively easy to administer.
Population elements are an ordered sequence (at least conceptually)
The first sample element is selected randomly from the first k population elements.
Thereafter, sample elements are selected at a constant interval, k, from the ordered sequence frame.
k = N/n
n = sample size
N = population size
k = size of selection interval
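
The selection rule above (random start within the first k elements, then every k-th element) can be sketched like this. The frame of 1000 numbered units and the function name `systematic_sample` are assumptions for illustration, and k is rounded down when N/n is not a whole number.

```python
import random

def systematic_sample(frame, sample_size, seed=None):
    """Take every k-th element of an ordered frame after a random start."""
    rng = random.Random(seed)
    k = len(frame) // sample_size   # selection interval k = N/n, rounded down
    start = rng.randrange(k)        # random position among the first k elements
    return frame[start::k][:sample_size]

# Hypothetical ordered frame with N = 1000 and n = 50, so k = 20.
frame = list(range(1, 1001))
sample = systematic_sample(frame, 50, seed=3)
print(sample[:5])
```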

14
Q

Advantages/Disadvantages of systematic sampling

A

Advantages:
Convenience
Speed
Evenly distributed sampling across the frame.
Disadvantage:
It can be biased if the elements in the frame are ranked.

15
Q

CLUSTER (AREA) SAMPLING

A

Population is divided into non-overlapping clusters or areas
Each cluster is a miniature, or microcosm, of the population.
A subset of the clusters is selected randomly for the sample
If the number of elements in the subset of clusters is larger than the desired value of n, these clusters may be subdivided to form a new set of clusters and subjected to a random selection process.
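
A one-stage version of this procedure (randomly choose whole clusters and keep every element in them) might look like the sketch below. The cluster names, their members, and the function name `cluster_sample` are made up for the example, and the optional subdivision step is not shown.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly select whole clusters; every element of a chosen cluster is sampled."""
    rng = random.Random(seed)
    chosen_names = rng.sample(sorted(clusters), n_clusters)
    return {name: clusters[name] for name in chosen_names}

# Hypothetical non-overlapping clusters (e.g. towns) and their members.
clusters = {
    "town_a": ["a1", "a2", "a3"],
    "town_b": ["b1", "b2"],
    "town_c": ["c1", "c2", "c3", "c4"],
    "town_d": ["d1", "d2", "d3"],
}
selected = cluster_sample(clusters, n_clusters=2, seed=5)
units = [u for members in selected.values() for u in members]
print(sorted(selected), len(units))
```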

16
Q

ADVANTAGES OF CLUSTER SAMPLING

A

More convenient for geographically dispersed populations.
Reduced travel costs to contact sample elements.
Simplified administration of the survey.
Can be used when the unavailability of a sampling frame prohibits using other random sampling methods.

17
Q

DISADVANTAGES OF CLUSTER SAMPLING

A

Statistically less efficient when the cluster elements are similar.
Costs and problems of statistical analysis are greater than for simple random sampling.

18
Q

NON-SAMPLING ERRORS

A

All errors other than sampling errors.
e.g.
Missing data, recording, data entry, and analysis errors.
Poorly conceived concepts, unclear definitions, and defective questionnaires.
Response errors occur when people do not know, will not say, or overstate in their answers.

19
Q

CENTRAL LIMIT THEOREM

A

Allows one to study populations with differently shaped distributions.
Creates the potential for applying the normal distribution to many problems when sample size is sufficiently large.
As sample size increases, the distribution of sample means narrows.
This is because the standard deviation of the mean decreases as sample size increases.

20
Q

What are the advantages of the central limit theorem?

A

Sample data drawn from populations that are not normally distributed, or from populations of unknown shape, can still be analysed with the normal distribution, because the sample means are approximately normally distributed when sample sizes are large.

21
Q

Properties of the Central Limit Theorem - sampling from any population

A

For sufficiently large sample sizes (n > 30), regardless of the shape of the population distribution:
The distribution of sample means, xbar, is approximately normal.
The mean of this distribution is equal to μ, the population mean.
Its standard deviation is σ/sqrt(n).
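
These properties can be checked with a small simulation. The sketch below assumes an exponential population with μ = 1 and σ = 1 (a skewed, non-normal shape chosen only for illustration) and compares the centre and spread of the sample means with μ and σ/sqrt(n).

```python
import math
import random
from statistics import fmean, pstdev

def sample_means(n, trials, rng):
    """Means of `trials` samples of size n from an exponential population (mu = 1, sigma = 1)."""
    return [fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(trials)]

rng = random.Random(42)              # fixed seed so the run is repeatable
means_10 = sample_means(10, 2000, rng)
means_40 = sample_means(40, 2000, rng)

center_40 = fmean(means_40)          # should be close to mu = 1
spread_10 = pstdev(means_10)         # should be close to 1/sqrt(10)
spread_40 = pstdev(means_40)         # should be close to 1/sqrt(40)

print("mean of sample means:", center_40)
print("spread for n=10:", spread_10, "theory:", 1 / math.sqrt(10))
print("spread for n=40:", spread_40, "theory:", 1 / math.sqrt(40))
```

The spread for n = 40 comes out smaller than for n = 10, which is the narrowing described for larger samples.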

22
Q

Z formula for sample means

A

z = (xbar - μ) / (σ/sqrt(n))
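
As a worked illustration of the formula, the sketch below computes z for a sample mean and the probability of seeing a sample mean at least that large. The numbers (μ = 85, σ = 9, n = 40, xbar = 87) are invented for the example.

```python
import math
from statistics import NormalDist

def z_for_sample_mean(xbar, mu, sigma, n):
    """z score of a sample mean: (xbar - mu) / (sigma / sqrt(n))."""
    return (xbar - mu) / (sigma / math.sqrt(n))

# Hypothetical values, not taken from the lecture.
z = z_for_sample_mean(xbar=87, mu=85, sigma=9, n=40)
p_above = 1 - NormalDist().cdf(z)    # P(sample mean >= 87)
print(round(z, 2), round(p_above, 4))
```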

23
Q

POINT ESTIMATE

A

A statistic taken from a sample that is used to estimate a population parameter.
xbar = Σx/n

24
Q

INTERVAL ESTIMATE

A

A range of values within which the analyst can declare, with some confidence, the population parameter lies; also known as the confidence interval.
xbar +/- z(ɑ/2) * σ/sqrt(n)
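
The interval formula can be sketched in Python using the standard library's normal distribution. The inputs (xbar = 510, σ = 46, n = 85) are hypothetical, and the function name `z_confidence_interval` is an assumption for illustration.

```python
import math
from statistics import NormalDist

def z_confidence_interval(xbar, sigma, n, confidence=0.95):
    """Return (lower, upper) for xbar +/- z(alpha/2) * sigma / sqrt(n)."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z value leaving alpha/2 in the upper tail
    margin = z * sigma / math.sqrt(n)
    return xbar - margin, xbar + margin

# Hypothetical sample summary with a known population standard deviation.
low, high = z_confidence_interval(xbar=510, sigma=46, n=85)
print(round(low, 2), round(high, 2))
```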

25
Q

What is alpha (ɑ) for a 95% confidence interval?

A

0.05

ɑ/2 = 0.025

26
Q

How do you find the z value from a confidence interval?

A

E.g. for a 95% confidence interval, ɑ/2 = 0.025, so the value needed is z0.025.
The area between the mean and z is 0.5000 - 0.0250 = 0.4750.
Look up 0.4750 in the body of the z table and read 1.96 as the z value from the row and column.
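
The same lookup can be done without a table. This short sketch uses `statistics.NormalDist` from the Python standard library; the variable names are illustrative.

```python
from statistics import NormalDist

alpha = 0.05
# z value that leaves alpha/2 = 0.025 in the upper tail.
z = NormalDist().inv_cdf(1 - alpha / 2)
# Area between the mean (z = 0) and z = 1.96, as read from a z table.
area = NormalDist().cdf(1.96) - 0.5
print(round(z, 2), round(area, 4))
```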

27
Q

What is the purpose of the confidence interval?

A

Yields a range within which the researchers feel with some confidence the population mean is located.
If a researcher were to randomly select 100 samples of size n and use the results of each sample to construct a 95% confidence interval, approximately 95 of the 100 intervals would contain the population mean.
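
This repeated-sampling interpretation can be illustrated with a simulation. The sketch assumes a normal population with μ = 50 and σ = 10 and uses 1000 samples rather than 100 so the observed proportion is steadier. All names and numbers are assumptions for the example.

```python
import math
import random
from statistics import NormalDist, fmean

def coverage_rate(mu, sigma, n, trials, confidence=0.95, seed=0):
    """Fraction of z-based confidence intervals that contain the true mean mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sigma / math.sqrt(n)
    hits = 0
    for _ in range(trials):
        xbar = fmean(rng.gauss(mu, sigma) for _ in range(n))
        if xbar - margin <= mu <= xbar + margin:
            hits += 1
    return hits / trials

# Hypothetical population parameters; the result should land near 0.95.
rate = coverage_rate(mu=50, sigma=10, n=30, trials=1000)
print(rate)
```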

28
Q

What is used when the sample size is > 5% of the population?

A

Use the finite population correction factor.

Formula in booklet.
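
The booklet formula is not reproduced on this card. The sketch below assumes the usual textbook form of the correction factor, sqrt((N - n) / (N - 1)), applied to the standard deviation of the mean when n/N exceeds 5%. The function names are hypothetical.

```python
import math

def finite_population_correction(N, n):
    """Assumed textbook form of the correction factor: sqrt((N - n) / (N - 1))."""
    return math.sqrt((N - n) / (N - 1))

def standard_error(sigma, n, N=None):
    """sigma / sqrt(n), corrected when the sample exceeds 5% of a finite population."""
    se = sigma / math.sqrt(n)
    if N is not None and n / N > 0.05:
        se *= finite_population_correction(N, n)
    return se

# Hypothetical case: n = 100 drawn from N = 1000 (10% of the population).
print(standard_error(sigma=10, n=100, N=1000))
print(standard_error(sigma=10, n=100))   # no population size given, no correction
```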

29
Q

Estimating the population mean using the z statistic when the sample size is small.

A

The central limit theorem applies only when the sample size is large (n > 30).
However, the z formulas can still be used when the sample size is small (n < 30) if it is known that the population from which the sample is drawn is normally distributed.

30
Q

How do you estimate the mean of a normal population when the standard deviation is unknown?

A

The population has a normal distribution.
The value of the population Std Dev is unknown; the sample Std Dev must be used in the estimation process.
The z distribution is not appropriate under these conditions; the t distribution is appropriate, and the sample Std Dev is used in the t formula.

31
Q

t DISTRIBUTION

A

A family of distributions - a unique distribution for each value of its parameter, degrees of freedom (d.f.)
Symmetrical, unimodal, mean = 0, flatter than the z distribution

t = (xbar - μ) / (s/sqrt(n))

t distributions approach the normal curve as n becomes larger.

32
Q

How is the t distribution read?

A

The t table uses the area in the tail of the distribution.
Emphasis is on ɑ, and each tail of the distribution contains ɑ/2 of the area under the curve when confidence intervals are constructed.
t values are located at the intersection of the df value and the selected ɑ/2 value.

33
Q

What are the confidence intervals for the t distribution?

A

xbar +/- t(ɑ/2, n-1) * (s/sqrt(n))

df = n-1
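
A minimal sketch of the t interval follows. Python's standard library has no t distribution, so the critical value is passed in as if read from a t table. The data are made up, and 2.776 is the table value for ɑ/2 = 0.025 with df = 4.

```python
import math
from statistics import mean, stdev

def t_confidence_interval(data, t_crit):
    """xbar +/- t * s / sqrt(n), with t_crit looked up for df = n - 1."""
    n = len(data)
    xbar = mean(data)
    s = stdev(data)                  # sample standard deviation (n - 1 in the denominator)
    margin = t_crit * s / math.sqrt(n)
    return xbar - margin, xbar + margin

# Hypothetical sample of n = 5 from a normal population, so df = 4.
data = [4, 8, 6, 5, 7]
low, high = t_confidence_interval(data, t_crit=2.776)
print(round(low, 3), round(high, 3))
```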

34
Q

Determining the sample size when estimating μ.

A

It may be necessary to estimate the sample size when working on a project.
In studies where μ is being estimated, the size of the sample can be determined by using the z formula for sample means to solve for n.
z formula:
z = (xbar - μ) / (σ/sqrt(n))
Error of estimation (tolerable error):
E = xbar - μ
Estimated sample size:
n = (z(ɑ/2)^2 * σ^2) / E^2
= (z(ɑ/2) * σ / E)^2
If σ is unknown, it can be estimated as σ ≈ 1/4 * range.
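
The sample-size formula can be sketched as below. The inputs (σ = 4, tolerable error E = 1) are hypothetical, and the result is rounded up because a sample size must be a whole number.

```python
import math
from statistics import NormalDist

def required_sample_size(sigma, error, confidence=0.95):
    """n = (z(alpha/2) * sigma / E)^2, rounded up to the next whole unit."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z * sigma / error) ** 2)

def sigma_from_range(value_range):
    """Rough estimate of sigma as one quarter of the range."""
    return value_range / 4

# Hypothetical study: estimate mu to within E = 1 when sigma is about 4.
n_90 = required_sample_size(sigma=4, error=1, confidence=0.90)
n_95 = required_sample_size(sigma=4, error=1, confidence=0.95)
n_from_range = required_sample_size(sigma=sigma_from_range(16), error=1)
print(n_90, n_95, n_from_range)
```

Raising the confidence level or shrinking E both increase the required n.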
35
Q

ERROR OF ESTIMATION

A

The difference between the sample mean, xbar, and the population mean, μ:

E = xbar - μ