Statistics: Chapter 1 - Data Collection Flashcards
What is a Population
A population consists of all the items we are interested in.
(Note: these are not always people.)
What is a Sample
A sample is a subset of items chosen from a population.
What is a Sampling Unit
Each individual item in the population that can be sampled is known as a sampling unit.
What is a Sampling Frame
Often sampling units of a population are individually named or numbered to form a list called the sampling frame.
Advantages and Disadvantages of a Census
Advantages:
- Should give a completely accurate result
Disadvantages:
- Time consuming and expensive
- Cannot be used when testing involves destruction
- Large volume of data to process
Advantages and Disadvantages of a Sample
Advantages:
- Less expensive
- Less time consuming
- Less data to process
Disadvantages:
- Data may not be as accurate as a census
- Sample may not be large enough to represent small sub-groups of the population
What is Random Sampling
Sampling in which every item in the sampling frame has an equal chance of being chosen.
Method of how to do Simple Random Sampling
To carry out a simple random sample of size n from a population of size N, you first need a sampling frame.
Each item is assigned a different number from 1 to N. Use a random number generator (or ‘lottery sampling’, e.g. names drawn from a hat) to select n unique numbers.
Choose the items corresponding to these numbers to form the sample.
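The steps above can be sketched in Python as a rough illustration. The helper name `simple_random_sample` is hypothetical; it uses the standard library's `random.sample`, which picks without replacement, playing the part of the random number generator (or the names in a hat).

```python
import random

def simple_random_sample(N, n, rng=None):
    """Hypothetical helper: return n unique frame numbers chosen from 1..N.

    Every item has the same chance of selection and no number repeats.
    """
    if not 0 <= n <= N:
        raise ValueError("sample size n must be between 0 and N")
    rng = rng or random.Random()
    # random.sample draws without replacement, like names from a hat
    return sorted(rng.sample(range(1, N + 1), n))

# Usage: 15 items from a sampling frame numbered 1 to 120
chosen = simple_random_sample(120, 15, random.Random(1))
print(chosen)
```

Passing a seeded `random.Random` only makes a run repeatable; leaving it out gives a fresh random sample each time.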
Advantages of Simple Random Sampling
- Bias free
- Easy and cheap to implement for small populations and samples
- Each sampling unit has a known equal chance of being selected
Disadvantages of Simple Random Sampling
- Not suitable when population size is large
- Sample may not accurately reflect the population
- A sampling frame is needed
Method of how to do Systematic Sampling
In systematic sampling, the required elements are chosen at regular intervals from an ordered list.
To carry out a systematic sample of size n from a population of size N, you need a sampling frame.
Each item is assigned a different number from 1 to N. Calculate the interval k = N/n (population size ÷ sample size). Choose a random starting number between 1 and k, then take every kth item from there to form the sample.
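A minimal sketch of the systematic method, assuming N is an exact multiple of n so that k is a whole number. The helper name `systematic_sample` is hypothetical; the usage borrows N = 50,000 and n = 100 from the telephone-directory example question.

```python
import random

def systematic_sample(N, n, rng=None):
    """Hypothetical helper: frame numbers (1..N) chosen by systematic sampling.

    Assumes N is a positive multiple of n, so k = N / n is a whole number.
    """
    if n <= 0 or N % n != 0:
        raise ValueError("this sketch assumes N is a positive multiple of n")
    k = N // n                     # sampling interval
    rng = rng or random.Random()
    start = rng.randint(1, k)      # random start between 1 and k inclusive
    # every kth item from the start: start, start + k, start + 2k, ...
    return list(range(start, N + 1, k))

sample = systematic_sample(50_000, 100, random.Random(7))
print(sample[:3], len(sample))
```

Because the start is at most k, the last item chosen is at most k + 99k = N, so exactly n items are always returned.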
Advantages of Systematic Sampling
- Simple and quick to use
- Suitable for large samples and populations
Disadvantages of Systematic Sampling
- Can introduce bias if the sampling frame is not random, as a regular pattern in the list can be picked up by the sample
- A sampling frame is needed
Method of how to do Stratified Sampling
Population divided into groups (strata) and a simple random sample carried out in each group.
To carry out a stratified sample of size n from a population of size N, you need a sampling frame and distinct strata. The same proportion, n/N, is sampled from each stratum.
Within each stratum, each item is assigned a different number, and a random number generator is used to select the required number of unique numbers.
Choose the items corresponding to these numbers to form the sample.
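A rough sketch of proportional stratified sampling, assuming each stratum's sampling frame is available as a list. The helper name `stratified_sample` is hypothetical, and the usage borrows the 64 girls and 56 boys from the first example question purely as stratum sizes.

```python
import random

def stratified_sample(strata, n, rng=None):
    """Hypothetical helper: proportional stratified sample.

    strata: dict mapping each stratum's name to its sampling frame (a list).
    Each stratum gets round(n * size / N) places, filled by simple random
    sampling within that stratum. Rounding can make the total differ
    slightly from n, in which case the allocation must be adjusted by hand.
    """
    rng = rng or random.Random()
    N = sum(len(frame) for frame in strata.values())
    sample = {}
    for name, frame in strata.items():
        places = round(n * len(frame) / N)   # same proportion n/N per stratum
        sample[name] = sorted(rng.sample(frame, places))
    return sample

# Usage: 64 girls and 56 boys, each numbered within their own stratum
school = {"girls": list(range(1, 65)), "boys": list(range(1, 57))}
s = stratified_sample(school, 15, random.Random(3))
print({name: len(chosen) for name, chosen in s.items()})
```

Here 15 × 64/120 = 8 and 15 × 56/120 = 7, so no rounding adjustment is needed. Note that Python's `round` sends exact halves to the nearest even number.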
Advantages of Stratified Sampling
- Sample accurately reflects population structure
- Guarantees proportional representation of groups within population
Disadvantages of Stratified Sampling
- Sampling frame is needed and the population must be clearly classified into distinct strata
- Selection within each stratum suffers from same disadvantages as simple random sampling.
Example Question:
There are 64 girls and 56 boys in a school. Explain briefly how you could take a simple random sample of 15 pupils. (3 marks)
Using the school register as a sampling frame, assign each student a number from 1 to 120
Use a random number generator to select 15 unique numbers
Take the 15 students that correspond to those numbers as your sample
Example Question:
A school has 15 classes and a sixth form. In each class there are 30 students. In the sixth form there are 150 students. There are equal numbers of boys and girls in the sixth form. The head teacher wishes to obtain the opinions of the students about school uniforms. Explain how the head teacher would take a stratified sample of size 40. (7 marks)
Population = (15x30) + 150 = 600
Sixth Form:
150/600 = 1/4
1/4 x 40 = 10 sixth formers
10/2 = 5
∴ 5 sixth form boys, 5 sixth form girls
Using the sixth form register as a sampling frame, assign each male a number from 1 to 75 and each female a number from 1 to 75. Then use a random number generator to select 5 unique males and 5 unique females to form part of the sample.
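Rest of School/Classes:
40 - 10 = 30 students to be chosen from the classes
30/15 = 2
∴ 2 students per class (1 male, 1 female)
Using each class register as a sampling frame, use a random number generator to select 1 male and 1 female from each class to form the rest of the sample.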
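The allocation arithmetic in this answer can be checked with a short sketch using exact fractions. Treating the sixth-form boys and girls as two strata of 75 from the start gives the same 5 + 5 as halving the 10 sixth formers; the 1 male / 1 female split within each class is not modelled here. The helper name `proportional_allocation` is hypothetical.

```python
from fractions import Fraction

def proportional_allocation(sizes, n):
    """Hypothetical helper: places per stratum = (stratum size / N) x n, exactly."""
    N = sum(sizes.values())
    return {name: Fraction(size, N) * n for name, size in sizes.items()}

# Strata from the example: sixth-form boys and girls, plus 15 classes of 30
sizes = {"sixth form boys": 75, "sixth form girls": 75}
sizes.update({f"class {i}": 30 for i in range(1, 16)})

alloc = proportional_allocation(sizes, 40)
print(sum(sizes.values()))  # 600
print(alloc["sixth form boys"], alloc["sixth form girls"], alloc["class 1"])  # 5 5 2
```

Using `Fraction` instead of floats keeps each allocation exact, so it is easy to see when rounding would be needed.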
Example Question:
A telephone directory contains 50,000 names. A researcher wishes to select a systematic sample of 100 names from the directory. Explain in detail how the researcher should obtain a sample.
Using the telephone directory as a sampling frame, assign each name a number from 1 to 50,000.
50,000/100 = 500
Use a random number generator to select a number between 1 and 500. Then, starting with that name, choose every 500th name after it to form a sample of 100.
Method of how to do Quota Sampling
Population divided into groups according to characteristics.
A quota of items/people in each group is set to try to reflect the group’s proportion in the whole population (quotas are calculated in the same way as in stratified sampling).
Interviewer selects the actual sampling units until the quotas are reached.
Once a quota is full, ignore any subsequent sampling units that also have that characteristic.
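The quota-filling rule can be sketched as follows, assuming the sampling units arrive one at a time in the order the interviewer meets them. The helper name `fill_quotas` and the example units are hypothetical, made up only to show the rule.

```python
def fill_quotas(units, quotas, group_of):
    """Hypothetical helper: quota sampling sketch.

    units: sampling units in the order the interviewer meets them
    quotas: dict mapping each group to the number of units required
    group_of: function returning the group a unit belongs to
    Returns the units accepted; if the units run out first, the
    sample is returned with some quotas still unfilled.
    """
    remaining = dict(quotas)
    sample = []
    for unit in units:
        group = group_of(unit)
        if remaining.get(group, 0) > 0:
            sample.append(unit)          # quota not yet full: accept
            remaining[group] -= 1
            if not any(remaining.values()):
                break                    # every quota is full: stop
        # else: quota already full, so this unit is ignored
    return sample

# Usage with made-up units: (id, group) pairs met in this order
met = [(1, "M"), (2, "F"), (3, "M"), (4, "M"), (5, "F"), (6, "F")]
picked = fill_quotas(met, {"M": 2, "F": 2}, group_of=lambda u: u[1])
print([uid for uid, _ in picked])
```

Unit 4 is ignored because the "M" quota is already full, and sampling stops after unit 5 fills the last quota. Nothing random decides who is met, which is where the bias listed below can come from.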
Advantages of Quota Sampling
- Allows a small sample to still be representative of the population
- No sampling frame required
- Relatively quick, easy, inexpensive
Disadvantages of Quota Sampling
- Non-random sampling can introduce bias
- Population must be divided into groups, which can be costly or inaccurate
- Can depend on knowledge/expertise of researcher.
Method of how to do Opportunity/Convenience Sampling
Sample taken from people who are available at the time of the study and who meet the criteria.
Advantages of Opportunity/Convenience Sampling
- Easy to carry out
- Inexpensive
Disadvantages of Opportunity/Convenience Sampling
- Unlikely to provide a representative sample
- Highly dependent on individual researcher
What are Qualitative/Categorical Values and what are Quantitative Values
Qualitative/Categorical:
Non-numerical values, e.g. colour
Quantitative:
Numerical values
What are the 2 types of Quantitative Data
- Discrete
- Continuous
What is Discrete Data
Data that can only take specific values, e.g. shoe size, number of children.
What is Continuous Data
Data that can take any value within a given range, e.g. height, weight.
What assumption is made when using the midpoint of an interval in a grouped frequency table
The use of the midpoint assumes values are evenly distributed throughout the interval.
Example Question:
A lake contains 3 species of fish. There are estimated to be 1400 trout, 600 bass, and 450 pike in the lake. A survey of the health of the fish in the lake is carried out and a sample of 30 fish is chosen.
a) Give a reason why stratified random sampling cannot be used
b) State an appropriate sampling method for the survey
c) Give one advantage and one disadvantage of this sampling method
d) Explain how this sampling method could be used to select the sample of 30 fish. You must show your working.
a) There is no sampling frame (it is impossible to obtain one).
b) Quota Sampling
c) (Adv) - The sample can be obtained quickly
(Dis) - The surveyor may not be able to identify the species easily (a level of expertise is needed)
d)
Total = 1400 + 600 + 450 = 2450
Trout - 1400/2450 x 30 = 17.1 (17)
Bass - 600/2450 x 30 = 7.35 (7)
Pike - 450/2450 x 30 = 5.51 (6)
Check: 17 + 7 + 6 = 30
Fish are caught from the lake until the quotas of 17 trout, 7 bass and 6 pike are reached.
If a fish is caught whose species quota is already full, it is ignored.
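The quota arithmetic in part d) can be reproduced with a few lines of Python, a sketch that rounds each species' share to the nearest whole fish, as the answer does. The variable names are illustrative only.

```python
# Estimated numbers of each species in the lake (from the question)
populations = {"trout": 1400, "bass": 600, "pike": 450}
n = 30                                  # required sample size

total = sum(populations.values())       # 2450
quotas = {}
for species, count in populations.items():
    exact = count / total * n           # species' share of the 30 fish
    quotas[species] = round(exact)      # round to a whole number of fish
    print(f"{species}: {count}/{total} x {n} = {exact:.2f} -> {quotas[species]}")

print(sum(quotas.values()))             # should equal n
```

Here the rounded quotas happen to add back up to 30. If rounding ever made the total one too many or too few, one quota would need adjusting by hand.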