Data collection Flashcards
population
In statistics, a population is the whole set of individuals or objects you are interested in for a particular investigation,
e.g. all 6-year-old girls in the UK, all items manufactured by a factory, or all the trees in a public park.
census
A census observes or measures every member of a population.
sample
A sample is a selection of observations taken from a subset of the population which is used to find out information about the population as a whole. We then assume that the results for this sample are representative of the whole population.
Census pros
Should give a completely accurate result
Census cons
Time-consuming and expensive
Can’t be used when the testing process destroys the item
Hard to process a large amount of data
sample pros
- Less time-consuming and expensive than a census
- Fewer people have to respond
- Less data to process than in a census
sample cons
- The data may not be as accurate
- The sample may not be large enough to give information about small sub-groups of the population
sampling units
Individual units of a population
sampling frame
A list (or other representation) of the items available to be sampled
sampling fraction
The proportion of the available items that are actually sampled is called the sampling fraction. A 100% sample is called a census.
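As a quick arithmetic illustration, the sampling fraction is the sample size n divided by the number of available items N. The function name `sampling_fraction` and the numbers below are hypothetical, chosen only for the example:

```python
def sampling_fraction(n: int, N: int) -> float:
    """Proportion of the N available items that are actually sampled."""
    if not 0 <= n <= N:
        raise ValueError("sample size must be between 0 and the number of available items")
    return n / N

# Hypothetical example: 50 items sampled from a sampling frame of 400 items.
print(sampling_fraction(50, 400))   # 0.125, i.e. a 12.5% sample
print(sampling_fraction(400, 400))  # 1.0, i.e. a 100% sample, which is a census
```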
sampling error
The difference between an estimate of a parameter (e.g. mean) derived from sample data and its true value. To reduce the sampling error, you want your sample to be as representative of the parent population as you can make it.
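A small simulation can make the idea concrete. This is a sketch under stated assumptions: the "population" is 1000 simulated heights (illustrative data, not real measurements), the parameter is the population mean, and the error is taken as estimate minus true value (the sign is just a convention).

```python
import random
import statistics

# Hypothetical population: 1000 simulated heights in cm (illustrative data only).
rng = random.Random(1)
population = [rng.gauss(170, 10) for _ in range(1000)]
true_mean = statistics.mean(population)   # the true value of the parameter

sample = rng.sample(population, 50)       # a simple random sample of size 50
estimate = statistics.mean(sample)        # the estimate derived from sample data

sampling_error = estimate - true_mean
census_error = statistics.mean(population) - true_mean  # a census has no sampling error

print(f"true mean {true_mean:.2f}, sample mean {estimate:.2f}, error {sampling_error:.2f}")
```

Re-running with a different seed or a larger sample shows the error changing from sample to sample, which is why a representative sample matters.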
bias
Different types of people should be represented in the sample that is chosen. If the sample over-represents a certain group of people within the population, then it is said to be biased. To make good use of a sample, we want to avoid bias.
representative sample
A sample that is typical of the whole population.
Random Sampling Techniques
- Simple random sampling
- Systematic sampling
- Stratified sampling
Simple random sampling
A simple random sample of size n is one where every possible sample of size n has an equal chance of being selected. This can be achieved by ensuring that every member of a finite population has an equal chance of being selected, provided that sampling is without replacement and the selections are independent of each other.
Two methods of choosing the numbers in simple random sampling
- Using a random number generator (using a calculator, computer or random number table).
- Lottery sampling – e.g. writing members of the sampling frame on tickets and drawing them out of a bag.
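Both methods can be sketched in Python, assuming the sampling frame is simply a list of labels. The frame of 30 pupils and the function names `srs_random_numbers` and `srs_lottery` are hypothetical. `random.sample` draws without replacement, so every subset of size n is equally likely; the lottery version mimics mixing tickets in a bag and drawing n of them out.

```python
import random

# Hypothetical sampling frame of 30 pupils, labelled 1 to 30.
frame = [f"pupil_{i}" for i in range(1, 31)]

def srs_random_numbers(frame, n, seed=None):
    """Random number generator method: pick n distinct positions in the frame."""
    rng = random.Random(seed)
    positions = rng.sample(range(len(frame)), n)  # sampling without replacement
    return [frame[p] for p in positions]

def srs_lottery(frame, n, seed=None):
    """Lottery method: one ticket per member, mix the bag, then draw n tickets."""
    rng = random.Random(seed)
    bag = list(frame)   # one ticket for each member of the sampling frame
    rng.shuffle(bag)    # mixing the tickets in the bag
    return bag[:n]      # drawing n tickets without putting any back

print(srs_random_numbers(frame, 5, seed=42))
print(srs_lottery(frame, 5, seed=42))
```

Both functions need the full sampling frame as input, which reflects the "a sampling frame is needed" disadvantage.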
Simple random sampling - pros
- Free of bias
- Easy and cheap to implement for small populations and small samples
- Each sampling unit has a known and equal chance of selection
Simple random sampling - cons
- Not suitable when the population size or the sample size is large
- A sampling frame is needed
Stratified sampling
In stratified sampling, the population is divided into mutually exclusive strata and a random sample is taken from each.
Divide the population into sub-groups, or strata, e.g. by income (low, middle, high) or by sex (male, female).
proportional stratified sampling
If we randomly sample from each group in proportion to the size of the group then it is called proportional stratified sampling.
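A minimal sketch of proportional stratified sampling, assuming each stratum is given as a list of members: stratum h of size N_h receives n × N_h / N places, and a simple random sample of that size is taken within it. The income strata and their sizes are hypothetical, chosen so the allocation divides exactly.

```python
import random

def proportional_allocation(strata_sizes, n):
    """Number to sample from each stratum, in proportion to its size (rounded)."""
    N = sum(strata_sizes.values())
    return {name: round(n * size / N) for name, size in strata_sizes.items()}

def stratified_sample(strata, n, seed=None):
    """Take a simple random sample of the allocated size from each stratum."""
    rng = random.Random(seed)
    sizes = {name: len(members) for name, members in strata.items()}
    allocation = proportional_allocation(sizes, n)
    return {name: rng.sample(strata[name], k) for name, k in allocation.items()}

# Hypothetical population of 400 people split into income strata.
strata = {
    "low": [f"L{i}" for i in range(120)],
    "middle": [f"M{i}" for i in range(200)],
    "high": [f"H{i}" for i in range(80)],
}
print(proportional_allocation({k: len(v) for k, v in strata.items()}, 40))
# {'low': 12, 'middle': 20, 'high': 8}
sample = stratified_sample(strata, 40, seed=7)
```

When the stratum sizes do not divide evenly, rounding can make the allocations add up to slightly more or fewer than n, so in practice they may need adjusting by hand.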
stratified sampling - pros
- Sample accurately reflects the population structure
- Guarantees proportional representation of groups within a population
stratified sampling - cons
- Population must be clearly classified into distinct strata
- Selection within each stratum suffers from the same disadvantages as simple random sampling
Systematic sampling
In systematic sampling, the required elements are chosen at regular intervals from an ordered list.
From a list, choose a random starting item, then sample, for example, every 5th item.
To determine the interval required, divide the population size by the required sample size.
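The steps above can be sketched as follows, assuming the ordered list is held in memory and the interval is rounded down to a whole number (k = N // n). The function name `systematic_sample` and the list of 100 items are hypothetical.

```python
import random

def systematic_sample(ordered_list, n, seed=None):
    """Choose a random start, then take every k-th item, where k = N // n."""
    N = len(ordered_list)
    k = N // n                         # interval: population size / sample size, rounded down
    rng = random.Random(seed)
    start = rng.randrange(k)           # random starting position among the first k items
    return ordered_list[start::k][:n]  # every k-th item from the start, n items in total

# Hypothetical ordered list of 100 items; a sample of 20 gives an interval of 100 / 20 = 5.
items = list(range(1, 101))
print(systematic_sample(items, 20, seed=3))
```

The final slice `[:n]` keeps the sample at exactly n items when N is not an exact multiple of n.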
Systematic sampling - pros
- Simple and quick to use
- Suitable for large samples and large populations
Systematic sampling - cons
- A sampling frame is needed
- It can introduce bias if the sampling frame is not random
Non-random sampling
- Quota sampling
- Opportunity sampling
Quota sampling
- In quota sampling, an interviewer or researcher selects a sample that reflects the characteristics of the whole population.
- The population is divided into groups according to a given characteristic. The size of each group determines the proportion of the sample that should have that characteristic.
- This is similar to a stratified sample, but a specific number of people from each particular stratum is sampled, e.g. male/female, different age groups, etc.
use of Quota sampling
This method is often used for market research and is usually carried out by interviewers. The interviewer meets people, assesses which group they belong to and then, after the interview, allocates them to the appropriate quota. This continues until all quotas have been filled. If you then begin interviewing someone whose quota is already full, you just move on. The actual selection of the sample members is up to the interviewer, whereas stratified samples are selected at random.
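The interviewing procedure can be simulated with a short loop. This is a sketch under simple assumptions: each person met is tagged with their group, quotas are fixed counts per group, and non-responses are not modelled. The function name `fill_quotas`, the passers-by and the quotas are hypothetical.

```python
def fill_quotas(people_met, quotas):
    """Interview people in the order met, skipping anyone whose quota is already full.

    people_met: iterable of (person, group) pairs, in the order the interviewer meets them.
    quotas: required number of interviews for each group.
    """
    counts = {group: 0 for group in quotas}
    interviewed = []
    for person, group in people_met:
        if group in counts and counts[group] < quotas[group]:
            interviewed.append(person)  # quota not yet full: include this person
            counts[group] += 1
        if counts == quotas:            # every quota filled: stop interviewing
            break
    return interviewed, counts

# Hypothetical stream of passers-by, with quotas of 2 men and 3 women.
met = [("A", "male"), ("B", "male"), ("C", "male"), ("D", "female"),
       ("E", "female"), ("F", "male"), ("G", "female"), ("H", "female")]
print(fill_quotas(met, {"male": 2, "female": 3}))
# (['A', 'B', 'D', 'E', 'G'], {'male': 2, 'female': 3})
```

Who ends up in the sample depends entirely on the order in which the interviewer meets people, which is the non-random element that can introduce bias.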
Quota sampling - cons
- Non-random sampling can introduce bias
- Population must be divided into groups, which can be costly or inaccurate
- Increasing scope of study increases number of groups, which adds time and expense
- Non-responses are not recorded as such
Quota sampling - pros
- Allows a small sample to still be representative of the population
- No sampling frame required
- Quick, easy and inexpensive
- Allows for easy comparison between different groups within a population
Opportunity sampling
Opportunity / convenience sampling consists of taking the sample from people who are available at the time the study is carried out and who fit the criteria you are looking for.
An example is interviewing passers-by on the street or the first 20 people you meet outside a supermarket on a Monday morning.
Opportunity sampling - pros
- Easy to carry out
- Inexpensive
Opportunity sampling - cons
- Unlikely to provide a representative sample
- Highly dependent on individual researcher
Qualitative variables or qualitative data
Non-numerical data that come in classes or categories e.g. favourite colours, makes of car.
Quantitative variables or quantitative data
Numerical data for which the numbers are meaningful e.g. times to run a race, height.
Numerical data categories
- Discrete
- Continuous
Numerical data categories - Discrete
The set of values taken by the data can be listed
e.g. shoe sizes, number of goals, number of children in a family
Numerical data categories - Continuous
Values can’t be listed because data can take any value in a particular range
e.g. heights, mass, time, etc.
Bivariate data
Two variables are assigned to each item, e.g. height and weight, or age and mileage of cars
Class boundaries
Class boundaries are the minimum and maximum values that belong in each class.
midpoint
The midpoint is the average of the class boundaries.
class width
The class width is the difference between the upper and lower class boundaries.
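Both definitions are simple arithmetic on the class boundaries. A minimal sketch, assuming each class is given by its lower and upper boundary; the height classes and the function name `class_summary` are hypothetical:

```python
def class_summary(lower, upper):
    """Return (midpoint, class width) for a class with the given boundaries."""
    midpoint = (lower + upper) / 2  # average of the class boundaries
    width = upper - lower           # upper boundary minus lower boundary
    return midpoint, width

# Hypothetical classes for height h in cm, e.g. 150 <= h < 160.
for lower, upper in [(150, 160), (160, 175), (175, 180)]:
    print(lower, upper, class_summary(lower, upper))
```

For the class 150 ≤ h < 160 this gives a midpoint of 155.0 and a class width of 10; unequal widths such as 15 and 5 are handled the same way.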