Chapter 1 - Data Collection Flashcards
What is a population?
The whole set of items that are of interest (the thing being surveyed is the population)
What is a census?
A census observes or measures every member of a population
What is a sample?
A sample is a selection of observations taken from a subset of the population, which is used to find out information about the population as a whole
Census advantages:
- Gives a completely accurate result
- Representative of everyone (and smaller subgroups)
Census disadvantages:
- Time consuming
- Expensive
- Cannot be used when the testing process destroys the item
- Hard to process large quantities of data
Sample advantages:
- Less time consuming
- Less expensive
- Fewer people have to respond
- Less data to process
Sample disadvantages:
- The data may be less accurate
- The sample may not be large enough to represent small subgroups
- The results could be biased
What are individual units of a population known as?
Sampling units (e.g. each person in a survey of people is a sampling unit)
What is a sampling frame?
A list of individually numbered or named sampling units of a population
What does the size of a sample depend on?
The required accuracy and available resources
How is the validity of a sample affected by its size?
- Generally, the larger the sample, the more accurate it is (but you will require greater resources)
- If the population is varied, you need a larger sample than if the population were uniform
- Different samples can lead to different conclusions due to the natural variation in a population
Why do we random sample?
We randomly sample because it means every member of the population has an equal chance of being selected. The sample should therefore be representative of the population. It also helps to remove bias from the sample
What are the 3 methods of random sampling?
- Simple random sampling
- Stratified sampling
- Systematic sampling
Simple random sampling:
To carry out a simple random sample, you need a sampling frame, usually a list of people or things. Each person or thing is allocated a unique number, and a selection of these numbers is chosen at random. Selections can be made using a random number generator or lottery-style sampling (e.g. names pulled from a hat)
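The numbering-and-drawing procedure can be sketched in Python. This is a minimal illustration, not a prescribed method: the function name `simple_random_sample`, the example frame of 30 students and the `seed` parameter are all assumptions made for the example.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Pick n distinct sampling units from a sampling frame at random."""
    rng = random.Random(seed)  # seed is only here to make the example repeatable
    # Allocate each unit a unique number 1..N.
    numbered = {i: unit for i, unit in enumerate(frame, start=1)}
    # Choose n of those numbers at random, without repeats.
    chosen_numbers = rng.sample(sorted(numbered), n)
    return [numbered[i] for i in chosen_numbers]

# Hypothetical sampling frame of 30 students.
frame = [f"Student {i}" for i in range(1, 31)]
sample = simple_random_sample(frame, 5, seed=1)
print(sample)
```

Every unit in the frame has the same chance of being chosen, and no unit can be picked twice.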
Stratified sampling:
In stratified sampling, the population is divided into mutually exclusive strata (distinct subgroups of the population e.g. males and females) and a random sample is taken from each. The number selected from each stratum is reflective of the proportion of that stratum within the population
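The number taken from each stratum is (stratum size ÷ population size) × sample size. The sketch below works this out and then takes a simple random sample within each stratum. The function name `stratified_sample` and the year-group data are made up for illustration, and the sketch assumes the proportional numbers are rounded to the nearest whole number.

```python
import random

def stratified_sample(strata, sample_size, seed=None):
    """Take a proportional random sample from each stratum."""
    rng = random.Random(seed)
    population_size = sum(len(units) for units in strata.values())
    sample = {}
    for name, units in strata.items():
        # Number from this stratum = (stratum size / population size) * sample size.
        k = round(sample_size * len(units) / population_size)
        sample[name] = rng.sample(units, k)
    return sample

# Hypothetical population: 60 students in Year 12 and 40 in Year 13.
strata = {
    "Year 12": [f"Y12-{i}" for i in range(1, 61)],
    "Year 13": [f"Y13-{i}" for i in range(1, 41)],
}
sample = stratified_sample(strata, 10, seed=1)
print({name: len(units) for name, units in sample.items()})
```

With a sample of 10 from this population, 6 come from Year 12 and 4 from Year 13. When the proportions are not whole numbers, rounding can make the total differ slightly from the intended sample size.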
Systematic sampling:
In systematic sampling, the required sampling units are chosen at regular intervals from an ordered list
The size of the interval depends on the population size and the required sample size. Divide the population size by the sample size and round down to get the interval k, choose a random starting point between 1 and k, then take every kth unit from that starting point
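A short Python sketch of the interval calculation follows. The function name `systematic_sample` and the 100-item frame are hypothetical, and the sketch assumes the starting point is chosen at random from 1 up to and including the interval.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Select n units at a regular interval from an ordered sampling frame."""
    rng = random.Random(seed)
    k = len(frame) // n        # interval: population size / sample size, rounded down
    start = rng.randint(1, k)  # random starting position from 1 to k (assumed inclusive)
    # Take positions start, start + k, start + 2k, ... (positions count from 1).
    return [frame[start - 1 + i * k] for i in range(n)]

# Hypothetical ordered frame in which each unit is simply its position number.
frame = list(range(1, 101))
sample = systematic_sample(frame, 20, seed=1)
print(sample)
```

Here the interval is 100 ÷ 20 = 5, so the sample is every 5th unit starting from a random position between 1 and 5.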
Advantages of simple random sampling:
- Free of bias
- Easy and cheap to implement for small populations and small samples
- Each sampling unit has a known and equal chance of selection
Disadvantages of simple random sampling:
- Not suitable when the population size or the sample size is large as it is potentially time consuming, disruptive and expensive
- A sampling frame is needed
- Could exclude minorities
Advantages of Systematic sampling:
- Simple and quick to use
- Suitable for large samples and large populations
- Cheap and easy
Disadvantages of Systematic sampling:
- A sampling frame is needed
- It can introduce bias if the sampling frame is not random
Advantages of Stratified sampling:
- Sample accurately reflects the population structure
- Guarantees proportional representation of groups within a population
- Small sample sizes required - saves time and money
Disadvantages of Stratified sampling:
- Population must be clearly classified into distinct strata
- Selection within each stratum suffers from the same disadvantages as simple random sampling
- The requirement of a sampling frame means it’s time consuming and expensive
What are the 2 types of non-random sampling?
- Quota Sampling
- Opportunity Sampling
Quota Sampling:
In quota sampling, the interviewer/researcher first decides which characteristics of the population they wish to represent. The resulting groups are mutually exclusive, in the same way as the strata in a stratified sample
They then decide how many people to question from each group (this can be worked out in the same way as for a stratified sample)
As an interviewer, you would then meet members of the population, assess which group each falls into, and allocate them to the appropriate quota
Once you have met your quota for a group, you no longer include any further members into that group
You continue this process until your quota for each group is filled
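The quota-filling process can be sketched as a simple loop. The function name `quota_sample`, the group labels and the list of people met are invented for illustration. In practice the interviewer judges each person's group on the spot.

```python
def quota_sample(people_met, quotas):
    """Fill each group's quota from people in the order they are met.

    people_met: iterable of (person, group) pairs.
    quotas: dict mapping each group to the number of people wanted.
    """
    filled = {group: [] for group in quotas}
    for person, group in people_met:
        # Include the person only if their group's quota is not yet full.
        if group in filled and len(filled[group]) < quotas[group]:
            filled[group].append(person)
        # Stop once every quota has been met.
        if all(len(filled[g]) == quotas[g] for g in quotas):
            break
    return filled

# Hypothetical sequence of people met by the interviewer.
people = [("A", "adult"), ("B", "child"), ("C", "adult"),
          ("D", "adult"), ("E", "child"), ("F", "child")]
result = quota_sample(people, {"adult": 2, "child": 2})
print(result)  # {'adult': ['A', 'C'], 'child': ['B', 'E']}
```

Person D is turned away because the adult quota is already full, and person F is never reached because both quotas are met after E.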
Opportunity sampling:
Opportunity sampling consists of taking the sample from people who are available at the time of the study, and who fit the relevant criteria
For example, if you wish to find out the purchasing habits of shoppers from a particular store, you may choose to question those who are leaving the store and have made a purchase
Advantages of Quota sampling:
• Allows a small sample to still be representative of the population
• No sampling frame required
• Quick, easy and inexpensive
• Allows for easy comparison between different groups within a population
Disadvantages of Quota sampling:
• Non-random sampling can introduce bias
• Population must be divided into groups, which can be costly or inaccurate
• Increasing scope of study increases number of groups, which adds time and expense
• Non-responses are not recorded - leads to bias
Advantages of Opportunity sampling:
- Easy
- Cheap
- Quick
- No sampling frame required
Disadvantages of Opportunity sampling:
- Unlikely to provide a representative sample
- Highly dependent on the individual researcher
What is quantitative data?
Variables or data associated with numerical observations are called quantitative variables or quantitative data
For example, you can give a number to shoe size so shoe size is a quantitative variable
What is qualitative data?
Variables or data associated with non-numerical observations are called qualitative variables or qualitative data
For example, you can’t give a number to hair colour (blonde, red, brunette). Hair colour is a qualitative variable
What is a continuous variable?
A variable that can take any value in a given range is a continuous variable
For example, time can take any value, e.g. 2 seconds, 2.1 seconds, 2.01 seconds etc.
What is a discrete variable?
A variable that can take only specific values in a given range is a discrete variable
For example, the number of girls in a family is a discrete variable as you can’t have 2.65 girls in a family
Grouped frequency tables:
When data is presented in a grouped frequency table, the specific data values are not shown. The groups are more commonly known as classes
• Class boundaries tell you the maximum and minimum values that belong in each class
• The midpoint is the average of the class boundaries
• The class width is the difference between the upper and lower class boundaries
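As a quick worked example, the midpoint and class width can be computed from the class boundaries. The helper names and the three classes below are hypothetical.

```python
def class_midpoint(lower, upper):
    """Midpoint = average of the lower and upper class boundaries."""
    return (lower + upper) / 2

def class_width(lower, upper):
    """Class width = upper class boundary minus lower class boundary."""
    return upper - lower

# Hypothetical classes given as (lower boundary, upper boundary).
classes = [(0, 10), (10, 25), (25, 30)]
summary = [(lo, hi, class_midpoint(lo, hi), class_width(lo, hi))
           for lo, hi in classes]
for lo, hi, mid, width in summary:
    print(f"{lo}-{hi}: midpoint {mid}, width {width}")
```

For the class with boundaries 10 and 25, the midpoint is 17.5 and the class width is 15.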
LDS - Daily mean temperature:
Daily mean temperature in °C - this is the average of the hourly temperature readings during a 24-hour period
LDS - Daily total rainfall:
Daily total rainfall including solid precipitation such as snow and hail, which is melted before being included in any measurements - amounts less than 0.05 mm are recorded as ‘tr’ or ‘trace’
Treat ‘tr’ as 0 in an exam question
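If rainfall readings are being cleaned before a calculation, the trace rule might be applied as below. This is only a sketch: the function name `rainfall_to_number` and the sample readings are made up, and it assumes readings arrive as text.

```python
def rainfall_to_number(value):
    """Convert a rainfall reading to a number, treating trace amounts as 0."""
    text = str(value).strip().lower()
    if text in ("tr", "trace"):
        return 0.0
    return float(text)

# Hypothetical daily total rainfall readings in mm.
readings = ["0.4", "tr", "12.6", "trace"]
cleaned = [rainfall_to_number(r) for r in readings]
print(cleaned)  # [0.4, 0.0, 12.6, 0.0]
```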
LDS - Daily total sunshine:
Daily total sunshine recorded to the nearest tenth of an hour
LDS - Daily mean wind direction and windspeed:
Daily mean wind direction and windspeed in knots, averaged over 24 hours from midnight to midnight. Mean wind directions are given as bearings and as cardinal (compass) directions. The data for mean windspeed is also categorised according to the Beaufort scale
LDS - Beaufort scale:
LDS - Daily maximum gust:
Daily maximum gust in knots - this is the highest instantaneous windspeed recorded. The direction from which the maximum gust was blowing is also recorded
LDS - Daily maximum relative humidity:
Daily maximum relative humidity, given as a percentage of air saturation with water vapour. Relative humidities above 95% give rise to misty and foggy conditions
LDS - Daily mean cloud cover:
Daily mean cloud cover measured in ‘oktas’ or eighths of the sky covered by cloud
LDS - Daily mean visibility:
Daily mean visibility measured in decametres (Dm). This is the greatest horizontal distance at which an object can be seen in daylight
LDS - Daily mean pressure:
Daily mean pressure measured in hectopascals (hPa)
How are missing values from the Large Data Set represented?
n/a or ‘not available’
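One possible way to handle missing values when summarising the data is to leave them out of the calculation. That choice is an assumption of this sketch, not something stated on the card. The function name `mean_ignoring_missing` and the readings are hypothetical.

```python
def mean_ignoring_missing(values):
    """Mean of the available readings, skipping any recorded as 'n/a'."""
    present = [float(v) for v in values if str(v).strip().lower() != "n/a"]
    if not present:
        return None  # no usable readings at all
    return sum(present) / len(present)

# Hypothetical readings with one missing value.
readings = ["10", "n/a", "14"]
result = mean_ignoring_missing(readings)
print(result)  # 12.0
```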