UNIT 1: The collection of data Flashcards
Primary Data
Primary data is collected by, or for, the person who is going to use it.
Secondary Data
Secondary data has been collected by someone else
Primary Data: Advantages
- Accuracy is known
- Method of collection is known
- Can find answers to specific questions
Primary Data: Disadvantages
- Time consuming
- Can be expensive
Secondary Data: Advantages
- Cheap
- Quick
- Can be reliable if taken from a trusted source, e.g. the Office for National Statistics or sporting results pages
Secondary Data: Disadvantages
- Don’t know the method of collection
- May not be able to find answers to specific questions
- Websites may be unreliable
- May be out of date
Qualitative Data
Non-numerical observations
Quantitative Data
Numerical observation
Continuous Data
Can take any numerical value on a scale
Discrete Data
Can only take particular values (e.g. shoe size, number of words typed per minute)
Ordinal Data
Data that can be put in order, e.g. values from a numerical rating scale
Categorical Data
Data which can be sorted into non-overlapping categories/class intervals
Bivariate Data
Data which involves pairs of related data values (each pair of data values refers to one item)
Multivariate Data
Data which involves three or more related data values. Each set of data values refers to one item
Population
Everything or everybody that could possibly be involved in an investigation
Census
A survey or investigation with data taken from every member of the population
Sample
Contains information about part of the population
Census: Advantages
- Accurate
- Takes the whole population into account
Census: Disadvantages
- Time consuming
- Expensive
- Difficult to ensure the whole population is included
- Lots of data to handle
Sample: Advantages
- Cheaper than a census
- Quicker than a census
- Less data to handle
Sample: Disadvantages
- May not be completely representative of the population
Bias
- If a sample is not representative of the population it is biased.
- The sample could be selected unfairly.
- The sample size could be too small.
Sampling Frame
A LIST OF all the people/things that we are selecting our sample from
Sampling Unit
Each individual person/thing that is being sampled
Electoral Roll
A list of people who are eligible to vote in the UK. It is the easiest way to get a list of adults in a geographical area.
Petersen Capture-Recapture Formula
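M/N = m/n, which rearranges to N = Mn/m
M = number caught and tagged at the start
N = population size (unknown)
m = number of tagged individuals in the second sample
n = size of the second sample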
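As a quick check of the rearranged formula, here is a minimal sketch; the function name `petersen_estimate` and the example numbers (40 tagged, 50 caught later, 8 of them tagged) are made up for illustration.

```python
def petersen_estimate(M, n, m):
    """Estimate population size N from M/N = m/n, rearranged to N = M*n/m.

    M: number caught and tagged at the start
    n: size of the second sample
    m: number of tagged individuals in the second sample
    """
    if m == 0:
        # With no tagged individuals recaptured, the formula would divide by zero.
        raise ValueError("No tagged individuals recaptured; N cannot be estimated")
    return M * n / m


# Hypothetical example: 40 fish tagged, 50 caught later, 8 of those tagged.
print(petersen_estimate(M=40, n=50, m=8))  # 250.0
```

The result is an estimate, so in context it is usually rounded to a whole number of animals.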
Petersen Capture-Recapture Assumptions
- Population is closed: no births, deaths, immigration or emigration between the two samples (state specifics in context)
- Tagging doesn’t affect survival rate
- Tags don’t get lost/removed and are easily recognisable
- The sample size is large enough to represent the population.
- The probability of being caught is equal for all individuals in the population
Random Sampling
In a random sample every member of the population has an equal chance of being selected
Random Sampling Advantages
- Provided it is large, the sample is likely to be representative of the population.
- Choice of members of sample is unbiased
Random Sampling Disadvantages
- Needs a full list of the whole population
- Needs a large sample
Methods of random sampling
Always number your sampling frame
(1) Pull numbers from hat
(2) Use RanInt function on your calculator
(3) Use a random number table: select a starting point on the table at random
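The same steps (number the sampling frame, then pick distinct random numbers) can be sketched with Python's `random.sample`, which acts like repeatedly using a random number generator and ignoring repeats. The sampling frame of 200 students and the sample size of 20 are hypothetical.

```python
import random

# Hypothetical sampling frame: in practice this is the full list of the population.
sampling_frame = [f"Student {i}" for i in range(1, 201)]

# Number the sampling frame from 1 to N, as you would on paper.
numbered = {number: name for number, name in enumerate(sampling_frame, start=1)}

# Choose 20 different numbers at random (no repeats), then look them up.
chosen_numbers = random.sample(range(1, len(numbered) + 1), k=20)
sample = [numbered[n] for n in sorted(chosen_numbers)]
print(sample)
```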
Opportunity Sampling
Use the people or things that are available at the time
Quota Sampling
Group the population by characteristics (e.g. gender/age), then ask a specific number of people (the quota) from each group
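A minimal sketch of filling quotas, assuming people are asked in the order they happen to come along (so the choice within each group is not random). The quotas, names and the function `quota_sample` are hypothetical.

```python
# Hypothetical quotas: how many people to ask from each group.
quotas = {"male": 3, "female": 3}

# People in the order they walk past (not a random order).
passers_by = [
    ("Amir", "male"), ("Beth", "female"), ("Carl", "male"), ("Dana", "female"),
    ("Eve", "female"), ("Fred", "male"), ("Gus", "male"), ("Hana", "female"),
]


def quota_sample(people, quotas):
    counts = {group: 0 for group in quotas}
    sample = []
    for name, group in people:
        # Only ask this person if their group's quota is not yet full.
        if group in counts and counts[group] < quotas[group]:
            sample.append(name)
            counts[group] += 1
    return sample


print(quota_sample(passers_by, quotas))
# ['Amir', 'Beth', 'Carl', 'Dana', 'Eve', 'Fred']
```

Gus and Hana are not asked because their groups' quotas are already full, which shows why the sample depends on who happens to be available first.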
Judgement sampling
Use your judgement to select a sample that you think is representative
Cluster Sampling
The population forms natural groups (clusters). Your sampling frame is the list of clusters; randomly select clusters to sample
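A minimal sketch of cluster sampling, assuming every member of each chosen cluster is surveyed (one common version). The tutor groups and names are hypothetical.

```python
import random

# Hypothetical population that falls into natural groups (clusters), e.g. tutor groups.
clusters = {
    "7A": ["Ali", "Bea", "Cal"],
    "7B": ["Dev", "Ed", "Fay"],
    "7C": ["Gia", "Hal", "Ivy"],
    "7D": ["Jo", "Kai", "Liv"],
}

# The sampling frame is the list of clusters: choose 2 clusters at random.
chosen = random.sample(list(clusters), k=2)

# Assumption: survey every member of each chosen cluster.
sample = [person for name in chosen for person in clusters[name]]
print(chosen, sample)
```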
Systematic Sampling
Pick a random starting point and then select every nth item (e.g. every 10th) on your sampling frame. You need to number the sampling frame first
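A minimal sketch of systematic sampling, assuming the interval is found as population size ÷ sample size (rounded down) and the random start is chosen within the first interval. The function name `systematic_sample` and the frame of 100 numbered items are hypothetical.

```python
import random


def systematic_sample(frame, sample_size):
    """Every kth item from a random start, where k = population size // sample size."""
    if not 0 < sample_size <= len(frame):
        raise ValueError("sample_size must be between 1 and the frame size")
    k = len(frame) // sample_size          # sampling interval, e.g. 100 // 10 = 10
    start = random.randrange(k)            # random start within the first k items
    return frame[start::k][:sample_size]   # then every kth item after that


frame = list(range(1, 101))  # hypothetical numbered sampling frame: items 1 to 100
sample = systematic_sample(frame, 10)
print(sample)
```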
Four things to comment on when evaluating a non-random sampling method
- Bias
- Cost
- Time
- Sample size
Stratified sampling calculation
(Total in stratum / Total population) × total sample size, rounded to the nearest whole number
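The calculation can be applied to every stratum at once. This is a minimal sketch; the function name `stratified_allocation` and the year-group sizes are hypothetical, and halves are rounded up (Python's built-in `round` would round 2.5 down to 2, so it is avoided here).

```python
import math


def stratified_allocation(strata, sample_size):
    """Sample size for each stratum: (stratum total / population total) × sample size.

    Results are rounded to the nearest whole number, with halves rounded up.
    """
    population = sum(strata.values())
    return {
        name: math.floor(size * sample_size / population + 0.5)
        for name, size in strata.items()
    }


# Hypothetical school with 500 students and a total sample of 50.
year_groups = {"Year 7": 180, "Year 8": 160, "Year 9": 160}
allocation = stratified_allocation(year_groups, 50)
print(allocation)                # {'Year 7': 18, 'Year 8': 16, 'Year 9': 16}
print(sum(allocation.values()))  # 50
```

Always check that the rounded numbers add up to the sample size: for strata of 25 and 75 with a sample of 10, the values 2.5 and 7.5 round to 3 and 8, giving 11, so one of them must be adjusted by hand.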
Pilot Survey - reasons
- To check the response rate
- To check the questions make sense
- To check you collect the data you are expecting
Questionnaire key points
- Open questions - free written answers
- Closed questions - multiple choice or opinion scale (must include an “other” or “don’t know” option)
- Always include a timeframe (e.g. yesterday, last week, last year)
- Avoid questions where respondents would be tempted to lie.
- Don’t ask leading questions like “Don’t you agree….?”
Interview: Advantages
- High response rate
- Interviewer can explain the questions
- Respondents can explain their answers
- Can put people at ease
Interview: Disadvantages
- Answers to personal questions may be less honest, or respondents may try to impress the interviewer
- Time-consuming, therefore expensive
- The interviewer may be biased in choosing who to speak to
- Sample size is usually small
Anonymous Questionnaire: Advantages
- More honest answers to personal questions
- Quick, cheap
- No interviewer bias
- Sample size can be as large as you like
Anonymous Questionnaire: Disadvantages
- Low response rate
- Respondents may not understand the questions
- You may not understand their answers and cannot ask them to explain
Reliable data
Data which would give the same results if collected again (the data collection can be replicated)
Valid data
Data which measures what you want it to measure
Cleaning Data
The process of identifying gaps, anomalies or errors in the data. Usually done in Excel using the “sort” and “find” functions
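The same checks can be sketched in code. This minimal example assumes blank answers are stored as `None` and that any age outside a plausible range of 11 to 18 is an anomaly; the data, the range and the function `find_problems` are all hypothetical.

```python
# Hypothetical raw data: ages from a school questionnaire, None = left blank.
ages = [14, 15, None, 16, 150, 15, -3, 14]


def find_problems(values, low, high):
    # Gaps: answers that are missing.
    gaps = [i for i, v in enumerate(values) if v is None]
    # Anomalies/errors: values outside the plausible range.
    anomalies = [i for i, v in enumerate(values)
                 if v is not None and not (low <= v <= high)]
    return gaps, anomalies


gaps, anomalies = find_problems(ages, low=11, high=18)
print("Gaps at positions:", gaps)                 # [2]
print("Out-of-range at positions:", anomalies)    # [4, 6]
print(sorted(v for v in ages if v is not None))   # sorting puts extremes at the ends
```

Sorting plays the same role as "sort" in a spreadsheet: values like -3 and 150 end up at the ends of the list, where they are easy to spot.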
Extraneous variables
A variable you are not interested in that could affect your result
Control group
Randomly split the subjects into two groups. Give the control group no treatment and the test group the treatment, then compare the results. (Often used for medical trials)
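Random allocation into the two groups can be sketched by shuffling the list of subjects and splitting it in half. The 20 volunteers labelled P1 to P20 are hypothetical.

```python
import random

# Hypothetical list of volunteers for a trial.
volunteers = [f"P{i}" for i in range(1, 21)]

# Shuffle a copy, then split it in half: each person is equally likely to be in either group.
shuffled = volunteers[:]
random.shuffle(shuffled)
half = len(shuffled) // 2
control_group = shuffled[:half]  # receives no treatment
test_group = shuffled[half:]     # receives the treatment
print("Control:", control_group)
print("Test:", test_group)
```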
Matched pairs
Two groups where each individual in one group is paired with an individual in the second group. They should have everything in common except the factor being studied.
Hypothesis
An idea that can be tested by collecting and analysing data.