U1 - Collecting data Flashcards
Advantages of primary data (3)
Disadvantages of primary data (2)
ADV:
- Collection method is known
- Accuracy is known
- Can find answers to very specific questions
DISADV:
- Time-consuming to collect
- Expensive to collect
Advantages of secondary data (3)
Disadvantages of secondary data (5)
ADV:
- Easy to obtain
- Cheap to obtain
- Data from some organisations can be more reliable than data you collect yourself
DISADV:
- Method of collection is unknown
- Data might be out of date
- Data may contain mistakes
- Data may come from an unreliable source
- May be difficult to find answers to specific questions
Advantages of a census (3)
Disadvantages of a census (4)
ADV:
- Unbiased
- Accurate
- Takes the whole population into account therefore it’s representative
DISADV:
- Time-consuming
- Expensive
- Difficult to ensure the whole population is used
- Lots of data to handle
Advantages of a sample (3)
Disadvantages of a sample (2)
ADV:
- Cheaper than a census
- Less time-consuming than a census
- Less data to be considered than a census
DISADV:
- Not completely representative
- May be biased
Define sampling units
The people or items that are to be sampled
Define sampling frame
A list of all the sampling units
How do you carry out a simple random sample?
- Using the sampling frame, number each person from 01 to x.
- Then, use a random number generator to generate as many numbers as the sample size, each between 01 and x, ignoring any repeats.
- Identify which sampling units these numbers correspond to - these units form your sample.
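The steps above can be sketched in Python. This is only an illustration: the function name `simple_random_sample`, the example frame and the fixed seed are assumptions made for this example. `random.sample` draws without replacement, which has the same effect as ignoring repeated random numbers.

```python
import random

def simple_random_sample(sampling_frame, sample_size, seed=None):
    """Return sample_size units chosen so each unit has an equal chance."""
    rng = random.Random(seed)  # seed only makes the example repeatable
    # Number the units 1..x, where x is the population size.
    numbers = range(1, len(sampling_frame) + 1)
    # Sampling without replacement is equivalent to ignoring repeats.
    chosen = rng.sample(numbers, sample_size)
    # Look up which unit each chosen number corresponds to.
    return [sampling_frame[n - 1] for n in chosen]

# Hypothetical sampling frame of 30 units.
frame = [f"unit_{i:02d}" for i in range(1, 31)]
sample = simple_random_sample(frame, 5, seed=1)
print(sample)
```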
Advantages of a simple random sample (3)
Disadvantages of a simple random sample (2)
ADV:
- Free of bias
- Sample is more likely to be representative of the population, provided it is a large sample
- Each sampling unit has an equal chance of selection
DISADV:
- Not suitable when the sample size is small
- A sampling frame is needed
How do you carry out a systematic sample?
- Using the sampling frame, number each person from 01 to x.
- Calculate a regular interval to use by dividing the population size by the sample size.
- Generate a random number from 1 to the interval to determine the starting point.
- Keep adding the interval to the starting point to select your sample.
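A minimal sketch of these steps, assuming the frame is a Python list and the interval is rounded down to a whole number. The name `systematic_sample` is hypothetical, and the starting point is drawn from 1 to the interval so that it is always a valid position in the numbered frame.

```python
import random

def systematic_sample(sampling_frame, sample_size, seed=None):
    """Select every k-th unit after a random start, where k is the interval."""
    rng = random.Random(seed)  # seed only makes the example repeatable
    # Interval = population size / sample size (rounded down here).
    interval = len(sampling_frame) // sample_size
    # Random starting position between 1 and the interval, inclusive.
    start = rng.randint(1, interval)
    # Keep adding the interval to the starting point.
    positions = [start + k * interval for k in range(sample_size)]
    return [sampling_frame[p - 1] for p in positions]

# Hypothetical frame: units numbered 1 to 100, sample of 10 (interval 10).
frame = list(range(1, 101))
sample = systematic_sample(frame, 10, seed=3)
print(sample)
```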
Advantages of a systematic sample (2)
Disadvantages of a systematic sample (2)
ADV:
- Simple and quick to do
- Suitable for large samples and populations
DISADV:
- A sampling frame is needed
- Can introduce bias if the interval aligns with a pattern in the data
How do you carry out a stratified sample?
- Divide the population into categories that you’re stratifying by
- Calculate the number needed from each stratum using the formula: (sample size / population size) x number in stratum
- Use a random number generator to select the sample for each category
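As a worked illustration of the formula, the sketch below allocates a sample of 40 across three made-up year groups and then picks randomly within each. The function names, the year-group sizes and the seed are assumptions for this example. Rounding can make the allocations add up to slightly more or less than the intended sample size, so the totals should be checked by hand.

```python
import random

def stratum_allocations(strata_sizes, sample_size):
    """Apply (sample size / population size) x number in stratum, rounded."""
    population_size = sum(strata_sizes.values())
    return {
        name: round(sample_size * size / population_size)
        for name, size in strata_sizes.items()
    }

def stratified_sample(strata, sample_size, seed=None):
    """Randomly pick the allocated number of units from each stratum."""
    rng = random.Random(seed)  # seed only makes the example repeatable
    sizes = {name: len(members) for name, members in strata.items()}
    allocations = stratum_allocations(sizes, sample_size)
    return {
        name: rng.sample(members, allocations[name])
        for name, members in strata.items()
    }

# Hypothetical population of 400 split into three strata.
strata = {
    "Year 7": [f"Y7-{i}" for i in range(120)],
    "Year 8": [f"Y8-{i}" for i in range(180)],
    "Year 9": [f"Y9-{i}" for i in range(100)],
}
allocations = stratum_allocations({k: len(v) for k, v in strata.items()}, 40)
sample = stratified_sample(strata, 40, seed=2)
print(allocations)  # {'Year 7': 12, 'Year 8': 18, 'Year 9': 10}
```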
Advantages of a stratified sample (3)
Disadvantages of a stratified sample (1)
ADV:
- Sample accurately reflects population
- Guarantees proportional representation of groups within a population
- Minimises bias
DISADV:
- Population must be put into strata which can be costly or time-consuming, especially if the population size is large
How do you carry out a quota sample?
- Group the population by characteristics such as age/gender
- Give each category a quota (number of members to sample)
- Collect data until the quotas are met in all categories
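The quota procedure can be sketched as a loop that accepts people in the order they are met until every quota is full. The function name `quota_sample`, the categories and the respondent list are invented for this example. Note that nothing here is random, which is why the method can introduce bias.

```python
def quota_sample(respondents, quotas):
    """Accept respondents in the order met until each category's quota is full.

    respondents: iterable of (category, person) pairs.
    quotas: dict mapping category -> number of members to sample.
    """
    selected = {category: [] for category in quotas}
    for category, person in respondents:
        # Only accept someone if their category still has room.
        if category in selected and len(selected[category]) < quotas[category]:
            selected[category].append(person)
        # Stop collecting once the quotas are met in all categories.
        if all(len(selected[c]) == quotas[c] for c in quotas):
            break
    return selected

# Hypothetical people in the order the researcher meets them.
people = [
    ("under 18", "A"), ("adult", "B"), ("adult", "C"),
    ("under 18", "D"), ("adult", "E"), ("under 18", "F"),
]
sample = quota_sample(people, {"under 18": 2, "adult": 2})
print(sample)  # {'under 18': ['A', 'D'], 'adult': ['B', 'C']}
```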
Advantages of a quota sample (4)
Disadvantages of a quota sample (4)
ADV:
- Allows a small sample to still be representative of the population
- No sampling frame is required
- Quick, easy, inexpensive
- Allows for easy comparison between different groups within a population
DISADV:
- Non-random therefore can introduce bias
- Population must be divided into groups which can be costly or time-consuming, especially if the population size is large
- Time-consuming and expensive
- Non-responses are not recorded
How do you carry out an opportunity sample?
- Choose members of the population that are the easiest to sample e.g. the first people to walk past
Advantages of opportunity sampling (2)
Disadvantages of opportunity sampling (2)
ADV:
- Easy to carry out
- Inexpensive
DISADV:
- Unlikely to provide a representative sample
- Highly dependent on the individual researcher
Why would you want to group data?
Because it helps you to see the distribution of the data and spot patterns more easily
Describe what grouped discrete data would look like.
Classes with non-overlapping categories like 11-20, 21-30, etc
Disadvantages of grouped data (2)
- If too many or too few class intervals are selected, trends in the data can be obscured
- Individual data values are not known, so you can only calculate estimates of the mean, mode and median - therefore it's less accurate than raw data
Define continuous data
Data that can take any value on a continuous numerical scale e.g. length
Define discrete data
Data that can only take particular values on a continuous numerical scale e.g. shoe size
Define categorical data
Data that can be sorted into non-overlapping categories
Define ordinal data
Data that can be written in order or can be given a numerical rating scale
Define bivariate data
Data that involves pairs of related data
Define multivariate data
Data that involves sets of three or more related data values
What is self-selection sampling?
A type of non-probability sampling in which people choose to be part of the sample - e.g. they choose to complete a questionnaire or volunteer to take part in a study
Advantages of self-selection sampling (3)
- Requires little time or effort in finding sample members (because they contact you)
- People who have volunteered are more likely to respond
- It could be the only way to get people to take part in a study, or to find members of a population
Disadvantage of self-selection sampling (1)
- There can easily be trends within the respondents, such as people having strong opinions, which would lead to bias
Describe what grouped continuous data would look like.
Non-overlapping classes with no gaps between them, e.g. 50 < t <= 60, 60 < t <= 70. Classes such as 50-59, 60-69 would not work, because a value like 59.3 would fall in the gap between them.
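To illustrate classes of the form 50 < t <= 60, the sketch below places each value in the class whose lower bound it exceeds and whose upper bound it does not. The function name `class_interval`, the starting boundary of 50, the class width of 10 and the data values are assumptions for this example.

```python
import math
from collections import Counter

def class_interval(t, lowest=50, width=10):
    """Return (lower, upper) such that lower < t <= upper."""
    # ceil puts a value that sits exactly on a boundary into the lower class.
    steps = math.ceil((t - lowest) / width)
    upper = lowest + steps * width
    return (upper - width, upper)

# Hypothetical continuous measurements, e.g. times in seconds.
times = [52.4, 59.3, 60.0, 60.1, 68.9, 70.0]
frequencies = Counter(class_interval(t) for t in times)
for (lower, upper), count in sorted(frequencies.items()):
    print(f"{lower} < t <= {upper}: {count}")
```

Every value lands in exactly one class, including 59.3 and the boundary values 60.0 and 70.0.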
Define population
Everything or everybody that could possibly be involved in an investigation
Define a census
A census is a survey or investigation with data taken from every member of a population
Define bias
Systematic error
Define independent variable
A variable whose variation does not depend on that of another (x axis)
Define dependent variable
A variable whose value depends on that of another (y axis)
Define sample
A smaller number of items from the population
What’s the problem with gathering a bigger sample?
It’s more costly and time-consuming
What assumptions do you make when using the capture-recapture method? (4)
- The population hasn’t changed - no members have entered or left the population and there have been no births or deaths between the release and recapture times
- The probability of being caught is equal for all individuals
- Marks (or tags) have not come off
- The sample size is large enough to be representative of the population
Petersen capture-recapture method
- Capture a sample of the population
- Mark each item
- Put the items back into the population and ensure they’re thoroughly mixed
- Take a second sample and count how many are marked. The second sample should be taken long enough to ensure that the items are
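Once the marked items in the second sample have been counted, the population size can be estimated. The card above does not state the formula, so this sketch assumes the standard Petersen estimate, N = (M x n) / m, where M is the number marked in the first sample, n is the size of the second sample and m is the number of marked items in the second sample. The function name and the figures are made up for illustration.

```python
def petersen_estimate(first_sample_size, second_sample_size, marked_in_second):
    """Estimate population size as (M x n) / m."""
    if marked_in_second == 0:
        # With no recaptures the formula would divide by zero.
        raise ValueError("no marked items recaptured, so no estimate is possible")
    return first_sample_size * second_sample_size / marked_in_second

# Hypothetical figures: 40 marked, then 50 caught of which 8 are marked.
estimate = petersen_estimate(40, 50, 8)
print(estimate)  # 250.0
```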
Advantages of an interview (4)
Disadvantages of an interview (5)
ADV:
- Interviewer can explain questions
- Interviewer can put people at their ease when answering personal questions
- Respondent can explain answers
- High response rate - every person interviewed answers the questions
DISADV:
- Respondents may be less honest in an interview and less likely to answer personal questions
- Interviewing can take a long time, so can be expensive
- Sample size is smaller than for a questionnaire
- Interviewer bias - interviewer may interpret answers to suit their own opinions
- Respondents may try to impress the interviewer, or guess the answers the interviewer wants to hear
Advantages of an anonymous questionnaire (4)
Disadvantages of an anonymous questionnaire (3)
ADV:
- Respondents are more likely to be honest and more likely to answer personal questions
- Respondents can all complete the questionnaire at the same time, or in their own time, so can be quick and cheap
- Easy to send questionnaires to a large and representative sample
- No interviewer bias
DISADV:
- Respondent may not understand the questions
- Researcher may not understand the respondent’s answers
- Lower response rate - some people may not answer all the questions or return the questionnaire
What is a lab experiment?
An experiment conducted in a controlled environment (not necessarily a lab).
Advantages of a lab experiment (2)
Disadvantage of a lab experiment (1)
ADV:
- Easy to replicate because you can copy the experiment exactly
- You can control extraneous variables
DISADV:
- Test subjects may behave differently in test conditions than they do in real life
What is a field experiment?
An experiment carried out in test subjects’ everyday environment. The researcher sets up the situation and controls one or more variables.
Advantage of a field experiment (1)
Disadvantages of a field experiment (2)
ADV:
- Test subjects are more likely to reflect real life behaviour
DISADV:
- You can’t control extraneous variables
- Harder to replicate the experiment exactly
What is a natural experiment?
An experiment carried out in test subjects’ everyday environment, where the researcher has no control over any variables.
Advantage of a natural experiment (1)
Disadvantages of a natural experiment (2)
ADV:
- Test subjects are more likely to reflect real life behaviour
DISADV:
- You can’t control any variables
- Harder to replicate the study exactly
What is an extraneous variable?
A variable that you are not interested in but could affect the results of your experiment
If replicating an experiment gives very similar data, what does this show?
That the data is likely to be valid and reliable
Disadvantage of using open questions in a questionnaire
Every respondent could give a different answer, so it can be difficult to summarise and analyse the answers
Disadvantage of opinion scales in a closed question questionnaire
Most people will answer somewhere near the middle. They are unlikely to indicate a strong opinion either way as they do not wish to seem extreme
Problems to look for in questionnaires (5)
- Boxes that do not cover all possibilities
- Boxes that cover one option more than once
- Biased questions that try to persuade you to agree
- Questions that people are unlikely to answer honestly
- Open questions that allow for personal opinions and do not have tick boxes where closed questions would be better
Things to do when designing a questionnaire (6)
- Keep questions short and use simple language
- Avoid biased or ‘leading’ questions that suggest a particular answer
- Give intervals that do not overlap
- Make sure options cover all possibilities, including ‘0’, ‘never’, ‘don’t know’ or ‘other’
- Include a time frame in questions e.g. in the last week
- Avoid questions that respondents are unlikely to answer honestly
What is a pilot survey?
A survey conducted on a small sample to test the design and methods of the full survey. Pilot surveys are useful because they let you check for unforeseen problems before the main survey is carried out
What is an outlier/anomaly?
A value that does not fit the pattern of the data
What is cleaning data? (3)
- Identifying and either correcting or removing inaccurate data values (caused by recording or other errors) or extreme values
- Removing units or other symbols from data
- Deciding what to do about missing data
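A small sketch of these three tasks on a made-up list of heights. The function name `clean_value`, the raw values and the plausible range of 100 to 220 cm are all assumptions for this example. Here missing values are recorded as `None` and extreme values are simply excluded, although in practice you might correct them instead.

```python
import re

def clean_value(raw):
    """Strip units and symbols; return a float, or None if nothing usable."""
    if raw is None:
        return None
    # Keep only the first number found, dropping units such as "cm".
    match = re.search(r"-?\d+(?:\.\d+)?", str(raw))
    return float(match.group()) if match else None

# Hypothetical raw heights: units attached, a blank, a likely typing error.
raw_heights = ["152cm", "160 cm", "", "1580", "149.5cm", None]
cleaned = [clean_value(v) for v in raw_heights]

# Assumed plausible range in cm, chosen for this example only.
plausible = [v for v in cleaned if v is not None and 100 <= v <= 220]
missing_count = cleaned.count(None)

print(cleaned)        # [152.0, 160.0, None, 1580.0, 149.5, None]
print(plausible)      # [152.0, 160.0, 149.5]
print(missing_count)  # 2
```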
What is a hypothesis?
An idea that can be tested by collecting and analysing data
What do you need to consider when testing a hypothesis (designing an investigation)? (8)
- How long it will take
- How much it will cost
- Ethical issues
- If people will answer sensitive questions
- If you can get the data locally, cheaply and in a short time frame
- How to select your population and sample
- How to deal with non-response
- How to deal with unexpected results
What is a questionnaire?
A set of questions designed to obtain data
What is a control group?
A group that is selected randomly from the population and is not subjected to any of the factors under investigation
What causes outliers?
Human / machine / genuine error