1 Collecting Data Flashcards
raw
data before it is sorted
eg data from a survey
quantitative
numerical
eg height
qualitative
non-numerical
eg colour
continuous
can take any value on a scale
eg weight, length
discrete
can only take particular values on a scale
eg shoe size, no. of siblings
categorical
can be sorted into non-overlapping/ranked categories
eg gender
ordinal
can be written in order / be given a numerical ranking scale
eg test scores
bivariate
involves pairs of related data
eg working hours and pay
multivariate
3+ sets of data
eg plants: colour, leaf size and height
do intervals (usually) need to be equal widths
no
primary
collected by/for the user
secondary
collected by/for someone other than the current user
eg websites, newspapers, research articles, databases, census returns
primary vs secondary (adv and disadv)
census
survey/investigation with data from EVERY MEMBER of a population
sampling units
people or items that are to be sampled
sampling frame
a list of all the sampling units
eg of population, SU and SF
(number of hours spent on hw is more in Y7 and Y9)
P: all Y7 and Y9 students
SU: students in Y7&9
SF: list of Y7&9 students
petersen capture recapture formula
m/n = M/N
no. marked in recapture/number in recapture = original number marked/total population
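A minimal sketch of how the formula above can be rearranged to estimate the total population, N = M × n / m. The function name and the example numbers are made up for illustration and are not from the cards.

```python
def petersen_estimate(marked_first, recapture_size, marked_in_recapture):
    """Estimate the total population N from m/n = M/N, so N = M * n / m.

    marked_first        -> M, original number marked
    recapture_size      -> n, number in the recapture sample
    marked_in_recapture -> m, number marked in the recapture sample
    """
    if marked_in_recapture <= 0:
        raise ValueError("need at least one marked individual in the recapture")
    return marked_first * recapture_size / marked_in_recapture


# Hypothetical numbers: 40 marked at first, 50 caught later, 8 of those marked.
estimate = petersen_estimate(marked_first=40, recapture_size=50, marked_in_recapture=8)
print(estimate)  # 250.0
```

The estimate is only as good as the assumptions listed on the next card (eg no marks lost, population unchanged).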
assumptions made in capture recapture method
- P(caught) is same for all individuals
- marks are not lost and always recognisable
- sample size is large enough to be representative of population
- population has not changed (no members have entered or left, no births or deaths between release and recapture)
- marked individuals have mixed with rest of population between release and recapture
random sample
every sampling unit (member of population) has an equal chance of being included
pros and cons of random sample
P: more likely to be representative of population if sample size is large
- choice of members of sample is unbiased
D: need a full list of population
- need a large sample size
problems of random sample
random numbers may be out of range
random numbers may be repeated
how to generate random sample
1) RNG (eg calc) / names from a hat / random number table
2) ignore numbers out of range and duplicates
3) repeat until you have the required sample size
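The steps above could be sketched roughly as follows, assuming the sampling frame is numbered 1 to frame_size and the random numbers come out as 0 to 999 (like a 3-digit random number table). The function name and parameters are hypothetical.

```python
import random


def random_sample(frame_size, sample_size, max_number=999, seed=None):
    """Pick sample_size distinct positions (1..frame_size) from a numbered frame."""
    if sample_size > frame_size:
        raise ValueError("sample cannot be larger than the sampling frame")
    if max_number < frame_size:
        raise ValueError("random numbers must be able to reach every unit")
    rng = random.Random(seed)
    chosen = []
    while len(chosen) < sample_size:
        number = rng.randint(0, max_number)  # step 1: generate a random number
        if number < 1 or number > frame_size:
            continue  # step 2: ignore numbers out of range
        if number in chosen:
            continue  # step 2: ignore duplicates
        chosen.append(number)  # step 3: keep going until the sample is full
    return chosen


# Hypothetical use: sample 10 students from a list of 120.
print(random_sample(frame_size=120, sample_size=10, seed=1))
```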
judgement sampling
using your judgement to choose a sample which is representative of the population
opportunity sampling
using the people or items available at the time
cluster sampling
use natural groups which occur in data
list of clusters = sampling frame
and some clusters are randomly selected to make up the sample (eg geographical areas)
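A rough sketch of cluster sampling as described above, assuming the clusters are stored as a dictionary of cluster name to members. The names and data are invented for illustration.

```python
import random


def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly select whole clusters; every member of a chosen cluster is sampled."""
    rng = random.Random(seed)
    # The list of cluster names acts as the sampling frame.
    chosen_names = rng.sample(sorted(clusters), n_clusters)
    return {name: clusters[name] for name in chosen_names}


# Hypothetical geographical areas and the households in each.
areas = {
    "North": ["N1", "N2", "N3"],
    "South": ["S1", "S2"],
    "East": ["E1", "E2", "E3", "E4"],
    "West": ["W1"],
}
print(cluster_sample(areas, n_clusters=2, seed=5))
```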
systematic sampling
choose a start point in the sampling frame at random and choose items at regular intervals
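A minimal sketch of systematic sampling, assuming the interval is worked out as frame length divided by sample size (rounded down) and the random start point falls within the first interval. The function name is hypothetical.

```python
import random


def systematic_sample(frame, sample_size, seed=None):
    """Take every k-th item from the sampling frame after a random start."""
    interval = len(frame) // sample_size  # k, the regular interval
    if interval < 1:
        raise ValueError("sample cannot be larger than the sampling frame")
    rng = random.Random(seed)
    start = rng.randrange(interval)  # random start point within the first interval
    return [frame[start + i * interval] for i in range(sample_size)]


# Hypothetical frame: 100 items numbered 1 to 100, sample of 10.
frame = list(range(1, 101))
print(systematic_sample(frame, sample_size=10, seed=3))
```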
quota sampling
group the population by chosen characteristics and take a quota from each group (eg age/gender)
how to decide if a sampling method is suitable
- will it be biased
- will SS be sensible
- how quick and easy is method
- how expensive
stratified sample
contains members of each stratum in proportion to the size of that stratum.
sample from each stratum is selected randomly
describe how to do a stratified sample
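1) calculations: no. to sample from each stratum = stratum size / population size × sample size
THEN - put the members of each stratum in order
- assign each member a random no.
- choose the relevant no. of people from each stratum to survey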
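A sketch of the stratified sample steps, assuming each stratum is a list of members and the number taken from each stratum is stratum size / population size × sample size, rounded to a whole number. The year-group sizes are invented, and rounding can make the total differ slightly from the target sample size.

```python
import random


def stratified_sample(strata, sample_size, seed=None):
    """strata maps stratum name -> list of members; returns name -> sampled members."""
    population = sum(len(members) for members in strata.values())
    rng = random.Random(seed)
    sample = {}
    for name, members in strata.items():
        # Calculation step: share of the sample in proportion to stratum size.
        quota = round(len(members) * sample_size / population)
        # Selection step: choose that many members of the stratum at random.
        sample[name] = rng.sample(members, quota)
    return sample


# Hypothetical strata: 120 students in Y7, 80 in Y9, sample of 50.
year_groups = {
    "Y7": [f"Y7 student {i}" for i in range(1, 121)],
    "Y9": [f"Y9 student {i}" for i in range(1, 81)],
}
result = stratified_sample(year_groups, sample_size=50, seed=0)
print({name: len(chosen) for name, chosen in result.items()})  # {'Y7': 30, 'Y9': 20}
```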
data collection sheet
table/tally chart for recording your results
direct observation
recording behaviour patterns systematically as you observe them
independent variable
explanatory variable
what you control (and change)
dependent variable
response variable
is affected by your changes to the explanatory variable
extraneous variable
variable that you are not interested in but that could affect your results
laboratory experiment
field experiment
natural experiment
LabE pros and cons
FieldE pros and cons
NatE pros and cons
simulation
can be used to model random real life events to predict what could actually happen.
easier and cheaper than collecting and analysing real data
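As an illustration only, here is a small simulation of rolling a fair six-sided die many times, using random numbers in place of real rolls. The choice of a die and the function name are assumptions, not from the cards.

```python
import random


def simulate_dice_rolls(n_rolls, seed=None):
    """Simulate n_rolls of a fair die and count how often each face appears."""
    rng = random.Random(seed)
    counts = {face: 0 for face in range(1, 7)}
    for _ in range(n_rolls):
        counts[rng.randint(1, 6)] += 1  # each face is equally likely
    return counts


# Hypothetical run: 600 simulated rolls, so about 100 of each face is expected.
print(simulate_dice_rolls(600, seed=1))
```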
an experiment is valid/reliable if…
reliable: replicating the experiment gives very similar data
questionnaire
set of questions designed to obtain data
open vs closed question
open: no suggested answers
closed: answers to choose from
con of open questions
respondents may all give different answers, so it is hard to summarise and analyse them
problem with opinion scales
most will answer somewhere near the middle
unlikely to indicate a strong opinion - do not want to seem extreme
what to do in questionnaires
- short questions, simple language
- no biased/leading questions
- intervals that don’t overlap
- options cover all possibilities (0/never/don’t know)
- include time frame
- avoid questions respondents are unlikely to answer honestly
interview pros and cons
anonymous questionnaire pros and cons
pilot survey
conducted on a small sample to test the design and methods of the survey
checks:
- respondents understand questions
- closed questions include all likely answer options
- questionnaire collects the information needed
random response method
how to answer estimate question about RRM
outliers/anomalous data
values that do not fit the pattern of the data
can be ignored if they are due to a measuring/recording error
cleaning data
- identifying and correcting/removing inaccurate/extreme values
- removing units or other symbols from data
- deciding what to do with missing values
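A rough sketch of the cleaning steps above for a list of heights recorded as text. It assumes the only unit is "cm", that values outside 50 to 250 are treated as errors, and that missing values are simply set aside. All of these choices are illustrative.

```python
def clean_heights(raw_values, low=50.0, high=250.0):
    """Return (cleaned, rejected): usable heights in cm, and entries set aside."""
    cleaned, rejected = [], []
    for raw in raw_values:
        # Remove units and stray spaces from the recorded value.
        text = raw.strip().lower().replace("cm", "").strip()
        if text == "":
            rejected.append(raw)  # missing value: here we choose to set it aside
            continue
        try:
            value = float(text)
        except ValueError:
            rejected.append(raw)  # not a number at all
            continue
        if low <= value <= high:
            cleaned.append(value)
        else:
            rejected.append(raw)  # extreme value, likely a recording error
    return cleaned, rejected


# Hypothetical survey responses.
cleaned, rejected = clean_heights(["162cm", " 170 ", "", "1700", "n/a", "155.5 cm"])
print(cleaned)   # [162.0, 170.0, 155.5]
print(rejected)  # ['', '1700', 'n/a']
```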
control groups? and where are they often used
matched pair test
where two groups of people are used to test the effects of a particular factor
each individual in one group is paired with an individual in the second group with similar characteristics, apart from the factor which is to be studied
pros and cons of matched pairs
P: can control for different factors
C: may have to test a large group at first to find enough matched pairs for a good test
who is often used in MPTs
identical twins- easier to see different results
disadvantage: limited supply of willing twins
hypothesis
an idea that can be tested by collecting and analysing data
designing investigations- what do you need to consider?
difference between field and natural experiments
in a field experiment the researcher manipulates the independent variable (IV), while in a natural experiment the researcher does not