1 Collecting Data Flashcards
raw
data before it is sorted
eg data from a survey
quantitative
numerical
eg height
qualitative
non-numerical
eg colour
continuous
can take any value on a scale
eg weight, length
discrete
can only take particular values on a scale
eg shoe size, no. siblings
categorical
can be sorted into non-overlapping/ranked categories
eg gender
ordinal
can be written in order / be given a numerical ranking scale
eg test scores
bivariate
involves pairs of related data
eg working hours and pay
multivariate
3+ sets of data
eg plants: colour, leaf size and height
do class intervals (usually) need to be of equal width?
no
primary
collected by/for the user
secondary
collected by/for someone other than the current user
eg websites, newspapers, research articles, databases, census returns
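primary vs secondary (adv and disadv)
primary A: the collector knows how the data were obtained and how accurate they are
primary D: can be time-consuming and expensive to collect
secondary A: usually quick and cheap to obtain
secondary D: may be out of date, contain errors, or not be exactly what is needed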
census
survey/investigation with data from EVERY MEMBER of a population
sampling units
people or items that are to be sampled
sampling frame
a list of all the sampling units
eg of population, SU and SF
(investigation comparing the number of hours spent on homework in Y7 and Y9)
P: all Y7 and Y9 students
SU: each individual student in Y7 and Y9
SF: a list of all Y7 and Y9 students
Petersen capture-recapture formula
m/n = M/N
m = no. marked in recapture, n = number in recapture, M = original number marked, N = total population
so N = Mn/m
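A minimal Python sketch of the formula rearranged as N = Mn/m. The function name `petersen_estimate` and the example numbers are hypothetical, and the estimate is only meaningful if the capture-recapture assumptions on the next card hold.

```python
def petersen_estimate(M: int, n: int, m: int) -> float:
    """Estimate the total population N by rearranging m/n = M/N to N = M*n/m.

    M: original number caught and marked
    n: number caught in the recapture
    m: number in the recapture that were already marked
    """
    if m <= 0:
        # With no marked individuals recaptured, m/n = M/N cannot be solved for N
        raise ValueError("m must be at least 1")
    if m > n or m > M:
        # Cannot recapture more marked individuals than were caught or marked
        raise ValueError("m cannot exceed n or M")
    return M * n / m


# Hypothetical example: 40 fish marked, 50 caught later, 10 of those marked
print(petersen_estimate(M=40, n=50, m=10))  # 200.0
```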
assumptions made in capture recapture method
- P(caught) is the same for all individuals
- marks are not lost and always recognisable
- sample size is large enough to be representative of population
- population has not changed (no members have entered or left, no births or deaths between release and recapture)
- marked individuals have mixed with rest of population between release and recapture
random sample
every sampling unit (member of population) has an equal chance of being included
pros and cons of random sample
P: more likely to be representative of population if sample size is large
- choice of members of sample is unbiased
D: need a full list of population
- need a large sample size
problems of random sample
random numbers may be out of range
random numbers may be repeated
how to generate a random sample
1) number each sampling unit, then pick numbers with an RNG (eg calc) / random number table, or draw names from a hat
2) ignore numbers out of range and duplicates
3) repeat until the sample is the required size
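The steps above can be sketched in Python as follows: the sampling units are numbered, random numbers are generated the way a random number table gives fixed-width digits, and numbers that are out of range or repeated are ignored until the sample is big enough. The function name, the 30-student frame and the seed are hypothetical, chosen only for illustration.

```python
import random
from typing import List, Optional


def random_sample(frame: List[str], size: int, seed: Optional[int] = None) -> List[str]:
    """Choose `size` units from `frame`, each unit with an equal chance of inclusion."""
    if size > len(frame):
        raise ValueError("sample size cannot exceed the number of sampling units")
    rng = random.Random(seed)  # seeded generator so a sample can be reproduced
    # Units are labelled 1..len(frame); draw numbers with as many digits as the
    # largest label, like reading a random number table (e.g. 00-99 for 30 units)
    upper = 10 ** len(str(len(frame))) - 1
    chosen: List[int] = []
    while len(chosen) < size:  # repeat until the sample is the required size
        r = rng.randint(0, upper)
        # Ignore numbers that are out of range or have already been chosen
        if r < 1 or r > len(frame) or r in chosen:
            continue
        chosen.append(r)
    return [frame[i - 1] for i in chosen]


frame = [f"Student {i}" for i in range(1, 31)]  # hypothetical frame of 30 students
print(random_sample(frame, size=5, seed=1))
```

Python's built-in `random.sample(frame, k)` does the same job in one call; the loop is written out only to mirror the by-hand method, including both problems of random sampling (out-of-range and repeated numbers).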
judgement sampling
using your judgement to choose a sample which is representative of the population
opportunity sampling
using the people or items available at the time