module 1 - data collection Flashcards
define population and sample
population - whole set of items that are of interest
sample - a selection of observations taken from a subset of the population, used to estimate information from the population as a whole
what is a census?
an observation or measure of every member of the population
what are the three methods of random sampling and how are they carried out?
- simple random - each member is allocated a unique number and the required quantity of these numbers is selected at random
- systematic sampling - the required elements are chosen at regular intervals from an ordered list. for a sample of size n from a population of size N, the interval is k = N/n; the first subject is chosen at random from the first k, then every k-th subject after it is taken (eg every 5th).
- stratified sampling - the population is divided into mutually exclusive strata (eg males and females) and a random sample is taken from each.
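A minimal Python sketch of the first two methods, assuming the sampling frame is a plain list and the member's position in the list serves as its unique number. The function names are hypothetical, chosen for illustration; stratified sampling is sketched with its formula on the next card.

```python
import random

def simple_random_sample(population, n, rng=random):
    """Number every member (its index), then select n distinct numbers at random."""
    numbers = rng.sample(range(len(population)), n)  # sampling without replacement
    return [population[i] for i in numbers]

def systematic_sample(population, n, rng=random):
    """Take every k-th member of an ordered list, starting at a random point among the first k."""
    k = len(population) // n      # interval k = N/n (rounded down if N/n is not whole)
    start = rng.randrange(k)      # random start in positions 0 .. k-1
    return [population[start + i * k] for i in range(n)]
```

For a population of 100 and a sample of 10, `systematic_sample` uses k = 10, so it might return positions 3, 13, 23, ..., 93.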
in stratified sampling, what is the formula used to calculate the number of people we should sample from each stratum?
number in stratum/ number in population multiplied by overall sample size.
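The formula can be checked with a small Python sketch, assuming stratum sizes are rounded to the nearest whole person (halves rounded up). In practice the rounded sizes may not add up exactly to the overall sample size and need adjusting by hand. `stratum_sizes` and `stratified_sample` are hypothetical helper names.

```python
import math
import random
from fractions import Fraction

def stratum_sizes(strata_counts, sample_size):
    """(number in stratum / number in population) x overall sample size, for each stratum."""
    total = sum(strata_counts.values())
    sizes = {}
    for name, count in strata_counts.items():
        exact = Fraction(count, total) * sample_size       # exact proportional share
        sizes[name] = math.floor(exact + Fraction(1, 2))   # round half up to a whole number of people
    return sizes

def stratified_sample(strata, sample_size, rng=random):
    """Take a simple random sample of the allocated size from each stratum."""
    sizes = stratum_sizes({name: len(members) for name, members in strata.items()}, sample_size)
    return {name: rng.sample(members, sizes[name]) for name, members in strata.items()}

print(stratum_sizes({"male": 60, "female": 40}, 20))  # {'male': 12, 'female': 8}
```

With 60 males and 40 females, a sample of 20 needs 60/100 x 20 = 12 males and 40/100 x 20 = 8 females.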
what are the advantages and disadvantages of simple random sampling?
advantages - bias reduced, easy and cheap to implement, each sampling unit has a known and equal chance of selection
disadvantages - not suitable for large population sizes as it is time consuming and expensive. a sampling frame is needed
advantages and disadvantages of systematic sampling
advantages - quick and simple to use. suitable for large samples and populations.
disadvantages - sampling frame needed, it can introduce bias if the sampling frame is not random
advantages and disadvantages of stratified sampling
advantages - sample accurately reflects the population structure. guarantees proportional representation of groups within the population.
disadvantages - population must be clearly classified into distinct strata. selection within each stratum suffers from the same disadvantages as simple random sampling. sampling frame needed
what sampling techniques do not require a sampling frame (non-random) and how are they carried out?
quota sampling - similar to stratified sampling (the population is divided into groups according to a characteristic, and a quota for each group determines the proportion of the sample that should have that characteristic), however no sampling frame is used. eg interviewers meet people, assess which group each person belongs to and allocate them to the appropriate quota. once a group's quota is filled, any further people from that group are ignored.
opportunity sampling - taking a sample from people who are available at the time the study is carried out and who fit the criteria you are looking for. (could be the first 20 people you meet outside a shop)
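A minimal Python sketch of both non-random methods, assuming people arrive one at a time in the order the researcher meets them (there is no list of the whole population). `quota_sample`, `opportunity_sample`, `group_of` and `fits_criteria` are hypothetical names used for illustration.

```python
from itertools import islice

def quota_sample(people, group_of, quotas):
    """Allocate each person met to their group's quota; ignore them if that quota is already full."""
    filled = {group: [] for group in quotas}
    for person in people:
        group = group_of(person)
        if group in filled and len(filled[group]) < quotas[group]:
            filled[group].append(person)
        if all(len(filled[g]) == quotas[g] for g in quotas):
            break  # every quota is full, so stop interviewing
    return filled

def opportunity_sample(people, n, fits_criteria):
    """Take the first n available people who fit the criteria (eg the first 20 met outside a shop)."""
    return list(islice((p for p in people if fits_criteria(p)), n))

people = [("Ann", "F"), ("Bob", "M"), ("Cat", "F"), ("Dan", "M"), ("Eve", "F")]
print(quota_sample(people, lambda p: p[1], {"F": 2, "M": 1}))
# {'F': [('Ann', 'F'), ('Cat', 'F')], 'M': [('Bob', 'M')]}
```

The sample depends entirely on who happens to be met first, which is why both methods can introduce bias.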
advantages and disadvantages of quota sampling
advantages - allows a small sample to still represent the whole population. no sampling frame required, quick and easy, allows for comparison between different groups within a population.
disadvantages - non-random sampling can introduce bias. the population must be divided into groups (which can be costly). non-responses are not recorded as such.
advantages and disadvantages of opportunity sampling
advantages - easy to carry out, inexpensive
disadvantages - unlikely to provide a representative sample, highly dependent on individual researcher.
list examples of continuous data
height of a person
mass
list examples of discrete data
results of rolling a die
population of a country
number of books checked out at library
shoe size
what is qualitative/ categorical data
non-numerical values eg. colour, favourite food
what is quantitative data?
what are the subsets of this data?
numerical values.
discrete - only specific values eg. shoe size
continuous - can take any value within a given range (eg height, mass)
what do you need to know about the large data set?
understanding the categories and sub-categories in the data set
understanding how values in the large data set are rounded
knowledge of trends in the data
knowledge of outliers and other anomalies in the data.