Vocab Flashcards
Chapter 1 - Collection of data.
Population
Everyone/everything involved in an investigation.
Census
An investigation with data taken from every member of a population.
Sample
An investigation with data taken from only part of the population.
Bias
Anything that distorts the data.
Strata
Subgroups/subcategories within a population (singular: stratum).
Sampling frame
A list of all the items/people forming a population.
Sampling unit
One item from a sampling frame.
Observation
You record something happening.
Experiment
You record data from something you make happen.
Qualitative data
Describes certain qualities.
Quantitative data
Describes certain quantities; can be discrete or continuous.
Continuous data.
Data we can measure.
Discrete data.
Data we can count.
Primary data.
You collect the data yourself.
Secondary data.
You obtain the data from somebody else.
Questionnaire.
A set of questions that respondents complete to provide data; can be anonymous.
Interview/Survey.
Data collection methods in which you ask people for their opinions; can be anonymous.
Pilot survey.
Testing a questionnaire on a small group of people first.
-identifies likely responses
-checks response rate
-checks whether questions are understood
-checks how long it will take
-reveals unexpected outcomes (refine hypothesis/change something)
-problems are easier and less costly to fix before the full study
-checks that methods of distribution/collection work
-estimates time/costs of full study
Open questions.
No suggested answers; differently worded answers can make data analysis difficult.
Closed questions.
Suggested answers to choose from. With opinion scales, people tend to answer in the middle as they do not wish to seem extreme.
Capture recapture
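A method for estimating population size: capture and mark a sample, release it, then capture a second sample and count how many are marked. Estimate = (number marked first x size of second sample) / number marked in second sample.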
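A minimal sketch of the capture recapture calculation, assuming the usual estimate N = M x n / m (M marked in the first capture, n caught in the second capture, m of those found marked). The function name and the fish figures are made up for illustration.

```python
def capture_recapture_estimate(marked_first, caught_second, marked_in_second):
    """Estimate population size as M * n / m."""
    if marked_in_second == 0:
        # No marked items recaptured: the estimate is undefined.
        raise ValueError("need at least one marked item in the second sample")
    return marked_first * caught_second / marked_in_second

# Hypothetical figures: 40 fish marked, 50 caught later, 10 of those are marked.
estimate = capture_recapture_estimate(40, 50, 10)
print(estimate)  # 200.0
```

The estimate relies on the marked proportion in the second sample matching the marked proportion in the whole population.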
Judgement sampling.
Use judgement to select a sample representative of the population.
Opportunity sampling.
Use available people/objects at the time.
Systematic sampling.
Choose a starting point from your sampling frame at random, then choose items at regular intervals (e.g. for an interval of 32, use a random number generator to pick a number in the first 32, then go through the sampling frame in steps of 32, asking every person selected).
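A short sketch of systematic sampling with an interval of 32, as in the example above. The frame of 320 numbered people and the function name are assumptions made for illustration.

```python
import random

def systematic_sample(frame, interval, rng):
    """Pick a random start within the first `interval` items, then take every `interval`-th item."""
    start = rng.randrange(interval)  # random index from 0 to interval - 1
    return frame[start::interval]

frame = list(range(1, 321))  # hypothetical sampling frame: people numbered 1 to 320
sample = systematic_sample(frame, 32, random.Random(1))
print(sample)
```

With 320 people and an interval of 32, the sample always contains 10 people, each 32 places apart in the frame.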
Random sampling.
Everyone in the population has an equal chance of being selected (unbiased).
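A minimal sketch of a simple random sample, assuming a numbered sampling frame of 100 students (a made-up frame). `random.sample` picks without replacement, so nobody is chosen twice and every student is equally likely to be picked.

```python
import random

population = [f"student_{i}" for i in range(1, 101)]  # hypothetical sampling frame
rng = random.Random(42)               # fixed seed so the sketch is repeatable
sample = rng.sample(population, 10)   # 10 different students, each equally likely
print(sample)
```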
Quota sampling.
Group by characteristics, and interview a set number from each group.
Cluster sampling.
The population naturally splits into groups (clusters). The list of clusters is the sampling frame. Randomly select clusters to form the sample.
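A rough sketch of cluster sampling, assuming the clusters are form groups in a school and that everyone in a chosen cluster is included. The names and group sizes are made up for illustration.

```python
import random

# Hypothetical clusters: form groups and their members.
clusters = {
    "Form A": ["Ali", "Bea", "Cal"],
    "Form B": ["Dev", "Eve"],
    "Form C": ["Fay", "Gus", "Hal"],
    "Form D": ["Ivy", "Jon"],
}

def cluster_sample(clusters, n_clusters, rng):
    """Randomly pick whole clusters, then include every member of each one."""
    chosen = rng.sample(sorted(clusters), n_clusters)  # the list of cluster names is the sampling frame
    members = [person for name in chosen for person in clusters[name]]
    return chosen, members

chosen, members = cluster_sample(clusters, 2, random.Random(0))
print(chosen, members)
```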
Stratified sampling.
The number of people asked from each stratum is proportional to the size of that stratum in the population (e.g. 60/1000 x 250 = 15 Year 7s in the sample).
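A small sketch of the stratified sampling calculation. The source does not say which figure is which, so this assumes a sample of 60 from a population of 1000 containing 250 Year 7s; the arithmetic gives 15 under either reading. The other year-group sizes are hypothetical.

```python
def stratum_sample_size(stratum_size, population_size, sample_size):
    """Number to sample from one stratum, in proportion to its size."""
    return round(stratum_size * sample_size / population_size)

# Year 7 figure as assumed above; the other year groups are made up.
strata = {"Year 7": 250, "Year 8": 250, "Year 9": 300, "Year 10": 200}
population_size = sum(strata.values())  # 1000

allocation = {name: stratum_sample_size(size, population_size, 60)
              for name, size in strata.items()}
print(allocation)  # {'Year 7': 15, 'Year 8': 15, 'Year 9': 18, 'Year 10': 12}
```

With less tidy numbers, the rounded values may not add up to the intended sample size and would need adjusting by hand.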
Random response method.
For sensitive questions which people are likely to answer dishonestly (e.g. flip a coin: if heads, tick yes; if tails, answer honestly).
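A sketch of how the true proportion can be estimated from random response data, assuming a fair coin so that about half the respondents tick yes because of heads. The function name and the survey figures are made up for illustration.

```python
def estimate_true_yes_proportion(yes_count, total):
    """Estimate the honest 'yes' proportion under the coin-flip scheme.

    About total / 2 people are expected to tick yes because of heads,
    so the honest yeses are roughly yes_count - total / 2 out of total / 2.
    """
    return (2 * yes_count - total) / total

# Hypothetical survey: 200 respondents, 130 ticked yes.
# Expect about 100 forced yeses, leaving 30 honest yeses out of about 100.
print(estimate_true_yes_proportion(130, 200))  # 0.3
```

With small samples the estimate can fall below 0 by chance, so it is only a rough guide.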
Primary data advantages
gather data that directly relates to hypothesis
you know how reliable the data is
Primary data disadvantages
expensive
time consuming
can be difficult/impossible to collect
Secondary data advantages
easier to get hold of
can gather data quickly and cheaply
large data sets
Secondary data disadvantages
may be in the wrong format or rounded
difficult to find data that matches your hypothesis exactly
(e.g. out of date, or no relevant data available)
don’t know accuracy; may be biased or unreliable
Census pros
representative of entire pop.
unbiased
Census cons
hard/impossible for big pops.
expensive
impractical
might be tricky to define entire pop/access all members
not an option when items are used up/damaged by the investigation
Sample pros
quicker
cheaper
more practical than a census
Sample cons
less accurate
may not be fully representative
can be biased
variability between samples
Random sampling pros
unbiased
(should be) representative
Random sampling cons
not always practical/convenient (e.g. if the pop. is spread over a large area, travel is needed)
can be impossible to list entire pop. or access everyone
Stratified sampling pros
likely gives a representative sample if you have easy to define categories (e.g. gender)
can compare results from different groups
Stratified sampling cons
not useful when there are no obvious categories or they are hard to define
can be expensive
Systematic sampling pros
unbiased sample
can be done by machine
Systematic sampling cons
every nth item might coincide with a pattern (e.g. a recurring fault), making the sample biased