Data Collection Flashcards
Methods for data collection, sampling strategies
What is probability theory?
The study of the mathematical rules that govern random events.
What is an event?
A possible outcome, or set of outcomes, of an experiment or observation.
What is descriptive statistics?
The use of statistics to summarise a set of known data in a clear and concise way.
What is statistical inference?
The methods and practice of forming judgements about the parameters of a population, usually on the basis of random sampling.
What is statistics used for?
- designing experiments and other data collection
- summarising info to aid understanding
- drawing conclusions from data
- estimating the present or predicting the future
What are the general steps used in stats?
- pose a question: what are the objectives of the study?
- decide what to measure and how to measure it
- collect or generate data
- explore data, check for oddities
- calculate formal statistical summaries and carry out tests to answer the question posed
- explore sensitivity of analysis to assumptions
- communicate findings
What are the three general methods for collecting or generating data?
- polls and surveys
- experiments
- observational studies
What is a target population?
A collection of objects/individuals we want to learn something about.
What is a sample?
A subset of the target population.
What is a census?
When information from the whole target population is obtained.
What is a poll or survey?
The process of collecting data from a sample in order to obtain info about the whole population.
What are the cons of censuses?
- difficult to make sure everyone participates
- more expensive than surveys
- takes longer
- a survey is usually more practical
What is a sampling error?
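The difference between a sample result and the true population value that arises because only a sample is observed. A single random sample will not give an exact “right answer”; this error is unavoidable without taking a census.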
How do we choose a sample?
- at random, so that each member of the population has an equal chance of being chosen
- avoids bias
- allows for the calculation of the likely size of sampling errors
What is a variable?
Some characteristic of each individual in the population.
What is a parameter?
A numerical summary of a variable for a population.
What is a statistic?
A numerical summary of a variable for the sample.
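The parameter, statistic and sampling error cards can be illustrated with a minimal sketch. The population (the integers 1 to 1000), the sample size of 50 and the seed are assumptions made up for the illustration.

```python
import random
from statistics import mean

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical population: the variable is simply the number itself.
population = list(range(1, 1001))

# Parameter: a numerical summary over the whole population (a census).
parameter = mean(population)

# Statistic: the same summary computed on a random sample only.
sample = rng.sample(population, 50)
statistic = mean(sample)

# Sampling error: how far this particular sample landed from the parameter.
sampling_error = statistic - parameter
print(parameter, statistic, sampling_error)
```

Changing the seed gives a different statistic each time, while the parameter stays fixed.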
What is precision?
A statistic is precise if its value is similar across repeated samples (low sampling variability).
What is a bias?
The sample statistic tends to differ from the population parameter in a consistent way (there is a systematic error).
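Precision and bias can both be seen by repeating the sampling many times. The sketch below is an illustration under assumed values: a simulated population of heights, and a deliberately flawed procedure (hypothetical) that can only ever select the taller half.

```python
import random
from statistics import mean, stdev

rng = random.Random(42)

# Hypothetical population: 10,000 simulated heights in cm.
population = [rng.gauss(170, 10) for _ in range(10_000)]
parameter = mean(population)

def repeated_means(frame, n, repeats):
    """Draw `repeats` random samples of size n from `frame`; return each sample mean."""
    return [mean(rng.sample(frame, n)) for _ in range(repeats)]

# Fair procedure: every member of the population can be chosen.
fair = repeated_means(population, 50, 500)

# Flawed procedure (assumed for illustration): only the taller half can be chosen.
tall_only = sorted(population)[5000:]
skewed = repeated_means(tall_only, 50, 500)

fair_bias = mean(fair) - parameter      # close to zero
skewed_bias = mean(skewed) - parameter  # consistently positive
fair_spread = stdev(fair)               # smaller spread means higher precision
print(fair_bias, skewed_bias, fair_spread)
```

The flawed procedure can still be precise (its sample means cluster tightly) while being biased, which is why the two ideas are kept separate.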
Describe two simple sampling strategies.
- simple random sampling: number every member of the population and draw numbers from a hat
- systematic sampling: calculate a fixed periodic interval:
k = N/n
where k is the interval between successive samples, n is the sample size and N is the population size
pick a random value between 1 and k (call this q)
sample the qth individual, then the (q + k)th, then the (q + 2k)th, etc.
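Both strategies can be sketched in a few lines. This is an illustration, not a definitive implementation: the function names are made up, and k is rounded down when N/n is not a whole number (an assumption the card does not specify).

```python
import random

def simple_random_sample(population, n, rng):
    """Numbers from a hat: draw n members without replacement."""
    return rng.sample(population, n)

def systematic_sample(population, n, rng):
    """Sample the q-th member, then every k-th member after it."""
    N = len(population)
    k = N // n             # interval k = N/n, rounded down (assumption)
    q = rng.randint(1, k)  # random start between 1 and k inclusive
    # Positions q, q + k, q + 2k, ... are 1-based, so subtract 1 to index.
    return [population[q - 1 + i * k] for i in range(n)]

rng = random.Random(0)
population = list(range(1, 101))  # members numbered 1..100

srs = simple_random_sample(population, 10, rng)
sys_sample = systematic_sample(population, 10, rng)
print(srs)
print(sys_sample)
```

Here N = 100 and n = 10, so k = 10 and the systematic sample is ten members spaced exactly ten apart.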
What are the benefits and drawbacks of systematic sampling compared with random sampling?
- easier; only one random number has to be drawn
- BUT if the population contains some periodic variation which lines up with k, the sample will be biased
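The periodicity problem can be shown with a made-up example: 700 days of sales in which every seventh day is a peak, sampled with k = 7. All the numbers here are assumptions chosen to make the effect obvious.

```python
import random
from statistics import mean

# Hypothetical population: sales are 100 on every 7th day and 10 otherwise.
population = [100 if day % 7 == 0 else 10 for day in range(700)]
true_mean = mean(population)

k = 7                  # interval equals the period of the variation
rng = random.Random(1)
q = rng.randint(1, k)  # random start between 1 and k inclusive

# Every sampled day falls on the same day of the week.
systematic = population[q - 1::k]
sys_mean = mean(systematic)
print(true_mean, sys_mean)
```

Whatever q is drawn, the sample contains only peak days or only ordinary days, so its mean is either 100 or 10 and never close to the true mean of about 22.9.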
What is stratified sampling?
The population is divided into different categories (or strata) and a separate sample is taken from each stratum.
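A minimal sketch of stratified sampling follows. It assumes the same sampling fraction is used in every stratum (proportional allocation), which is one common choice and not something the card states; the function name and the student records are hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(records, stratum_of, fraction, rng):
    """Split records into strata, then take a random sample within each one."""
    strata = defaultdict(list)
    for record in records:
        strata[stratum_of(record)].append(record)

    sample = []
    for members in strata.values():
        # Same fraction from every stratum (assumption), at least one member each.
        size = max(1, round(fraction * len(members)))
        sample.extend(rng.sample(members, size))
    return sample

rng = random.Random(0)
# Hypothetical population: 80 undergraduates and 20 postgraduates.
records = [("undergrad", i) for i in range(80)] + [("postgrad", i) for i in range(20)]

sample = stratified_sample(records, lambda r: r[0], 0.1, rng)
print(sample)
```

With a 10% fraction the sample always contains 8 undergraduates and 2 postgraduates, whereas a simple random sample of 10 could by chance contain no postgraduates at all.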
What is selection bias?
Bias introduced when the selected sample is not representative of the population of interest.
What is self-selection bias?
Occurs when people decide for themselves whether to participate in the survey.