Data Collection Flashcards
Methods for data collection, sampling strategies
What is probability theory?
The study of the mathematical rules that govern random events.
What is an event?
A possible outcome of an experiment or observation.
What is descriptive statistics?
The use of statistics to summarise a set of known data in a clear and concise way.
What is statistical inference?
The methods and practice of forming judgements about the parameters of a population, usually on the basis of random sampling.
What is statistics used for?
- designing experiments and other data collection
- summarising info to aid understanding
- drawing conclusions from data
- estimating the present or predicting the future
What are the general steps used in stats?
- pose a question, what are the objectives of the study?
- decide what to measure and how to measure it
- collect or generate data
- explore data, check for oddities
- calculate formal statistical summaries and carry out tests to answer posed question
- explore sensitivity of analysis to assumptions
- communicate findings
What are the three general methods for collecting or generating data?
- polls and surveys
- experiments
- observational studies
What is a target population?
A collection of objects/individuals we want to learn something about.
What is a sample?
A subset of the target population.
What is a census?
When information from the whole target population is obtained.
What is a poll or survey?
The process of collecting data from a sample in order to obtain info about the whole population.
What are the cons of censuses?
- difficult to make sure everyone participates
- more expensive than surveys
- takes longer
- a survey is usually more practical
What is a sampling error?
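The difference between a sample statistic and the true population value that arises because only a sample is observed. A single random sample will not give an exact “right answer”; this error is unavoidable without taking a census.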
How do we choose a sample?
- at random, so that each member of the population has an equal chance of being chosen
- this avoids bias
- it also allows the likely size of sampling errors to be calculated
What is a variable?
Some characteristic of each individual in the population.
What is a parameter?
A numerical summary of a variable for a population.
What is a statistic?
A numerical summary of a variable for the sample.
What is precision?
A statistic is precise if its value is similar across repeated samples.
What is a bias?
It implies that the sample statistic tends to differ from the population parameter in a consistent way (there is a systematic error).
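The ideas of parameter, statistic, sampling error and bias can be illustrated with a tiny made-up population. This is only a sketch: the values in `population` and the "biased scheme" are hypothetical, chosen so that every possible sample can be listed by hand.

```python
from itertools import combinations
from statistics import mean

population = [2, 4, 6, 8, 10]            # hypothetical variable values
parameter = mean(population)             # population mean (the parameter)

# Every possible sample of size 2, each equally likely under random sampling.
sample_means = [mean(s) for s in combinations(population, 2)]
average_of_statistics = mean(sample_means)

# A biased scheme (assumed for illustration): only the three smallest
# members of the population can ever be chosen.
biased_means = [mean(s) for s in combinations(population[:3], 2)]
average_of_biased = mean(biased_means)

# Sampling error of one particular sample, here (2, 4).
sampling_error = mean((2, 4)) - parameter
```

Individual sample means differ from the parameter (sampling error), but under random sampling they average out to it. Under the biased scheme they are systematically too low.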
Describe two simple sampling strategies.
- simple random sampling: number every member of the population, then draw numbers from a hat
- systematic sampling: calculate a fixed periodic interval:
k = N/n
where k is the interval between successive samples, n is the sample size and N is the population size
pick a random value between 1 and k (call this q)
sample the qth individual, then the (q + k)th, then the (q + 2k)th, etc
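The two strategies above can be sketched in Python. The function names are hypothetical, and the sketch assumes N is a multiple of n (otherwise integer division rounds k down).

```python
import random

def simple_random_sample(population, n, rng):
    """Draw n members without replacement; every member is equally likely."""
    return rng.sample(population, n)

def systematic_sample(population, n, rng):
    """Take every k-th member after a random start q, with k = N // n."""
    N = len(population)
    k = N // n                      # interval between successive picks
    q = rng.randint(1, k)           # random start between 1 and k (inclusive)
    # Positions q, q + k, q + 2k, ... are 1-based, so subtract 1 to index.
    return [population[q - 1 + i * k] for i in range(n)]

rng = random.Random(0)              # seeded only so the sketch is repeatable
population = list(range(1, 101))    # members numbered 1..100
srs = simple_random_sample(population, 10, rng)
sys_sample = systematic_sample(population, 10, rng)
```

With N = 100 and n = 10 the interval is k = 10, so the systematic sample is fully determined once q is drawn.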
What are the benefits of systematic sampling over random sampling?
- easier; only one random number has to be drawn
- BUT if the population contains some periodic variation which lines up with k, the sample will be biased
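A small made-up example shows how periodic variation can bias a systematic sample. The daily sales figures are hypothetical, and the interval k = 7 is chosen deliberately to match the weekly cycle.

```python
# Hypothetical daily sales over 10 weeks: days 6 and 7 of each week sell more.
weekly_pattern = [10, 10, 10, 10, 10, 30, 30]
population = weekly_pattern * 10          # N = 70 days
population_mean = sum(population) / len(population)

k = 7                                     # sampling interval equals the period

def systematic_mean(q):
    """Mean of the systematic sample starting at day q (1-based)."""
    sample = population[q - 1::k]
    return sum(sample) / len(sample)

weekday_start = systematic_mean(1)        # only ever hits day 1 of each week
weekend_start = systematic_mean(6)        # only ever hits day 6 of each week
```

Every start q gives a sample made up of the same day of the week, so no single systematic sample can recover the population mean of 110/7 (about 15.7).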
What is stratified sampling?
The population is divided into different categories (or strata) and a separate sample is taken from each stratum.
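A minimal sketch of stratified sampling, assuming a simple random sample is drawn within each stratum. The student population, the stratum names and the per-stratum sample sizes are all hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, n_per_stratum, rng):
    """Draw a simple random sample separately within each stratum."""
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    return {
        name: rng.sample(members, n_per_stratum[name])
        for name, members in strata.items()
    }

rng = random.Random(1)              # seeded only so the sketch is repeatable
# Hypothetical population: (id, year_group) pairs, 60 first years and 40 final years.
students = [(i, "first_year" if i < 60 else "final_year") for i in range(100)]
sample = stratified_sample(
    students,
    stratum_of=lambda s: s[1],
    n_per_stratum={"first_year": 6, "final_year": 4},
    rng=rng,
)
```

Here the sample sizes are proportional to the stratum sizes, which is one common choice but not the only one.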
What is selection bias?
Bias introduced when the selected sample is not representative of the population of interest.
What is self-selection bias?
Occurs when people decide for themselves whether or not to participate in the survey.
What are interviewer effects?
Different interviewers asking the same question can obtain different results.
What is non-response bias?
Occurs when some members of the target population don’t respond and non-respondents tend to behave differently from respondents.
What are question effects?
Subtle variation in question wording can have an effect on the responses.
What is survey format? (As a reason for bias)
Results may be affected by factors such as question order or if the survey is conducted by mail, by phone or in person.
What are behavioural considerations?
People tend to answer questions in a way they consider to be socially acceptable.
What does transferring findings mean?
When results from one population are transferred to another.
What is an experiment?
Used to try and discover whether a “treatment” or “condition” has an effect on individuals.
What is a randomised experiment?
Treatments are randomly allocated to experimental units.
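Random allocation can be sketched as shuffling the units and dealing them out to the treatments. The function name, the unit labels 1 to 12 and the two treatment names are hypothetical.

```python
import random

def randomise(units, treatments, rng):
    """Shuffle units, then deal them out to treatments in equal-sized groups."""
    shuffled = list(units)
    rng.shuffle(shuffled)
    return {
        t: shuffled[i::len(treatments)]
        for i, t in enumerate(treatments)
    }

rng = random.Random(42)             # seeded only so the sketch is repeatable
allocation = randomise(range(1, 13), ["drug", "placebo"], rng)
```

Each unit ends up in exactly one group, and which group it lands in depends only on the shuffle, not on any characteristic of the unit.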
What are subjects?
Experimental units that are human.
What does partitioned mean? (Or, what are blocked experiments)
Individuals are assigned to groups (e.g. male/female); this reduces natural variability.
What is a placebo?
An inactive treatment, such as a pill with no known effects.
What is double blinding?
Neither the doctor nor the patient knows which treatment is being given; this stops doctors from behaving differently towards different groups.
What is a control group?
A group of units not given any form of treatment.
How many times should experiments be repeated?
At least twice the number of treatments.
What is an observational study?
No treatment is given to subjects; instead, naturally occurring events are recorded. Sometimes used when it’s difficult to carry out a randomised experiment.
What are the two types of observational study?
- prospective (looking forward, choose samples now and follow up in the future)
- retrospective (looking backward, choose samples and examine previous data)
special case (case-control): separate samples into cases and controls, look back and compare histories
Can observational studies provide conclusive evidence for causation? What about a randomised experiment?
- observational study: no
- randomised experiment: it is possible to argue causation
What is a confounding variable?
Some factor that’s not accounted for and that introduces a difference between treatment groups which is NOT due to the treatment.
Name two types of sampling for an area (of land for example).
- plot sampling (completely random or systematic grid (which in turn can be random or systematic))
- strip sampling (simple random or systematic sampling with random start)
What is a response variable?
Variable/info we would like to describe or predict.
What is an explanatory variable?
Covariates/info we expect might help us explain or predict the response variable.