Data Collection Terminology Flashcards
Population
Entire group of individuals we hope to learn about
Sample
Representative subset of the population, from whom we collect data
Parameter
A numerical value regarding the population.
For example, the mean income of all people in the U.S.
Statistic
A numerical value computed from sample data.
These are used to estimate population parameters
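The parameter/statistic distinction can be sketched in a few lines of Python. The population here is made up purely for illustration (1,000 simulated incomes), and the names `population`, `parameter`, and `statistic` are assumptions of this sketch, not part of the flashcards.

```python
import random
import statistics

# Hypothetical population: 1,000 simulated incomes (illustrative numbers only).
rng = random.Random(1)
population = [rng.randint(20_000, 120_000) for _ in range(1000)]

# Parameter: computed from every member of the population.
parameter = statistics.mean(population)

# Statistic: computed from a sample of 50, used to estimate the parameter.
sample = rng.sample(population, 50)
statistic = statistics.mean(sample)

print(f"parameter (population mean): {parameter:.0f}")
print(f"statistic (sample mean):     {statistic:.0f}")
```

In practice the parameter is usually unknown, which is why the statistic is needed as an estimate.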
Data collection
CENSUS
Complete enumeration of the entire population.
Attempt to contact every member of the population in order to collect data
Very cumbersome
Data Collection
SAMPLE SURVEY
A study that asks questions of a sample drawn from the population in the hope of learning about the entire population. We study part of a population in order to make inferences about the whole population.
Sampling frame
List of individuals from whom the sample is drawn
Methods of Sampling
VOLUNTARY RESPONSE SAMPLE
Allow people to call in or respond to a poll by logging in to their computer
Methods of sampling
CONVENIENCE SAMPLING
Collect data from only those who are in a convenient location or who are easy to reach
Methods of Sampling
PROBABILITY SAMPLING
Uses randomization in order to eliminate some of the biases of voluntary response and convenience sampling
Valid
Types of probability samples
SIMPLE RANDOM SAMPLE
Each person has the same chance of being selected
Every possible group of n individuals has the same chance of being selected
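A minimal sketch of drawing an SRS, assuming the sampling frame is available as a Python list. The function name `simple_random_sample` and the numbered frame are illustrative assumptions.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n distinct individuals so that every group of size n is equally likely."""
    rng = random.Random(seed)
    # random.sample selects without replacement, which is what an SRS requires.
    return rng.sample(frame, n)

# Hypothetical sampling frame: individuals numbered 1 to 400.
frame = list(range(1, 401))
srs = simple_random_sample(frame, 10, seed=0)
print(srs)
```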
Types of probability samples
STRATIFIED RANDOM SAMPLE
The population is divided into strata (subgroups) that are more homogeneous than the population as a whole. Then an SRS is chosen from each stratum.
For example, high school students are first divided into 4 strata (freshmen, sophomores, juniors, seniors), then an SRS is taken from each.
(Differs from an SRS in that not every possible combination of size n can be used for the sample.)
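The high school example can be sketched as follows. The roster is hypothetical (100 made-up student IDs per class year), and `stratified_sample` is a name chosen for this sketch.

```python
import random

def stratified_sample(strata, n_per_stratum, seed=None):
    """Take a separate SRS of size n_per_stratum from every stratum."""
    rng = random.Random(seed)
    return {
        name: rng.sample(members, n_per_stratum)
        for name, members in strata.items()
    }

# Hypothetical roster: 100 made-up student IDs in each class year.
strata = {
    year: [f"{year}_{i}" for i in range(100)]
    for year in ("freshmen", "sophomores", "juniors", "seniors")
}

sample = stratified_sample(strata, n_per_stratum=5, seed=0)
for year, students in sample.items():
    print(year, students)
```

Because every stratum contributes exactly 5 students, a sample made up of 20 seniors can never occur, which is how this design differs from an SRS of size 20.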
Types of probability samples
SYSTEMATIC RANDOM SAMPLE
The first person is selected randomly, and the rest are selected by following a fixed system.
For example, if we want a sample of 10 out of a population of 400,
select a random number from 1 to 40, say 32. Then add intervals of 40, so we would use 32, 72, 112, 152, 192,…
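The example above can be sketched in Python. The interval is the population size divided by the sample size (400 / 10 = 40). The function name and its `start` parameter are assumptions of this sketch, and it assumes the sample size divides the population size evenly.

```python
import random

def systematic_sample(population_size, sample_size, start=None, seed=None):
    """Return the 1-based positions chosen by systematic sampling."""
    # Interval between selected positions (assumes an even division).
    interval = population_size // sample_size
    if start is None:
        # Random starting point between 1 and the interval, inclusive.
        start = random.Random(seed).randint(1, interval)
    return [start + i * interval for i in range(sample_size)]

# Population of 400, sample of 10, with the random start fixed at 32.
positions = systematic_sample(400, 10, start=32)
print(positions)
```

With a start of 32, the selected positions run from 32 up to 392 in steps of 40.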
Types of probability samples
CLUSTER RANDOM SAMPLE
A sampling design in which entire groups or clusters are chosen at random. Each cluster should be heterogeneous, that is, include a range of people representing the various segments of the population
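A minimal sketch of cluster sampling, using hypothetical homerooms as the clusters. The data and the name `cluster_sample` are illustrative assumptions. Note that the randomization picks whole clusters, and everyone in a chosen cluster is included.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Choose whole clusters at random and keep every member of each one."""
    rng = random.Random(seed)
    chosen_names = rng.sample(sorted(clusters), n_clusters)
    return {name: clusters[name] for name in chosen_names}

# Hypothetical clusters: 12 homerooms of 25 made-up students each.
clusters = {
    f"homeroom_{k}": [f"student_{k}_{i}" for i in range(25)]
    for k in range(12)
}

chosen = cluster_sample(clusters, n_clusters=3, seed=0)
print(sorted(chosen))
```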
Types of probability samples
MULTISTAGE RANDOM SAMPLE
Sample is chosen in a series of stages. This type of sample is used extensively in national polls.
Example: Start with an SRS of the counties in the U.S.; from that set of counties, take an SRS of towns; from that set of towns, take an SRS of neighborhoods; from those, take an SRS of households…
Statistical methods are adjusted for different types of sampling
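The staged selection in the multistage example can be sketched as nested simple random samples. For brevity this sketch uses three stages (counties, towns, households) and leaves out the neighborhood stage; all names and data are hypothetical.

```python
import random

def multistage_sample(counties, n_counties, n_towns, n_households, seed=None):
    """SRS of counties, then an SRS of towns in each, then an SRS of households."""
    rng = random.Random(seed)
    selected = []
    for county in rng.sample(sorted(counties), n_counties):
        towns = counties[county]
        for town in rng.sample(sorted(towns), n_towns):
            for household in rng.sample(towns[town], n_households):
                selected.append((county, town, household))
    return selected

# Hypothetical frame: 10 counties, 5 towns per county, 20 households per town.
counties = {
    f"county_{c}": {
        f"town_{c}_{t}": [f"household_{c}_{t}_{h}" for h in range(20)]
        for t in range(5)
    }
    for c in range(10)
}

households = multistage_sample(counties, n_counties=3, n_towns=2, n_households=4, seed=0)
print(len(households), "households selected")
```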
Biases
Systematic error that favors a particular segment of the population or tends to encourage only certain outcomes in the data. These can be due to a poor method of sampling, or, even if the sampling is good, the data collection may have problems.
VOLUNTARY RESPONSE BIAS
Bias introduced when individuals can choose whether or not to participate in the sample. Samples based on voluntary response are always invalid
NONRESPONSE BIAS
People refuse to respond or can’t be reached. Those who do respond are likely not to represent the whole sample. Voluntary response bias is a type of nonresponse bias, but nonresponse can also occur because people are unavailable.
UNDERCOVERAGE BIAS
A group of people is underrepresented, such as people without phones or people not home during the day (related to selection bias)
RESPONSE BIAS
Anything in a survey that influences responses. People can respond in a certain way due to the wording of questions, the order of choices, the appearance of the interviewer, or dishonesty.
SAMPLING ERROR
Natural variation between samples. It is always present and can be described using probability. It is generally smaller when the sample size is larger.
It is NOT an indication of something done incorrectly or of bias; it’s just chance variation.
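A small simulation can illustrate that sampling error shrinks as the sample size grows. The population below is simulated (values drawn around 50 with a spread of 10), and the function name `spread_of_sample_means` is an assumption of this sketch. It repeatedly draws simple random samples and measures how much the sample means vary from one sample to the next.

```python
import random
import statistics

def spread_of_sample_means(population, n, repeats=500, seed=0):
    """Standard deviation of the sample mean across repeated SRSs of size n."""
    rng = random.Random(seed)
    means = [statistics.mean(rng.sample(population, n)) for _ in range(repeats)]
    return statistics.stdev(means)

# Simulated population of 10,000 values (illustrative only).
rng = random.Random(42)
population = [rng.gauss(50, 10) for _ in range(10_000)]

spread_small = spread_of_sample_means(population, n=10)
spread_large = spread_of_sample_means(population, n=100)
print(f"spread of sample means, n=10:  {spread_small:.2f}")
print(f"spread of sample means, n=100: {spread_large:.2f}")
```

Every sample here is a valid SRS, so the variation between sample means is chance alone, not bias.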