Chapter 7 Sampling Flashcards
Population
The pool of units from which a statistical sample is drawn for a study. For example, if you want to examine Lund University students’ opinions on Allemansrätten, then the population = all the 40,000 students admitted to Lund.
Sample
A subset of the population: the units that you will actually examine / apply some statistical analysis to. The sample will usually NOT equal the whole population, because of cost and time constraints. To be able to generalize findings from the sample to the population, it is important that the sample has been picked without bias. It may be picked based on either a probability or a non-probability approach.
Representative Sample
A sample that reflects the population accurately, so that the results gathered from the sample can be generalized to the whole population.
Probability Sample
A sample picked using random selection, so that each member of the population has a known, non-zero chance of being selected. The chances do not have to be equal for every unit (equal chances are the special case of a simple random sample); what matters is that they are known and determined by the laws of probability rather than by human judgement.
Non-probability Sample
A sample that has not been picked using a random selection method. This implies that some units in the population are more likely to be selected than others, and that the selection probabilities are generally unknown. It impairs the generalizability of findings from the sample to the population, so there is a higher risk that a non-probability sample won’t be representative.
Examples:
- Convenience Sampling
- Quota Sampling
- Snowball Sampling
Sampling Error
Sampling error is a term used in statistics that refers to the difference between the results obtained from a sample and the results that would have been obtained if the entire population had been sampled.
Despite using a random and representative sample, there may still be some degree of error or variation due to chance. Sampling error is a natural part of statistical analysis and is usually quantified to determine the accuracy and reliability of the results.
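A minimal sketch of sampling error, assuming a made-up population of 9,000 scores (the data, the seed, and the helper name `mean_abs_sampling_error` are illustrative assumptions, not part of the course material). It draws repeated simple random samples and shows that the gap between sample mean and population mean shrinks as the sample grows, and vanishes when the whole population is examined.

```python
import random
import statistics

rng = random.Random(7)  # fixed seed so reruns give the same draws

# Hypothetical population: 9,000 made-up scores between 0 and 100.
population = [rng.uniform(0, 100) for _ in range(9000)]
population_mean = statistics.fmean(population)

def mean_abs_sampling_error(n, repeats=100):
    """Average |sample mean - population mean| over repeated simple random samples of size n."""
    errors = []
    for _ in range(repeats):
        sample = rng.sample(population, n)  # draw n units without replacement
        errors.append(abs(statistics.fmean(sample) - population_mean))
    return statistics.fmean(errors)

# Larger samples should give a smaller average error; n = 9000 is a full census.
for n in (50, 450, 9000):
    print(n, round(mean_abs_sampling_error(n), 3))
```

The last row is zero because a "sample" of all 9,000 units is the population itself, so there is nothing left to vary by chance.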
Non-response
A source of non-sampling error, particularly likely to happen when individuals are being sampled. Occurs whenever some members of the sample refuse to “cooperate”, cannot be contacted, or for some reason cannot supply the required data (e.g. you refuse to answer a survey sent to you via mail).
Non-sampling error
Non-sampling errors refer to errors in data collection or analysis that are not related to the sampling process itself. These errors can occur at any stage of the research process, from designing the survey instrument to data entry and analysis. Non-sampling errors can be caused by a variety of factors, including interviewer bias, respondent bias, measurement errors, and data processing errors. Unlike sampling errors, non-sampling errors are not related to the size of the sample and can affect both small and large samples. These errors can have a significant impact on the accuracy and validity of research findings, and researchers need to take steps to minimize them as much as possible.
Census
The enumeration of an entire population. If data are collected from all units in a population rather than from a sample, the data are called “census data”.
Biased Sample
A sample that does not represent the population from which it was selected, typically because some units of the population were more likely to be picked than others in a way that was not accounted for. This results in skewed results and weak generalizability, i.e. a NON-representative sample.
Sources of Bias
3 main sources of bias
- Using a non-probability sampling method => human judgement may affect the selection process, making some units of the population more likely to be picked than others.
- Inadequate / inaccurate sampling frame => the list of units from which the sample is drawn is incomplete or out of date. Unlike the first source, this biases the sample even when selection from the frame is random, because units missing from the frame can never be picked.
- Non-response => a considerable share of the sampled units does not respond.
Why is non-response a problem? Because those who agree to participate might differ in various ways from those who do NOT agree to participate in the survey.
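A small illustration of why non-response can bias results, using an entirely hypothetical sample of 100 people in which supporters of a proposal happen to respond more often than opponents (all counts are invented for the example):

```python
# Hypothetical sample of 100 people, each recorded as (supports_proposal, responded).
sample = (
    [(True, True)] * 45 + [(True, False)] * 15      # 60 supporters, 45 of them respond
    + [(False, True)] * 15 + [(False, False)] * 25  # 40 opponents, 15 of them respond
)

def share_supporting(units):
    """Proportion of the given units that support the proposal."""
    return sum(1 for supports, _ in units if supports) / len(units)

respondents = [unit for unit in sample if unit[1]]
response_rate = len(respondents) / len(sample)

print(response_rate)                  # 0.6
print(share_supporting(sample))       # 0.6  (what we would like to measure)
print(share_supporting(respondents))  # 0.75 (what the returned surveys show)
```

Here the returned surveys overstate support (75% instead of 60%) purely because the people who answered differ from the people who did not.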
Types of Probability Sampling
(remember: Probability Sampling is good; generally makes the sample representative aka generalizable)
- Simple Random Sample: the most basic form of probability sampling. Each unit of the population has an equal probability of being selected into the sample. E.g. sample size n = 450, population size N = 9,000. If a Simple Random Sample is applied, each unit has a 450/9,000 = 5% chance of being picked into the sample.
- Systematic Sample: units are chosen from a sampling frame by selecting a random starting point and then picking every kth unit until the desired sample size is reached (k = N/n, e.g. 9,000/450 = every 20th unit).
- Stratified Random Sample: the population is divided into subgroups (strata) based on shared characteristics, and units are then randomly selected from each subgroup.
- Multistage Cluster Sampling: first sample larger groupings (clusters), then sample units within the selected clusters. Often used when there is a large geographic spread. E.g. first sample companies, and then sample employees from those companies. A probability sampling method needs to be employed at each stage. E.g. we might randomly sample ten companies from the population of the 100 largest companies in the UK, thus yielding ten clusters, and then interview 500 randomly selected employees at each of the ten companies.
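The four methods listed above can be sketched in a few lines each. This is only an illustration under stated assumptions: the function names, the numbered sampling frame, the three made-up strata, and the company/employee data are all hypothetical, and the stratified version assumes proportionate sampling (the same fraction from every stratum).

```python
import random

def simple_random_sample(frame, n, rng):
    """Every unit in the frame has the same n/N chance of selection."""
    return rng.sample(frame, n)

def systematic_sample(frame, n, rng):
    """Random start within the first interval, then every k-th unit."""
    k = len(frame) // n       # sampling interval
    start = rng.randrange(k)  # random starting point in [0, k)
    return [frame[start + i * k] for i in range(n)]

def stratified_sample(frame, stratum_of, fraction, rng):
    """Group units into strata, then draw the same fraction at random from each."""
    strata = {}
    for unit in frame:
        strata.setdefault(stratum_of(unit), []).append(unit)
    sample = []
    for units in strata.values():
        sample.extend(rng.sample(units, round(len(units) * fraction)))
    return sample

def two_stage_cluster_sample(clusters, n_clusters, n_per_cluster, rng):
    """Randomly pick clusters, then randomly pick units inside each chosen cluster."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: rng.sample(clusters[name], n_per_cluster) for name in chosen}

rng = random.Random(36)         # fixed seed, only for repeatability
frame = list(range(9000))       # hypothetical sampling frame: units numbered 0..8999

srs = simple_random_sample(frame, 450, rng)
syst = systematic_sample(frame, 450, rng)
strat = stratified_sample(frame, lambda unit: unit % 3, 0.05, rng)  # 3 made-up strata

# Hypothetical clusters: 100 companies with 600 employees each.
companies = {f"company_{c:03d}": [(c, e) for e in range(600)] for c in range(100)}
clustered = two_stage_cluster_sample(companies, 10, 500, rng)

print(len(srs), len(syst), len(strat), len(clustered))
```

Note that random selection is used at every step where units are picked, which is what keeps all four designs probability samples.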
Why is Probability Sampling so much better to do than non-probability sampling?
Because it allows us to make inferences from a sample to the population from which the sample was selected, i.e. to generalize the findings.
Essentially, it is a mechanism for reducing bias in the selection process.
Remember STAA36 ;)
What is Equivalence Sampling?
A method that aims to ensure that findings are equivalent BETWEEN samples. It is NOT a probability sampling method and does not in and of itself make the sample generalizable to the whole population
What does a confidence interval of 95 % in a sampling distribution imply?
That we can be 95 % confident that the true population mean lies within ± 1.96 standard errors of the sample mean, i.e. between (sample mean - 1.96 × standard error) and (sample mean + 1.96 × standard error).
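As a rough sketch, the interval can be computed like this, assuming a reasonably large simple random sample (so that the 1.96 multiplier from the normal distribution applies) and estimating the standard error as sample standard deviation / √n. The function name and the generated data are hypothetical.

```python
import math
import random
import statistics

def confidence_interval_95(sample):
    """Return (lower, upper) = sample mean -/+ 1.96 standard errors of the mean."""
    mean = statistics.fmean(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(len(sample))
    return mean - 1.96 * standard_error, mean + 1.96 * standard_error

# Hypothetical data: 450 made-up measurements, seeded for repeatability.
rng = random.Random(36)
sample = [rng.gauss(50, 10) for _ in range(450)]

low, high = confidence_interval_95(sample)
print(round(low, 2), round(high, 2))
```

A larger sample gives a smaller standard error and therefore a narrower interval around the sample mean.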