Week 3 (Chapter 8): September 23, 2021 Flashcards
Vocab:
Sampling
Unit of Analysis
Population
Sampling: Selecting a limited number of cases out of a larger population of cases.
Unit of Analysis: What we want to study: an individual, a society, a country, etc.
Population: The population of cases that are relevant to the study - they could be people, but they could also be the population of countries (e.g., OECD countries)
Definition and goals of probability and non-probability samples
In a probability sample, the goal is to be representative: each unit has a known chance of being selected. For example, if we take a random sample of 10 units out of a population of 100, each unit has a 10% chance of being selected. Probability samples are desirable because:
Sample means can be used to estimate population means
If the population is normally distributed, the sample will usually be normally distributed too
It is possible to estimate the discrepancy between the sample mean and the population mean through the sampling error
It is possible to test how well our results resemble what we could expect to see in the population by using inferential statistical tests
Non-probability: any sampling technique where units in a population do not have a known probability of selection. These samples are not meant to be representative of the population and are likely to be somewhat biased.
Simple Random Sample
Subjects are given numbers and then chosen at random. Every unit has an equal probability of selection, so the sample is unbiased and tends to be representative. However, it can be expensive and time-consuming.
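A minimal sketch of a simple random sample, reusing the 10-out-of-100 example from the probability sample card. The function name `simple_random_sample` and the `seed` parameter are illustrative assumptions, not part of the course material.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n units without replacement; every unit has the same chance n/N."""
    rng = random.Random(seed)  # seed only makes the draw repeatable
    return rng.sample(population, n)

# Units are "given numbers": here simply 1..100
population = list(range(1, 101))
sample = simple_random_sample(population, 10, seed=42)

# Each unit's chance of selection: 10 / 100 = 0.10
inclusion_probability = len(sample) / len(population)
print(sample, inclusion_probability)
```

Changing the seed changes which units are drawn, but never the 10% chance each unit has of being drawn.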
Systematic Random Sample:
Selection from the population list leaves a predetermined number of non-chosen observations between chosen ones. For example, a one-in-ten sample has nine non-chosen observations for every chosen observation. Results can be replicated by anyone using the same sampling frame. It is simple, easier to administer than a simple random sample, and a good approximation of one. However, the ordering of the list can affect results by selecting certain values more often than others: if every class is listed as nine students and one instructor, you could end up selecting only instructors.
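The ordering problem can be sketched as follows. The sampling frame below (five classes, each listed as one instructor followed by nine students) is a made-up illustration, and `systematic_sample` is a hypothetical helper name.

```python
import random

def systematic_sample(frame, k, seed=None):
    """Pick a random start in the first k units, then take every k-th unit."""
    rng = random.Random(seed)
    start = rng.randrange(k)
    return frame[start::k]

# Hypothetical frame: each class is listed as 1 instructor, then 9 students
frame = []
for c in range(1, 6):
    frame.append(f"class{c}-instructor")
    frame.extend(f"class{c}-student{s}" for s in range(1, 10))

sample = systematic_sample(frame, 10, seed=1)
print(sample)

# Because the list repeats every 10 units, a one-in-ten sample always lands
# on the same position within each class. A start of 0 gives only instructors.
only_instructors = frame[0::10]
print(only_instructors)
```

Shuffling the frame before sampling, or choosing an interval that does not match the repeating pattern, would avoid this particular bias.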
Stratified/hierarchical random sample
Best described as a series of two or more simple random samples operating within the same population: the population is separated into two or more groups (strata), and each group is randomly sampled individually. With a smaller population, a single random selection may pass over smaller groups entirely; if you want them represented, you can use a stratified random sample. For example, to represent ages proportionally, an age group that makes up 25% of the population supplies 25% of the sample, but the selection of cases within that group is still random. It is impossible to stratify on every single characteristic, which is why we still rely on random selection.
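A minimal sketch of proportional stratified sampling, assuming a made-up population in which 25% of units are aged 18-29 and 75% are aged 30+. The helper `stratified_sample` and its parameters are illustrative assumptions.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, fraction, seed=None):
    """Run a simple random sample inside each stratum, using the same fraction."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    sample = []
    for name in sorted(strata):
        members = strata[name]
        k = round(len(members) * fraction)  # proportional allocation
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical population: 50 units aged 18-29 (25%), 150 units aged 30+ (75%)
population = [("18-29", i) for i in range(50)] + [("30+", i) for i in range(150)]
sample = stratified_sample(population, lambda u: u[0], 0.10, seed=7)

young = sum(1 for u in sample if u[0] == "18-29")
print(len(sample), young)
```

The age group that is 25% of the population ends up as 25% of the sample, while which individuals are picked inside each group is still left to chance.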
Cluster Sample
Instead of individual units, you select groups (clusters); for example, instead of listing all residents in Ottawa, you make each street a cluster. You then randomly sample from these clusters. This can be advantageous if the population you want to study is geographically concentrated. The disadvantage is a greater risk of sampling error, which can lead to bias and non-representativeness.
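One common way to carry this out is in two stages: first pick clusters at random, then pick residents at random inside the chosen clusters. The sketch below assumes that two-stage reading; the street and resident names and the helper `cluster_sample` are hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, per_cluster, seed=None):
    """Stage 1: choose clusters at random. Stage 2: sample units within them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    sample = []
    for name in chosen:
        members = clusters[name]
        sample.extend(rng.sample(members, min(per_cluster, len(members))))
    return chosen, sample

# Hypothetical city: 20 streets (clusters) with 30 residents each
clusters = {
    f"street{s:02d}": [f"street{s:02d}-resident{r}" for r in range(30)]
    for s in range(20)
}
chosen, sample = cluster_sample(clusters, n_clusters=4, per_cluster=5, seed=3)
print(chosen)
print(len(sample))
```

Only 4 of the 20 streets are ever visited, which is what makes the method cheap and also what raises the risk of sampling error if streets differ from one another.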
Volunteer/Convenience Sample
Targets only individuals who possess characteristics that make them more accessible, such as studying only people in Ottawa because it’s easier for me. Biased and unrepresentative.
Purposive Cases
Cases that are key to the study; you take the cases with the most potential to give you relevant information. A form of non-probability sampling in which researchers rely on their own judgment when choosing members of the population to participate in their surveys.
Snowball Sample
Techniques used for populations that are not easily identified or are resistant to being studied. Researchers will make contact with a small group to identify others who might participate, thus making participants into informants.
Quota Sample
Quota sampling is a non-random sampling technique in which participants are chosen on the basis of predetermined characteristics so that the total sample will have the same distribution of characteristics as the wider population. The number of people in particular groups is used as a selection criterion. Similar to stratified sampling, but instead of random selection within the subgroups you use one of the three non-probability selection methods above.
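A minimal sketch of quota sampling combined with convenience selection: people are taken in the order they happen to be available until each group's quota is full. The arrival list, group labels, and the helper `quota_sample` are made up for illustration.

```python
def quota_sample(arrivals, group_of, quotas):
    """Accept people in arrival order until every group's quota is filled."""
    remaining = dict(quotas)
    sample = []
    for person in arrivals:
        group = group_of(person)
        if remaining.get(group, 0) > 0:
            sample.append(person)
            remaining[group] -= 1
        if not any(remaining.values()):
            break  # all quotas met
    return sample

# Hypothetical convenience order: every third person is aged 18-29
arrivals = [("30+", i) if i % 3 else ("18-29", i) for i in range(30)]

# Quotas mirror a population that is 25% aged 18-29 and 75% aged 30+
quotas = {"18-29": 2, "30+": 6}
sample = quota_sample(arrivals, lambda p: p[0], quotas)
print(sample)
```

The sample matches the population's 25/75 age split, but who gets in depends entirely on arrival order rather than chance, which is the difference from a stratified random sample.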
Sampling Error Definition
There is always some degree of mismatch between the sample and the population; this is called the sampling error. Sampling error is measured with the standard error.
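The mismatch can be seen in a small simulation. Everything here is an assumption made for illustration: the population is 10,000 randomly generated ages, and the sample sizes of 25 and 400 are arbitrary.

```python
import random
import statistics

rng = random.Random(0)

# Hypothetical population: 10,000 ages between 18 and 90
population = [rng.randint(18, 90) for _ in range(10_000)]
population_mean = statistics.mean(population)

def sampling_error(n):
    """Difference between one random sample's mean and the population mean."""
    sample = rng.sample(population, n)
    return statistics.mean(sample) - population_mean

# Repeat the study 200 times at two sample sizes
errors_small = [sampling_error(25) for _ in range(200)]
errors_large = [sampling_error(400) for _ in range(200)]

# How widely the sample means scatter around the population mean
spread_small = statistics.pstdev(errors_small)
spread_large = statistics.pstdev(errors_large)
print(round(spread_small, 2), round(spread_large, 2))
```

No single sample hits the population mean exactly, and the scatter of the errors shrinks as the sample size grows.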
Presenting Data
Goal
Audience
Presentation must be clear and effective: we want to summarize
What is the goal of the illustration? Conveying a specific subset of descriptive information
Who is the target audience? Experts (sophisticated level), general audience (simplify complex information)
Categories in charts must be “natural” as much as possible
If there are too many response categories, merge them (as long as you still show the important features of the distribution of the data)
Use graphs and charts instead of tables
Standard error definition:
The standard error of the mean, or simply standard error, indicates how different the population mean is likely to be from a sample mean. It tells you how much the sample mean would vary if you were to repeat a study using new samples from within a single population.
In statistics, data from samples is used to understand larger populations. Standard error matters because it helps you estimate how well your sample data represents the whole population.
A high standard error shows that sample means are widely spread around the population mean—your sample may not closely represent your population. A low standard error shows that sample means are closely distributed around the population mean—your sample is representative of your population.
You can decrease standard error by increasing sample size. Using a large, random sample is the best way to minimize sampling bias.
The best way to report the standard error is in a confidence interval because readers won’t have to do any additional math to come up with a meaningful interval.
A confidence interval is a range of values where an unknown population parameter is expected to lie most of the time, if you were to repeat your study with new random samples.
With a 95% confidence level, 95% of all sample means are expected to lie within ± 1.96 standard errors of the population mean.
Based on random sampling, the true population parameter is also estimated to lie within this range with 95% confidence.
Formula for when population parameters are unknown:
SE = s / √n
SE is standard error
s is sample standard deviation
n is the number of elements in the sample
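A worked sketch using the symbols defined above, assuming the usual formula SE = s / √n and the ± 1.96 multiplier for a 95% interval. The scores are made-up data, and the function names are illustrative. With a sample this small, a t-based multiplier would normally replace 1.96; the value is kept here only to match the notes.

```python
import math
import statistics

def standard_error(sample):
    """SE = s / sqrt(n), with s the sample standard deviation (n - 1 denominator)."""
    s = statistics.stdev(sample)
    n = len(sample)
    return s / math.sqrt(n)

def confidence_interval_95(sample):
    """Sample mean plus or minus 1.96 standard errors."""
    mean = statistics.mean(sample)
    margin = 1.96 * standard_error(sample)
    return mean - margin, mean + margin

# Hypothetical survey scores
scores = [2, 4, 4, 4, 5, 5, 7, 9]

se = standard_error(scores)
low, high = confidence_interval_95(scores)
print(round(se, 3), round(low, 2), round(high, 2))
```

Because n sits under a square root in the denominator, quadrupling the sample size only halves the standard error.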
Validity, reliability, measurement error and interviewer effect in survey research
○ Validity: Are we really measuring what we want to measure? Since survey questions are simple, you may lose some complexity; for a complex concept, you may want to ask several questions to capture it
○ Reliability: The measure has to be consistent. With a scale such as 0-5, this depends on whether respondents understand the meaning associated with each choice in the same way
○ Measurement error: Are you sure the question is understood well? Being more specific helps put all respondents at the same starting point, with a shared understanding of complex terms such as democracy
○ Interviewer effect and opinion creation: Many respondents become uneasy when interviewed; if they don’t understand a complex question, they may still give an answer to please the interviewer