Ch5: Sampling and Probability Flashcards
The goal of sampling is simple:
Samples and their populations
collect a sample that represents the population
2 main types of sample:
- Random
- Convenience
Pros of random sampling:
- Ideal
- More representative
Cons of random sampling:
- Expensive
- Results in lots of practical problems
- Almost impossible to obtain
Convenience sampling - easier to grab samples from a more local portion of the population, but this comes with a significant downside:
Affects generalizability: a researcher’s ability to apply findings from one sample or context to other samples or contexts; AKA external validity
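The trade-off between random and convenience sampling can be sketched with a small simulation. Everything here is hypothetical and not from the flashcards: the population, the "local" vs. "elsewhere" split, the scores, and the sample size of 200 are all made up for illustration.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: 2,000 "local" people who tend to score higher
# on some measure, and 8,000 people elsewhere who tend to score lower.
population = [("local", rng.gauss(60, 5)) for _ in range(2_000)]
population += [("elsewhere", rng.gauss(50, 5)) for _ in range(8_000)]

def mean_score(people):
    """Average score for a list of (region, score) pairs."""
    return sum(score for _, score in people) / len(people)

# Random sample: every member of the population has an equal chance.
random_sample = rng.sample(population, 200)

# Convenience sample: just the first 200 local people we can reach.
convenience_sample = [p for p in population if p[0] == "local"][:200]

population_mean = mean_score(population)
random_mean = mean_score(random_sample)
convenience_mean = mean_score(convenience_sample)

print("population: ", round(population_mean, 1))
print("random:     ", round(random_mean, 1))
print("convenience:", round(convenience_mean, 1))
```

In this made-up setup the random sample's mean should land close to the population mean, while the convenience sample's mean is pulled toward the local group. That gap is the generalizability problem.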
While external validity is affected, how can we improve the external validity of findings from a convenience sample?
Replication: the duplication of scientific results, ideally in a different context or with a sample that has different characteristics (sometimes called reproducibility); AKA just doing the study again and again
When must we be even more cautious than when using convenience samples?
Using self-selected/volunteer samples
Why are volunteer samples becoming more popular?
- Crowdsourcing: occurs when a research team solicits input from a very large group of people, usually recruited online
- On one hand, this is enabled by the internet
- On the other, we must be cautious when collecting data through crowdsourcing, because the participants are self-selected
Mturk - what is it and its downsides?
- Mturk (Amazon Mechanical Turk) - a website where anyone can recruit people to complete tasks for a small fee
- While convenient, the limitations of using such volunteer sampling remain (EX - biases)
- Also subject to fraud - upwards trend of random responses (including randomly selected responses and numbers)
- Ethical concerns arise with respect to participants, often called Turkers: some note that their hourly pay is less than that of study participants who are paid through other means, and less than the minimum wage in countries such as the US and Canada
- Some say that the online nature of Mturk may not protect their privacy/anonymity
- For those who use Turkers in their work, there are now guidelines for being an ethical researcher
Random assignment = a distinctive signature of a scientific study; WHY?
It levels the playing field, because every participant has an equal chance of being assigned to any level of the IV
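A minimal sketch of random assignment, assuming a made-up list of 20 participants and two IV levels. The function name `randomly_assign` and the labels are hypothetical, not from the flashcards.

```python
import random

def randomly_assign(participants, levels, seed=None):
    """Shuffle participants, then deal them out to the IV levels in turn.

    Because the order is shuffled first, each participant has an equal
    chance of ending up in any level.
    """
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    groups = {level: [] for level in levels}
    for i, person in enumerate(shuffled):
        groups[levels[i % len(levels)]].append(person)
    return groups

# Hypothetical participants P1..P20 and two levels of the IV.
participants = [f"P{n}" for n in range(1, 21)]
groups = randomly_assign(participants, ["control", "experimental"], seed=1)

for level, members in groups.items():
    print(level, len(members), members)
```

Dealing from a shuffled list keeps the group sizes equal. Flipping a coin for each participant would also be random assignment, but the groups could end up lopsided.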
Why is probability central to inferential stats?
Because our conclusions about a population are based on data collected from a sample rather than on anecdotes and testimonials
Two personal biases get intertwined in our day to day thinking:
- Confirmation bias: usually unintentional tendency to pay attention to evidence that confirms what we already believe and to ignore evidence that would disconfirm our beliefs
- Illusory correlations: the phenomenon of believing one sees an association between variables when no such association exists
When we discuss probability in everyday conversation, we tend to think of what statisticians call…
- Personal probability: a person’s own judgement about the likelihood that an event will occur, also called subjective probability; really just our best guess
- EX: “there’s a 75% chance I’ll finish my paper tonight!”
Statisticians are concerned with a different type of probability
Probability: the likelihood that a particular outcome - out of all possible outcomes - will occur
In stats, we’re interested in a more specific definition of probability…
Expected relative-frequency probability: the likelihood of an event occurring, based on the actual outcome of many, many tries
In reference to probability - trial, outcome, success
- Trial: refers to each occasion that a given procedure is carried out
- Outcome: refers to the result of a trial
- Success: the outcome for which we’re trying to determine the probability
- Thus, probability = successes / trials
Probability, proportion, percentage
- Probability - the proportion that we expect to see in the long run
- Proportion - the number of successes divided by the number of trials
- Percentage - probability or proportion multiplied by 100
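The arithmetic behind these terms can be sketched in a few lines. The counts (12 successes in 16 trials) and the function names are made up for illustration.

```python
def proportion(successes, trials):
    """Number of successes divided by number of trials."""
    if trials <= 0:
        raise ValueError("need at least one trial")
    return successes / trials

def percentage(p):
    """A probability or proportion multiplied by 100."""
    return p * 100

# Hypothetical: 12 heads (successes) in 16 coin flips (trials).
observed = proportion(12, 16)
print(observed)              # 0.75
print(percentage(observed))  # 75.0
```

Note that 0.75 here is only the proportion observed in 16 flips. The probability is the proportion we would expect to see in the long run.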
One of the central characteristics of expected relative-frequency probability
IT ONLY WORKS IN THE LONG RUN (REFERRED TO AS THE “LAW OF LARGE NUMBERS”)
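A rough simulation of the long-run idea, assuming a fair coin (true probability of heads = 0.5). The function name and the trial counts are arbitrary choices for illustration.

```python
import random

def observed_proportion(n_trials, p_success=0.5, seed=0):
    """Simulate n_trials independent trials and return successes / trials."""
    rng = random.Random(seed)
    successes = 0
    for _ in range(n_trials):
        if rng.random() < p_success:
            successes += 1
    return successes / n_trials

# With few trials the proportion can be far from 0.5;
# with many trials it tends to settle close to 0.5.
for n in (10, 100, 10_000, 100_000):
    print(n, observed_proportion(n))
```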
To avoid bias, statistical probability requires that…
Independence and probability
…the individual trials be independent - as in that the outcome of each trial must not depend/rely in any way on the outcome of previous trials
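One way to see what independence looks like, sketched with simulated fair coin flips (an assumption for illustration): the proportion of heads right after a head should be about the same as the proportion of heads overall.

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# True = heads, False = tails; each flip ignores all previous flips.
flips = [rng.random() < 0.5 for _ in range(200_000)]

# Keep only the flips that came immediately after a head.
after_heads = [nxt for prev, nxt in zip(flips, flips[1:]) if prev]

p_heads_overall = sum(flips) / len(flips)
p_heads_after_heads = sum(after_heads) / len(after_heads)

print("overall:      ", round(p_heads_overall, 3))
print("after a head: ", round(p_heads_after_heads, 3))
```

If the two proportions differed noticeably, the outcome of one trial would be telling us something about the next, and the trials would not be independent.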
3 steps to developing a hypothesis:
- Establish the control/experimental groups
- Develop the hypotheses to be tested
- Make a decision about the hypothesis
When we calculate inferential stats, we’re actually comparing two hypotheses:
- Null hypothesis
- Research hypothesis
Null hypothesis
- A statement that postulates that there is no difference between populations or that the difference is in a direction opposite to that anticipated by the researcher (like > or = to, or < or = to)
- Can think of as the boring hypothesis, because it proposes that nothing will happen
Research Hypothesis
- A statement that postulates a difference between populations
- Can specify a direction
- NOTE: it’s important to state the comparison group (EX - NOT “the group viewing a photo with healthy crackers has a higher avg. calorie estimate”; INSTEAD “the group viewing a photo with healthy crackers has a higher avg. calorie estimate THAN the group that views the photo without the healthy crackers”)
How do we compare the null vs. research hypothesis to determine probability?
- We formulate the null hypothesis and the research hypothesis to set them up against each other
- We use stats to determine the probability that there is a large enough difference between the means of the samples that we can conclude there’s likely a difference between the means of the underlying populations
- So, probability plays into the decision we make about the hypotheses
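The flashcards don't name a specific procedure, so the sketch below uses a permutation test, which is one possible choice and not necessarily the method the chapter has in mind. It estimates how often a difference between sample means at least as large as the observed one would show up if the null hypothesis were true. The calorie estimates are made-up numbers, and `permutation_p_value` is a hypothetical helper.

```python
import random

def sample_mean(values):
    return sum(values) / len(values)

def permutation_p_value(group_a, group_b, n_shuffles=10_000, seed=0):
    """One-directional permutation test for mean(group_a) > mean(group_b).

    If the null hypothesis is true, group labels are interchangeable, so we
    shuffle the pooled scores and count how often the shuffled difference
    is at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = sample_mean(group_a) - sample_mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_large = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)
        diff = sample_mean(pooled[:n_a]) - sample_mean(pooled[n_a:])
        if diff >= observed:
            at_least_as_large += 1
    return at_least_as_large / n_shuffles

# Hypothetical calorie estimates (not real data).
with_crackers = [520, 480, 610, 550, 590, 530, 600, 570]
without_crackers = [450, 500, 470, 430, 520, 460, 490, 440]

p = permutation_p_value(with_crackers, without_crackers)
print("estimated probability under the null:", p)
```

A small probability means a difference this large would be unlikely if the null hypothesis were true, which is the kind of evidence that feeds into the decision about the hypotheses.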