SIGNIFICANCE TESTS Flashcards
What are two things we can do with sample statistics?
- Make inferences about the population
- Test hypotheses
What is hypothesis testing?
Hypothesis testing is a method used in research to figure out if the findings from a small group of people (a sample) can support an idea or theory about a larger group (the population). It’s a structured way to decide if the evidence you’ve collected is strong enough to back up your theory, or if the results might just be due to random chance. If the evidence is strong, you can say it supports the theory. If it’s not strong enough, you can’t confidently say the theory applies to the whole population.
Imagine you have a theory about a big group of people (the population), like “people who drink tea concentrate better.” After testing a sample, you ask: are the findings reliable enough to support your theory, or are they just a coincidence?
What are the two most influential approaches to modern null hypothesis significance testing (NHST), and by whom were they developed?
Fisher’s Null Hypothesis Testing, developed by Sir Ronald Fisher, and Neyman-Pearson Decision Theory, developed by Jerzy Neyman and Egon Pearson.
Are Fisher’s and Neyman-Pearson’s theories reliable?
Not so much.
What is a null hypothesis?
A null hypothesis is a starting assumption in research that says there is no effect, no difference, or no relationship between the things being studied. It’s like saying, “Nothing special is happening.”
Example: “Listening to metal music has no effect on aggression levels.”
What does it mean to “nullify” the null hypothesis?
It means to reject the null hypothesis.
Could the null hypothesis be: “There is a relationship between the amount of time spent studying and test scores”?
No, the null hypothesis cannot be “There is a relationship between the amount of time spent studying and test scores.” The null hypothesis always assumes no effect, no difference, or no relationship because it serves as the baseline or default position that you test against.
In this case, the null hypothesis would be:
“There is no relationship between the amount of time spent studying and test scores.”
The alternative hypothesis, on the other hand, would be:
“There is a relationship between the amount of time spent studying and test scores.”
In the first step of Fisher’s theory, what is a sampling space?
The sampling space is the set of all possible results that could occur under the assumption that the null hypothesis is true.
EXAMPLE: “The drug has no effect on blood pressure.”
Sampling space: This would include all possible changes in blood pressure if the drug had no effect. Maybe they fall between -5 and +5 mmHg, because if there is no effect, you wouldn’t expect huge changes, just random fluctuations that might happen by chance.
What are the two steps of Fisher’s theory?
1. Set up a theoretical null hypothesis (H0) in order to provide a sampling space for the research data. “Null” does not refer to a zero mean difference or zero correlation, but to any hypothesis to be nullified.
2. Do the right statistical test (like a t-test). The test checks how likely results at least as extreme as yours would be if H0 (the null hypothesis) were true, i.e., if only random chance were at work. The less likely the results are under chance alone, the stronger the evidence against H0. Always give the exact p-value (e.g., p = 0.05 or p = 0.049). Don’t just stick to the 5% rule, and avoid saying you “accept” or “reject” H0; just explain what the data show.
If the p-value is below 0.05, it means that, assuming there is no real difference between the results (so that any difference that does appear is due to chance alone), a difference at least as large as the one observed would occur less than 5% of the time.
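A minimal sketch of where an exact p-value comes from, assuming the test statistic is a z score that follows a standard normal distribution when H0 is true (a t-test would use the t distribution instead). The function name `two_sided_p_value` and the z values are hypothetical, for illustration only.

```python
import math


def two_sided_p_value(z: float) -> float:
    """Two-sided p-value for a z statistic, assuming Z ~ N(0, 1) under H0.

    P(|Z| >= |z|) = erfc(|z| / sqrt(2)) for a standard normal Z.
    """
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical z statistics, each reported as an exact p-value (Fisher style)
for z in (1.0, 1.96, 2.5):
    print(f"z = {z:.2f} -> p = {two_sided_p_value(z):.3f}")
```

For z = 1.96 the result is about 0.05, which is why 1.96 shows up so often as a cutoff; in Fisher's approach you would simply report the exact value.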
What is Fisher’s test based on?
- The actual test is based on a rational assessment: are the research data so improbable under the null hypothesis that we may doubt the null hypothesis explains the results?
Example of a Rational Assessment:
Let’s say you’re studying the effect of study time on test scores. After running the test, you get a p-value of 0.04.
Rational assessment:
“The p-value of 0.04 indicates that there’s a 4% chance of seeing a difference in test scores at least this large if there were no relationship between study time and test scores. This is relatively unlikely, suggesting there may be a connection between the two variables. However, since this is just one piece of evidence, it’s important to consider the practical impact and whether other factors could be influencing the results.”
The Key Idea:
A rational assessment is about interpreting the evidence thoughtfully and explaining what the data tells you, rather than simply making a yes/no decision based on the p-value. You want to show why
What is Neyman-Pearson Decision Theory?
- Set up two statistical hypotheses, a null (H0) and an alternative (H1), along with their sampling spaces.
H1 = the hypothesis that there is some kind of effect.
- Decide on alpha, beta, and sample size before the experiment, based on subjective cost-benefit considerations. These define the rejection region for each hypothesis.
- If the data fall into the rejection region of H0, accept H1; otherwise accept H0. Note that accepting a hypothesis does not mean you believe in it, only that you act as if it were true.
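A sketch of the “decide before the experiment” step, assuming a two-sided one-sample z-test with a known population standard deviation and a hypothetical standardized effect size d = (μ1 − μ0) / σ. The helper `required_sample_size` is illustrative, and the formula ignores the (tiny) chance of rejecting in the wrong tail.

```python
import math
from statistics import NormalDist


def required_sample_size(effect_size: float, alpha: float = 0.05, beta: float = 0.20) -> int:
    """Sample size for a two-sided one-sample z-test, fixed BEFORE the experiment.

    Assumes sigma is known and effect_size = (mu1 - mu0) / sigma is a
    hypothetical planning value.
    """
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the two-sided test
    z_beta = z.inv_cdf(1 - beta)        # quantile that gives power = 1 - beta
    n = ((z_alpha + z_beta) / effect_size) ** 2
    return math.ceil(n)  # round up: you cannot recruit part of a person


print(required_sample_size(0.5))  # d = 0.5, alpha = .05, beta = .20 -> 32
```

With d = 0.5, α = .05 and β = .20 this gives 32 participants; a smaller α or β, or a smaller expected effect, pushes the required sample size up.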
What is Alpha?
Alpha (the false positive rate)
* Probability the test will produce a Type I error: we mistakenly conclude there is a genuine effect in our population (e.g., that the treatment works, when maybe participants changed their lifestyles and that was not assessed), when in fact there isn’t.
* This probability is the α-level (alpha level), usually .05: a threshold you set before conducting the test.
– When the null hypothesis is true, we incorrectly reject it only 5% of the time (basically, when there is no real effect, you screw up only 5% of the time)
α = 0.05 means you are willing to accept a 5% chance of making a Type I error (being wrong by seeing an effect when there isn’t one). Put differently, when there is truly no effect, 95% of the time you will correctly find no effect. (Correctly detecting an effect that does exist is a different probability: the power of the test.)
- Keep in mind: you never know you made a Type I error until someone tries to replicate the study.
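A simulation sketch of what α = .05 means in practice, under a hypothetical setup (normally distributed scores with μ0 = 100 and a known σ = 15, one-sample two-sided z-test). Every simulated study is run with H0 true, so every rejection is a Type I error; the function name and numbers are illustrative assumptions.

```python
import random
from statistics import NormalDist, fmean


def type_i_error_rate(alpha=0.05, n=30, mu0=100.0, sigma=15.0, n_studies=10_000, seed=1):
    """Simulate many studies in which H0 is TRUE and return the share of false positives.

    Hypothetical setup: scores ~ Normal(mu0, sigma) with sigma known,
    one-sample two-sided z-test of H0: mu = mu0.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    se = sigma / n ** 0.5  # standard error of the mean
    false_positives = 0
    for _ in range(n_studies):
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]  # H0 holds in every study
        z = (fmean(sample) - mu0) / se
        if abs(z) > z_crit:
            false_positives += 1  # an "effect" found where there is none
    return false_positives / n_studies


print(type_i_error_rate())  # should land close to 0.05
```

The long-run share of false positives sits near α, even though no single study can tell you whether it is one of them.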
What is Beta?
The counterpart of alpha.
* Probability the test will produce a Type II error: we mistakenly conclude there is no genuine effect in our population, when in fact there is.
* This probability is the β-level (beta level), usually .2
Example: pill vs. sugar pill. Experimental group: blood pressure goes down. Sugar pill (control) group: participants change their lifestyle, so blood pressure also goes down! The two groups end up looking similar, and we wrongly conclude the pill has no effect.
What is power?
This refers to the power of a test.
The power of a test is basically the chance of correctly finding an effect if there really is one (i.e., detecting a real effect).
The ability of a test to detect an effect of a particular size: the probability of rejecting the null hypothesis when it is false. (When the null hypothesis is actually false, what’s the chance of detecting that and rejecting it? This is when you’re finding a real effect!)
In other words, it is the ability of a test to avoid a Type II error (not to screw it up by missing a real effect)
Calculation:
Power = 1 − β (0.8 is a good level to aim for, i.e., β = .2)
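A sketch of the power calculation for a hypothetical two-sided one-sample z-test, where d = (μ1 − μ0) / σ is an assumed true effect size and σ is assumed known. Power is the chance that the test statistic lands in the rejection region when H1 is true, and β is what is left over; `power_one_sample_z` is an illustrative name.

```python
from statistics import NormalDist


def power_one_sample_z(effect_size, n, alpha=0.05):
    """Power (1 - beta) of a two-sided one-sample z-test.

    effect_size is d = (mu1 - mu0) / sigma, a hypothetical true effect.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = effect_size * n ** 0.5  # mean of the z statistic when H1 is true
    # probability that the statistic falls in either tail of the rejection region
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


power = power_one_sample_z(0.5, 32)
beta = 1 - power
print(f"power = {power:.3f}, beta = {beta:.3f}")
```

With d = 0.5 and n = 32 the power comes out near 0.8. Setting d = 0 returns α itself: when there is no effect, the only “rejections” are Type I errors.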
In Neyman-Pearson Decision Theory, what is the critical value?
The critical value/ cutoff point is a point (or threshold) that helps you decide whether the test statistic (your observed data) is in the rejection region (the area where you would reject the null hypothesis, and accept the alternative hypothesis) or in the acceptance region/retention region (where you would not reject H0).
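A sketch of how the critical value splits the rejection and retention regions, assuming a two-sided z-test; `critical_value` and `decide` are hypothetical helpers, and the observed z values are made up.

```python
from statistics import NormalDist


def critical_value(alpha=0.05):
    """Two-sided critical z value: H0 is rejected when |z| exceeds it."""
    return NormalDist().inv_cdf(1 - alpha / 2)


def decide(z_observed, alpha=0.05):
    """Neyman-Pearson style decision: act as if H1 or H0 were true."""
    if abs(z_observed) > critical_value(alpha):
        return "rejection region: accept H1"
    return "retention region: accept H0"


print(round(critical_value(0.05), 2))  # 1.96
print(decide(2.3))  # hypothetical observed z
print(decide(1.2))  # hypothetical observed z
```

A stricter α (say .01) moves the critical value further out, which shrinks the rejection region.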
Why did Fisher and Neyman view their models as incompatible? (key words in capitals)
Fisher focused on p-values and statistical significance as a way to assess EVIDENCE against the null hypothesis. He saw hypothesis testing as a way to assess the strength of evidence in a particular study.
Neyman and Pearson, on the other hand, emphasized the CONTROL OF ERROR RATES (Type I and Type II errors) and wanted to define decision rules based on pre-set alpha levels and power. They were more focused on decision-making in repeated trials.
What is the Hybrid model of the hypothesis testing process?
A combination of Fisher’s and Neyman-Pearson’s models.
If you want to know something about a population, what should you do?
Get a sample of that population.
When hypothesis testing, are we only interested in the effects of the study on people of the sample?
No, we are interested in the effects on the population in general.
What is STEP 1 of the hybrid model?
Formulating the null and alternative hypotheses (which are MUTUALLY EXCLUSIVE: only one of them can be true at any given time, so you either reject H0 or retain it)
The Greek symbol μ is used to represent the population mean.
μ0 = comparison population
μ1 = population represented by the sample
2 hypotheses:
- Research hypothesis: a statement about the predicted differences between populations
H1: μ1 ≠ μ0
- Null hypothesis: a statement predicting no differences between populations
H0: μ1 = μ0
Saying that the population means are equal is equivalent to saying that the difference in means is 0:
μ1 - μ0 = 0
EXAM QST: What is the core logic of hypothesis testing?
We are testing the notion that no difference exists between the populations under study. In other words, we test the null by assuming that the population means are the same unless the data give strong evidence otherwise. Same idea as innocent until proven guilty.
What is STEP 2 of the Hybrid model?
DETERMINE THE CHARACTERISTICS OF THE COMPARISON DISTRIBUTION (find the mean and standard deviation of the distribution of means, i.e., the comparison distribution, and you are good to go).
EXPLANATION:
In this step we are asking: What is the probability of obtaining a particular sample value if the null hypothesis is true?
In order to determine that probability, we need to know the characteristics of the distribution the sample value would come from if the null hypothesis were true.
This distribution is called the comparison distribution (sampling distribution). Here it is the sampling distribution of the means (used for one sample mean when the population standard deviation is known). We compare that single sample mean to the larger sampling distribution.
IN SHORT, WE GOTTA FIND THE PROBABILITY OF GETTING THAT SAMPLE MEAN (OR ONE MORE EXTREME) ON THE DISTRIBUTION OF MEANS. THAT’S IT. Because remember the core logic of hypothesis testing: we assume that the null hypothesis is true!
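A minimal sketch of STEP 2 for a single sample mean, assuming the population standard deviation σ is known: the comparison distribution is centred on the H0 population mean μ0, and its standard deviation is the standard error σ / √n. The helper names and the numbers (μ0 = 100, σ = 15, n = 25, sample mean 106) are hypothetical.

```python
import math


def comparison_distribution(mu0, sigma, n):
    """Mean and standard deviation of the distribution of means under H0.

    Assumes sigma (the population SD) is known; the SD of the
    distribution of means is the standard error sigma / sqrt(n).
    """
    mean_of_means = mu0  # under H0 the distribution of means is centred on mu0
    standard_error = sigma / math.sqrt(n)
    return mean_of_means, standard_error


def z_for_sample_mean(sample_mean, mu0, sigma, n):
    """Locate one sample mean on the comparison distribution as a z score."""
    m, se = comparison_distribution(mu0, sigma, n)
    return (sample_mean - m) / se


print(comparison_distribution(100, 15, 25))  # (100, 3.0)
print(z_for_sample_mean(106, 100, 15, 25))   # 2.0
```

Once the sample mean is expressed as a z score on this distribution, the probability of getting a mean that extreme can be read from the normal curve.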
In STEP 2, what is the sampling distribution (comparison distribution) a representation of? And why is it referred to as “comparison distribution”?
A representation of what the data would look like if the null hypothesis were true.
Why “comparison distribution”? You use this distribution to compare your observed sample mean to what you’d expect under the null hypothesis. It tells us about the population (remember μM = μ: the mean of the distribution of means equals the population mean).
H0 assumes no effect or difference:
For example, if you’re testing whether a new teaching method improves test scores, the null hypothesis (H0) might state: “The new teaching method has no effect on students’ test scores.”
Under this assumption, you expect that if you were to repeatedly sample from the population, you would not see any significant differences in scores due to the method.
EXAMPLE: Testing whether a new drug has an effect
Let’s say you want to test whether a new drug improves people’s blood pressure.
Null hypothesis (H0): “The drug has no effect on blood pressure.”
This means that, according to H0, the population mean for blood pressure before and after taking the drug is the same. The null hypothesis assumes no difference.
Now, let’s say you take a sample of 30 people, measure their blood pressure, and calculate the mean.
Under the assumption that H0 is true, you would expect the mean blood pressure of your sample to be close to the population mean (i.e., no difference).
You would then create a sampling distribution of sample means, which represents what the means of many different samples would look like if H0 were true.
If your sample mean is far from what the null hypothesis predicts, this could indicate that H0 is unlikely (i.e., that the drug does have an effect).
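A simulation sketch of the drug example: it builds a sampling distribution of means under H0 by repeatedly drawing samples of 30 people from a population where the drug does nothing. The population values (mean 120 mmHg, SD 10 mmHg) and the observed sample mean of 116 mmHg are hypothetical assumptions.

```python
import random
from statistics import fmean, pstdev

MU0 = 120.0   # hypothetical population mean blood pressure under H0 (mmHg)
SIGMA = 10.0  # hypothetical population SD (mmHg)
N = 30        # people per sample, as in the example


def simulate_null_means(n_samples=5_000, seed=7):
    """Draw many samples of N people assuming H0 (the drug does nothing)
    and return their sample means: an empirical sampling distribution."""
    rng = random.Random(seed)
    return [fmean(rng.gauss(MU0, SIGMA) for _ in range(N)) for _ in range(n_samples)]


means = simulate_null_means()
print(f"mean of sample means: {fmean(means):.2f}")  # close to MU0
print(f"SD of sample means:   {pstdev(means):.2f}")  # close to SIGMA / sqrt(N), about 1.83

observed = 116.0  # hypothetical observed sample mean after taking the drug
distance = abs(observed - MU0)
share_as_extreme = sum(abs(m - MU0) >= distance for m in means) / len(means)
print(f"share of null means at least this far from MU0: {share_as_extreme:.3f}")
```

The share of simulated means at least as far from 120 as the observed one approximates the two-sided p-value, here roughly 0.03, so this hypothetical sample mean would be hard to explain by chance alone.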
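What is the first part of STEP 3 of the Hybrid model?
SELECT THE SIGNIFICANCE LEVEL.
- The significance level is a cutoff probability: if results as extreme as those of the study would occur by chance alone (i.e., with H0 true) less often than this, we treat them as too unlikely to be due to chance. It is also the probability of making a Type I error.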
- The significance level is represented by the Greek letter alpha (α), and is usually set at .05 or 5%
- When the probability of obtaining the sample results is less than the significance level, the null hypothesis is rejected and the results are said to be statistically significant. In other words, if the probability of getting a sample mean at least this extreme is less than 5%, you can reject the null; if it is greater than 5%, you keep the null (the difference could easily be due to chance lol)
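A sketch that puts STEP 2 and STEP 3 together for a one-sample z-test, assuming σ is known and α = .05; `one_sample_z_test` and its inputs are hypothetical.

```python
import math


def one_sample_z_test(sample_mean, mu0, sigma, n, alpha=0.05):
    """Hybrid-model decision for a one-sample z-test (sigma assumed known).

    Returns the z score, the exact two-sided p-value, and whether H0 is rejected.
    """
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    p = math.erfc(abs(z) / math.sqrt(2))  # P(|Z| >= |z|) if H0 is true
    return z, p, p < alpha


z, p, reject = one_sample_z_test(106, 100, 15, 25)  # hypothetical numbers
print(f"z = {z:.2f}, p = {p:.4f}, reject H0: {reject}")
```

With a sample mean of 106, z = 2.00 and p ≈ 0.046 < .05, so H0 is rejected; a sample mean of 103 gives z = 1.00 and p ≈ 0.32, so H0 is kept.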