SIGNIFICANCE TESTS Flashcards by Imane Elmouden

What are two things we can do with sample statistics?

Make inferences about the population
Test hypotheses

What is hypothesis testing?

Hypothesis testing is a method used in research to figure out if the findings from a small group of people (a sample) can support an idea or theory about a larger group (the population). It’s a structured way to decide if the evidence you’ve collected is strong enough to back up your theory, or if the results might just be due to random chance. If the evidence is strong, you can say it supports the theory. If it’s not strong enough, you can’t confidently say the theory applies to the whole population.

Imagine you have a theory about a big group of people (the population), like “people who drink tea concentrate better.” Hypothesis testing asks: do the findings from your sample reliably support the theory, or are they just a coincidence?

What are the two most influential approaches to modern null hypothesis significance testing (NHST), and by whom were they developed?

Fisher’s Null Hypothesis Testing, developed by Sir Ronald Fisher, and Neyman-Pearson Decision Theory, developed by Jerzy Neyman and Egon Pearson.

Are Fisher’s and Neyman-Pearson’s theories reliable?

Not so much.

What is a null hypothesis?

A null hypothesis is a starting assumption in research that says there is no effect, no difference, or no relationship between the things being studied. It’s like saying, “Nothing special is happening.”

Example: “Listening to metal music has no effect on aggression levels.”

What does “to nullify” mean?

It means to reject the null hypothesis.

Could the null hypothesis be “There is a relationship between the amount of time spent studying and test scores”?

No, the null hypothesis cannot be “There is a relationship between the amount of time spent studying and test scores.” The null hypothesis always assumes no effect, no difference, or no relationship because it serves as the baseline or default position that you test against.

In this case, the null hypothesis would always be:
“There is no relationship between the amount of time spent studying and test scores.”

The alternative hypothesis, on the other hand, would be:
“There is a relationship between the amount of time spent studying and test scores.”

What is the sampling space in the first step of Fisher’s theory?

The sampling space is the set of all possible results that could occur under the assumption that the null hypothesis is true.

EXAMPLE: “The drug has no effect on blood pressure.”
Sampling space: This would include all possible changes in blood pressure if the drug had no effect. Maybe it’s between -5 to +5 mmHg, because if there is no effect, you wouldn’t expect huge changes, just random fluctuations that might happen by chance.

What are the two steps of Fisher’s theory?

1. Set up a theoretical null hypothesis (H0) in order to provide a sampling space for the research data. “Null” does not refer to a zero mean difference or zero correlation, but to any hypothesis to be nullified.

2. Do the right statistical test (like a t-test). The test checks how unlikely results at least as extreme as yours would be if H0 (the null hypothesis) were true. The more unlikely they are under H0, the stronger the evidence against H0. Always give the exact p-value (e.g., p = 0.05 or p = 0.049). Don’t just stick to the 5% rule, and avoid saying you “accept” or “reject” H0; just explain what the data shows.

If the p-value is below 0.05, a difference at least as large as the one observed would occur less than 5% of the time if the null hypothesis were true, that is, if there were really no difference and any difference in the results were due to chance alone.
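
As a rough illustration of reporting an exact p-value, here is a minimal Python sketch. It assumes a z statistic rather than the t-test mentioned above (the standard library has no t distribution), and the function name and the example value z = 2.0 are made up for illustration.

```python
from statistics import NormalDist

def two_sided_p_value(z: float) -> float:
    """Probability of a z statistic at least as extreme as `z`
    (in either tail) if the null hypothesis is true."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical test statistic, chosen only for illustration.
z_observed = 2.0
p = two_sided_p_value(z_observed)

# Report the exact value instead of only saying "p < .05".
print(f"z = {z_observed:.2f}, p = {p:.4f}")
```

The point of the sketch is only that the exact p is a number you compute and report, not a yes/no verdict.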

What is Fisher’s test based on?

The actual test is based on a rational assessment: is the research data so improbable under the null hypothesis that we may doubt that the null hypothesis explains the results?

Example of a Rational Assessment:
Let’s say you’re studying the effect of study time on test scores. After running the test, you get a p-value of 0.04.

Rational assessment:
“The p-value of 0.04 indicates that there’s a 4% chance of seeing a difference in test scores at least this large if there were no relationship between study time and test scores. This is relatively unlikely, suggesting there may be a connection between the two variables. However, since this is just a small piece of evidence, it’s important to consider the practical impact and whether other factors could be influencing the results.”

The Key Idea:
A rational assessment is about interpreting the evidence thoughtfully and explaining what the data tells you, rather than simply making a yes/no decision based on the p-value. You want to show why

What is Neyman-Pearson Decision Theory?

1. Set up two statistical hypotheses, a null (H0) and an alternative (H1), along with their sampling spaces. H1 is the hypothesis that there is some kind of effect.

2. Decide on alpha, beta, and sample size before the experiment, based on subjective cost-benefit considerations. These define the rejection region for each hypothesis.

3. If the data fall into the rejection region of H0, accept H1; otherwise accept H0. Note that accepting a hypothesis does not mean you believe in it, only that you act as if it were true.
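
To make the "decide alpha, beta, and sample size in advance" step concrete, here is a hedged Python sketch. It assumes a one-tailed z-test with a known population standard deviation; the function name and the example numbers (an effect of 5 points, σ = 10) are hypothetical and not from the cards.

```python
from math import ceil
from statistics import NormalDist

def required_sample_size(alpha: float, beta: float,
                         effect: float, sigma: float) -> int:
    """Smallest n for a one-tailed z-test (known sigma) that keeps the
    Type I error rate at alpha and the Type II error rate at beta
    for a true mean difference of `effect`."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # cutoff for the rejection region
    z_beta = NormalDist().inv_cdf(1 - beta)    # distance needed to reach power 1 - beta
    return ceil(((z_alpha + z_beta) * sigma / effect) ** 2)

# Hypothetical planning values, fixed before collecting any data.
n = required_sample_size(alpha=0.05, beta=0.20, effect=5.0, sigma=10.0)
print(f"Plan for at least {n} participants")
```

Lowering alpha or beta, or looking for a smaller effect, pushes the required sample size up, which is the cost-benefit trade-off the card refers to.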

What is Alpha?

Alpha is the probability of a false positive.
* Probability that the test will produce a Type I error: we mistakenly conclude there is a genuine effect in our population (e.g. the treatment works) when in fact there isn’t one (maybe participants changed their lifestyles, and that was not assessed).
* This probability is the α-level (alpha level), usually .05. It is a threshold you set before conducting the test.
– When the null hypothesis is true, we incorrectly reject it only 5% of the time.

α = 0.05 means you are willing to accept a 5% chance of making a Type I error (seeing an effect when there isn’t one). In the other 95% of cases where there is no effect, the test correctly retains the null hypothesis.

Keep in mind that you never know you made a Type I error until someone tries to replicate the study.
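
One way to see what "wrong 5% of the time" means is a small simulation. The sketch below is only an illustration under assumed settings: samples come from a population where the null hypothesis really is true, a two-tailed z-test is run each time, and we count how often it rejects anyway. The function name, sample size, number of trials, and seed are all arbitrary choices.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

def type_one_error_rate(alpha: float = 0.05, n: int = 20,
                        trials: int = 10_000, seed: int = 0) -> float:
    """Fraction of simulated studies that reject H0 even though H0 is true."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed cutoff
    rejections = 0
    for _ in range(trials):
        # Population under H0: mean 0, standard deviation 1.
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        z = fmean(sample) / (1.0 / sqrt(n))
        if abs(z) > z_crit:
            rejections += 1  # a false positive (Type I error)
    return rejections / trials

rate = type_one_error_rate()
print(f"False positive rate over many null studies: {rate:.3f}")
```

The printed rate should land close to alpha, which is the long-run meaning of the threshold.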

What is Beta?

The counterpart of alpha (a false negative).
* Probability that the test will produce a Type II error: we mistakenly conclude there is no genuine effect in our population, when in fact there is one.
* This probability is the β-level (beta level), usually .2.

Example: drug vs. sugar pill. In the experimental group, blood pressure goes down. In the control (sugar pill) group, participants change their lifestyle, so their blood pressure also goes down, and the real effect of the drug is missed!

What is power?

Power is a property of a statistical test.
The power of a test is basically the chance of correctly finding an effect if there really is one (i.e., detecting a real effect).

The ability of a test to detect an effect of a particular size: the probability of rejecting the null hypothesis when it is false. (When the null hypothesis is actually false, what’s the chance of detecting that and rejecting it? This is when you’re finding a real effect!)

In other words, it is the ability of a test to avoid a Type II error (missing an effect that is really there).

Calculation:
Power = 1 − β (0.8 is a good level to aim for)
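
A minimal sketch of the 1 − β calculation, assuming a one-tailed z-test with a known population standard deviation. The function name and the example numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def power_one_tailed_z(effect: float, sigma: float, n: int,
                       alpha: float = 0.05) -> float:
    """Power of a one-tailed z-test to detect a true mean difference `effect`."""
    z_crit = NormalDist().inv_cdf(1 - alpha)  # cutoff under H0
    shift = effect / (sigma / sqrt(n))        # true effect in standard-error units
    beta = NormalDist().cdf(z_crit - shift)   # chance of missing the effect
    return 1 - beta

# Hypothetical study: true effect of 5 points, sigma = 10, n = 25.
power = power_one_tailed_z(effect=5.0, sigma=10.0, n=25)
print(f"power = {power:.2f}")
```

With these assumed numbers the power comes out close to the 0.8 target; a smaller sample or a smaller effect would lower it.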

In reference to Neyman-Pearson Decision Theory. What is the critical value?

The critical value (cutoff point) is a threshold that helps you decide whether the test statistic (computed from your observed data) is in the rejection region (the area where you would reject the null hypothesis and accept the alternative hypothesis) or in the acceptance/retention region (where you would not reject H0).

Why did Fisher and Neyman view their models as incompatible? (key terms in capitals)

Fisher focused on p-values and statistical significance as a way to assess EVIDENCE against the null hypothesis. He saw hypothesis testing as a way to assess the strength of evidence in a particular study.

Neyman and Pearson, on the other hand, emphasized the CONTROL OF ERROR RATES (Type I and Type II errors) and wanted to define decision rules based on pre-set alpha levels and power. They were more focused on decision-making in repeated trials.

What is the Hybrid model of the hypothesis testing process?

A combination of Fisher’s and Neyman-Pearson’s models.

If you want to know something about a population, what should you do?

Get a sample of that population.

When hypothesis testing, are we only interested in the effects of the study on the people in the sample?

No, we are interested in the effects of the study on the population in general.

What is STEP 1 of the hybrid model?

Formulating the null and alternative hypotheses (which are MUTUALLY EXCLUSIVE: only one of them can be true at any given time, so you either reject or retain the null)

The Greek symbol μ is used to represent the population mean.
μ0 = comparison population
μ1 = population represented by the sample

2 hypotheses:

- Research hypothesis: a statement about the predicted differences between populations
H1: μ1 ≠ μ0

- Null hypothesis: a statement predicting no differences between populations
H0: μ1 = μ0

Saying that the population means are equal is equivalent to saying that the difference in means is 0:
μ1 - μ0 = 0

EXAM QUESTION: What is the core logic of hypothesis testing?

We are testing the notion that no difference exists between the populations under study. In other words, we are testing the null by assuming that the population means are the same unless the evidence shows otherwise. Same idea as innocent until proven guilty.

What is STEP 2 of the Hybrid model?

DETERMINE THE CHARACTERISTICS OF THE COMPARISON DISTRIBUTION (find the mean and standard deviation of the distribution of means, i.e. the comparison distribution, and you are good to go).

EXPLANATION:
In this step we are asking: What is the probability of obtaining a particular sample value if the null hypothesis is true?

In order to determine that probability, we need to know the characteristics of the distribution the sample value would come from if the null hypothesis were true.

This distribution is called the comparison distribution (sampling distribution). For one sample mean when the population standard deviation is known, it is the sampling distribution of the means. We compare that single sample to the larger sampling distribution.

IN SHORT, WE HAVE TO FIND THE PROBABILITY OF GETTING THAT SAMPLE MEAN ON THE DISTRIBUTION OF MEANS. THAT’S IT. Because remember the core logic of hypothesis testing: we assume that the null hypothesis is true!
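
Here is a short Python sketch of this step, assuming the usual result that the distribution of means has the same mean as the population and a standard deviation of σ divided by the square root of n. The function names and the example population (mean 100, σ = 15, n = 25) are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist

def comparison_distribution(pop_mean: float, pop_sd: float, n: int):
    """Mean and standard deviation of the distribution of means under H0."""
    return pop_mean, pop_sd / sqrt(n)

def prob_mean_at_least(sample_mean: float, pop_mean: float,
                       pop_sd: float, n: int) -> float:
    """Probability of a sample mean this high or higher if H0 is true."""
    mu_m, sigma_m = comparison_distribution(pop_mean, pop_sd, n)
    return 1 - NormalDist(mu_m, sigma_m).cdf(sample_mean)

# Hypothetical comparison population and sample.
mu_m, sigma_m = comparison_distribution(pop_mean=100.0, pop_sd=15.0, n=25)
p_upper = prob_mean_at_least(106.0, 100.0, 15.0, 25)
print(f"comparison distribution: mean = {mu_m}, sd = {sigma_m}")
print(f"P(sample mean >= 106 | H0) = {p_upper:.4f}")
```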

In STEP 2, what is the sampling distribution (comparison distribution) a representation of? And why is it referred to as “comparison distribution”?

A representation of what the data would look like if the null hypothesis were true.

Why “comparison distribution”? You use this distribution to compare your observed sample mean to what you’d expect under the null hypothesis. It tells us about the population (Remember μ = μM…)

H0 assumes no effect or difference:
For example, if you’re testing whether a new teaching method improves test scores, the null hypothesis (H0) might state: “The new teaching method has no effect on students’ test scores.”
Under this assumption, you expect that if you were to repeatedly sample from the population, you would not see any significant differences in scores due to the method.

EXAMPLE: Testing if a new drug has an effect
Let’s say you want to test whether a new drug improves people’s blood pressure.

Null hypothesis (H0): “The drug has no effect on blood pressure.”
This means that, according to H0, the population mean for blood pressure before and after taking the drug is the same. The null hypothesis assumes no difference.
Now, let’s say you take a sample of 30 people, measure their blood pressure, and calculate the mean.

Under the assumption that H0 is true, you would expect the mean blood pressure of your sample to be close to the population mean (i.e., no difference).
You would then create a sampling distribution of sample means, which represents what the means of many different samples would look like if H0 were true.
If your sample mean is far from what the null hypothesis predicts, this could indicate that H0 is unlikely (i.e., that the drug does have an effect).

What is the first part of STEP 3 of the Hybrid model?

SELECT THE SIGNIFICANCE LEVEL.

The significance level is the probability cutoff below which we treat the results of a study as too unlikely to have occurred purely by chance under the null hypothesis.
The significance level is represented by the Greek letter alpha (α), and is usually set at .05 or 5%.
When the probability of obtaining the sample results is less than the significance level, the null hypothesis is rejected, and the result is said to be statistically significant. In other words: if the probability of getting your sample mean is less than 5%, you can reject the null; if it is greater than 5%, you keep the null (it is more plausible that the difference is just due to chance).

Regarding step 3, what is the letter alpha associated with?

A critical z score (Zcrit).

How do you choose the CRITICAL Z value (the threshold for rejecting or retaining H0)? 2 things.

Based on alpha and on the direction of the research hypothesis.

What is the second step of STEP 3 of the hybrid model?

Decide whether you are working with one-tailed hypothesis tests (directional research hypotheses) or two tailed hypothesis tests (non-directional research hypothesis)

When do you use one-tailed hypothesis tests? How is the test carried out?

When there is a specific predicted direction of effect (such as predicting an increase or a decrease):
H1: μ1 > μ0 (H0: μ1 ≤ μ0)
H1: μ1 < μ0 (H0: μ1 ≥ μ0)
a) to reject the null hypothesis, the obtained score has to be in a region of the comparison distribution that is in the most extreme 5%
b) it has to be in the area of 1 tail only (one-tailed test)
c) overall alpha (α) is 5%, which is associated with a critical z value (Zcrit) of 1.65
So if we assume H1: μ1 < μ0, the rejection region is the lower tail (Zcrit = −1.65).

When do you use two-tailed hypothesis tests? How is the test carried out?

Non-directional research hypothesis: there is no specific predicted direction of effect (we predict that one population differs from another).
H1: μ1 ≠ μ0 (H0: μ1 = μ0)
Hypothesis testing is carried out in the following manner:
a) to reject the null hypothesis, the obtained score has to be in a region of the comparison distribution that is in the upper 2.5% or the lower 2.5%
b) it can be in either tail (two-tailed test)
c) it requires more extreme scores to reject (the cutoffs sit further out in both tails)
d) overall alpha (α) is 5%, which is associated with a critical z value (Zcrit) of ±1.96
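
The critical z values quoted for one-tailed and two-tailed tests can be recovered from alpha with the inverse normal CDF. This is a small sketch using Python's standard library; the function name `critical_z` is just an illustrative choice.

```python
from statistics import NormalDist

def critical_z(alpha: float = 0.05, tails: int = 2) -> float:
    """Positive critical z value for a one- or two-tailed test."""
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    # A two-tailed test splits alpha evenly between the two tails.
    return NormalDist().inv_cdf(1 - alpha / tails)

print(f"one-tailed, alpha = .05: Zcrit = {critical_z(0.05, tails=1):.3f}")
print(f"two-tailed, alpha = .05: Zcrit = {critical_z(0.05, tails=2):.3f}")
```

The one-tailed value is about 1.645, which textbooks usually round to 1.64 or 1.65.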

OK so recap. When do you use one tailed and two tailed tests? What are you sacrificing with each?

1 tail: clear idea of directional effects (increase/decrease)
2 tails: no clue about the direction

One-tailed test:
- easier to reject the null hypothesis (the cutoff score is smaller, and the sample results need not be so extreme)
- Problem: if the result is in the other direction, you cannot reject the null hypothesis no matter how extreme the sample results
- Not so simple in the real world: while you expect a certain result, the opposite may be more interesting (using 1 tail, you run the risk of ignoring important results)

With a two-tailed test, it is the opposite: harder to reject the null hypothesis, but an extreme result in either direction counts.

If a problem question states “different” and not “higher” or “lower”, which test do you turn to?

2 tailed test.

Why do we use a significance level of 5%?

Because Fisher said so: “If the probability of such an event were sufficiently small – say, 1 chance in 20 – then one might regard the results as significant.”

Still in STEP 3, what is the difference between probability (p) and alpha (a)?

- p is the exact probability of obtaining results at least as extreme as yours if the null hypothesis is true. It comes directly from your data.
- α is the threshold below which p is considered so small that we decide to reject the null hypothesis, AND IS DETERMINED BY YOU, THE JUNIOR STATISTICIAN, IN ADVANCE (usually 5%, unless Walter specifies otherwise)
- if the p value is less than α, we reject the null hypothesis (p < α)
- if the p value is greater than α, we retain the null hypothesis (p > α)

What is STEP 4 of the hybrid model?

SELECT THE TEST STATISTIC AND CALCULATE ITS VALUE. We use the Z-test when we have 1 sample and both population parameters are known (both the mean and the standard deviation). This is what we use to test the null hypothesis.
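
A minimal sketch of the one-sample Z-test calculation, assuming the standard formula: the distance between the sample mean and the population mean, divided by the standard deviation of the distribution of means. The function name and the example numbers are hypothetical.

```python
from math import sqrt

def z_statistic(sample_mean: float, pop_mean: float,
                pop_sd: float, n: int) -> float:
    """Obtained z: how many standard errors the sample mean lies
    from the population mean assumed under H0."""
    standard_error = pop_sd / sqrt(n)  # sd of the distribution of means
    return (sample_mean - pop_mean) / standard_error

# Hypothetical data: sample of 25 with mean 106; population mean 100, sd 15.
z_obt = z_statistic(sample_mean=106.0, pop_mean=100.0, pop_sd=15.0, n=25)
print(f"Zobt = {z_obt:.2f}")
```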

What is STEP 5 of the hybrid model?

DETERMINE THE CRITICAL VALUE(S) ON THE COMPARISON DISTRIBUTION AT WHICH THE NULL HYPOTHESIS SHOULD BE REJECTED.
- The critical value(s) will bound the rejection and non-rejection regions for the null hypothesis, H0.
- They are determined from the significance level selected in step 3.
- In a one-tailed test, there will be one critical value, since H0 can be rejected by an extreme result in just one direction.
- Two-tailed tests will require two critical values, since H0 can be rejected by an extreme result in either direction.
- Critical values are usually stated in terms of a Z value (Zcrit), so:
Zobt more extreme than Zcrit: reject H0
Zobt less extreme than Zcrit: retain H0

What is STEP 6 of hybrid model?

COMPARE CALCULATED AND CRITICAL VALUES AND REACH A CONCLUSION ABOUT THE NULL HYPOTHESIS.
Compare the obtained test statistic to the critical test value.
- If the test statistic is more extreme than the critical test value:
a) statistical significance, p < .05
b) reject the null hypothesis
c) research hypothesis is supported
d) shut down the lab for the day and PARTY!!!!!!
- If the test statistic is less extreme than the critical test value:
a) statistical significance is not reached, p > .05
b) retain (fail to reject) the null hypothesis
c) research hypothesis is not supported
d) seriously consider that job offer from Wal-Mart
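
The comparison in this step can be sketched in a few lines of Python. This version assumes a two-tailed test only; the function name and the example z values are made up for illustration.

```python
from statistics import NormalDist

def decide_two_tailed(z_obtained: float, alpha: float = 0.05) -> str:
    """Compare the obtained z with the two-tailed critical value."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    if abs(z_obtained) > z_crit:
        return "reject H0"  # more extreme than the cutoff: significant
    return "retain H0"      # not extreme enough: not significant

# Hypothetical obtained z values.
for z in (2.0, 1.5, -2.5):
    print(f"Zobt = {z:+.2f}: {decide_two_tailed(z)}")
```

Taking the absolute value is what lets an extreme result in either tail count against the null hypothesis.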

What does practical significance refer to?

Using confidence intervals as a way to do hypothesis testing. If the confidence interval does not include the mean of the null hypothesis distribution, then the result is significant, meaning you reject the null!
Lower limit = X̄ − (zcritical × σM)
Upper limit = X̄ + (zcritical × σM)
where X̄ is the sample mean and σM is the standard deviation of the distribution of means.
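
The two limits can be computed as in this hedged Python sketch, assuming a known population standard deviation and a two-tailed critical z. The function name and the example numbers (sample mean 106, σ = 15, n = 25, null mean 100) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def confidence_interval(sample_mean: float, pop_sd: float, n: int,
                        confidence: float = 0.95):
    """Lower and upper limits: sample mean -/+ z_critical * sigma_M."""
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_crit * pop_sd / sqrt(n)  # z_critical times sigma_M
    return sample_mean - margin, sample_mean + margin

# Hypothetical sample, compared against a null-hypothesis mean of 100.
lower, upper = confidence_interval(sample_mean=106.0, pop_sd=15.0, n=25)
null_mean = 100.0
significant = not (lower <= null_mean <= upper)
print(f"95% CI: [{lower:.2f}, {upper:.2f}], significant: {significant}")
```

In this made-up example the interval just excludes 100, so the null hypothesis would be rejected at the 5% level.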

What is the difference between critical z value and the obtained z value?

Critical z-value: This is a cutoff point based on the significance level (α, like 0.05). It tells you the boundary for the rejection region. For example, for a two-tailed test with α = 0.05, the critical z-values are ±1.96. If your obtained z-value falls beyond these (in the tails), you reject the null hypothesis. Obtained z-value: This is the z-score you calculate from your sample data. It tells you how far your sample mean is from the population mean (under the null hypothesis), measured in standard deviations. You compare the two: if the obtained z-value is more extreme than the critical z-value, you reject the null hypothesis. If not, you fail to reject it.

SIGNIFICANCE TESTS Flashcards

(38 cards)