L6 Ch5 Null Hypothesis Significance Testing Flashcards
Disclaimer
There are quite a few repetitions in these flashcards, and the ones from the book are not integrated but sit in a separate second part
I am very sorry about this, but I don’t have time to make them look nice, so I hope that they are at least clear enough to study from
sampling distribution
- distribution of a statistic (usually the mean)
- how that statistic would be distributed if we repeated the sampling over and over again
- relates to the null hypothesis and the alternative hypothesis (used to understand and interpret papers, in our case)
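a quick simulation sketch of this idea in R (made-up normal population with mean 100, sd 15 — my example, not from the slides):
means <- replicate(1000, mean(rnorm(25, mean = 100, sd = 15)))   # 1000 repeated samples of n = 25
hist(means)   # the sampling distribution of the mean: roughly normal, centred on 100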
Fisher
- inventor of the p-value & the null hypothesis
- “lady tasting tea” experiment: could she tell whether the milk or the tea was poured into the cup first?
Neyman-Pearson
- inventors of the alternative hypothesis
- combined the null hypothesis and the alternative hypothesis in one paradigm with the p-value
> tricky to specify what the alternative hypothesis should be
Standard Error
= variability of the sampling distribution (the variability you can expect when repeating the experiment)
- SE is high if there is a lot of variability in the variable
- SE is low if the sample size is high
> high sample size → low variability → low SE
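a small R sketch of the n–SE link (same made-up population; the sd of the simulated means is the SE):
sd(replicate(2000, mean(rnorm(25, 100, 15))))    # n = 25  → SE ≈ 15/sqrt(25)  = 3
sd(replicate(2000, mean(rnorm(100, 100, 15))))   # n = 100 → SE ≈ 15/sqrt(100) = 1.5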
Frequentist probability
- the framework behind the p-value and the sampling distribution
- computes the objective probability of an event
- relative frequency of an outcome in the long run (the same experiment repeated many times)
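a minimal long-run illustration in R (simulated fair coin; my example, not the lecture’s):
flips <- sample(c(0, 1), 100000, replace = TRUE)   # repeat the same “event” many times
mean(flips)   # relative frequency of heads settles near 0.5 in the long run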
how can confidence intervals be interpreted?
- imagine drawing 100 samples and computing a CI for each of them
- “95% CI” means that about 95 of those 100 CIs will contain the population mean
~ a single CI either contains the true value or it doesn’t
~ wider or narrower depending on how certain we are of the inference
~ much more informative than a point estimate alone
(see picture 2)
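a sketch of the 100-samples idea in R (assumed population mean 100, sd 15; intervals via mean ± 1.96 × SE):
covered <- replicate(100, {
  x <- rnorm(25, mean = 100, sd = 15)
  se <- sd(x) / sqrt(length(x))
  ci <- mean(x) + c(-1.96, 1.96) * se
  ci[1] <= 100 && 100 <= ci[2]                 # does this CI contain the true mean?
})
sum(covered)   # roughly 95 of the 100 CIs contain it (give or take — 1.96 is an approximation for small n)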
how can confidence intervals be calculated?
- lower bound: mean − 1.96 × SE
- upper bound: mean + 1.96 × SE
(picture 1)
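the same in R, for some sample x (a sketch using the formula above; the data are made up):
x <- c(4, 8, 15, 16, 23, 42)            # made-up sample
se <- sd(x) / sqrt(length(x))
mean(x) + c(-1.96, 1.96) * se           # lower and upper bound of the 95% CI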
What is the SE used for? How?
- parameter estimation (for population)
> through confidence intervals (usually 95%)
~ higher SE → higher variability → broader CI (to reach 95% confidence)
~ lower SE → lower variability → narrower CI (to reach 95% confidence)
how can SE be calculated?
standard deviation / square root of sample size
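in R (made-up sample):
x <- c(4, 8, 15, 16, 23, 42)
sd(x) / sqrt(length(x))   # standard error of the mean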
sampling distributions under Ha
- different from the distribution under H0
- e.g. skewed
“R”
- what is it?
- R vs Excel
- in the exam
- can be used as a simple calculator
- much more extensive than Excel
- open source (important, as science should be open)
> primarily used as a calculator here (no extensive programming)
> data simulation
Binomial sampling distribution under H0
- how to compute it in R
“if probability of heads is 0.5, what is the probability of getting 8/10 heads?”
- remember to run all the lines!
n <- 10                                  # sample size
k <- 0:n                                 # discrete probability space: k runs from 0 to n (10)
p <- .5                                  # probability of heads
coin <- 0:1                              # possible outcomes of a single flip
permutations <- factorial(n) / ( factorial(k) * factorial(n-k) )   # “n choose k”
probabilities <- permutations * p^k * (1-p)^(n-k)   # binomial probabilities (same as dbinom(k, n, p))
probabilities[k == 8]                    # probability of 8/10 heads ≈ 0.044
barplot(probabilities, names.arg = k)    # “barplot”: give it the probabilities and it constructs the plot
(picture 3)
! look at the WAs for examples of how R will look in the exam
Type I error
- reject null hypothesis when it is true
- “false positive”
what are the possible outcomes if we make a decision in frequentist framework?
(see picture 4)
- rows: do we reject the H0 or not?
- columns: is the H0 actually true or false?
- two cells are correct decisions, two are errors (Type I or Type II):
~ reject H0 & H0 true → Type I error; reject H0 & H0 false → correct (power)
~ not reject & H0 true → correct (true negative); not reject & H0 false → Type II error
Type II error
- fail to reject the null hypothesis when it is false
how strict do we want to be when evaluating the H0?
- decide on an alpha level (usually 0.05)
- if the p-value is below the alpha level, we reject the null hypothesis
= if H0 is true, we make a Type I error in 5% of the cases
!! alpha is type I error rate
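tying this to the coin example: base R’s binom.test gives an exact p-value to compare against alpha (a sketch, not from the slides):
binom.test(8, 10, p = 0.5)   # two-sided p ≈ 0.109 > 0.05 → do not reject H0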
effect sizes
- the size of the effect we are looking for (e.g. the size of a correlation)
- plays a role in how much power our statistical procedure has
- standardized (divided by the st.dev.)
how do we use the sampling distribution in regards to alpha?
- we mark the areas of the sampling distribution that constitute observations extreme enough to make us reject H0
- with extreme observations we reject H0 (picture 5)
“power” of the analysis
- rejecting H0 when it is in fact false (correct decision)
- power is the conditional probability of rejecting H0 when it is false
- it’s a function of sample size
- (compare: alpha is the conditional probability of rejecting H0 when it is true)
how are effect sizes and power related?
the bigger the effect size, the higher the power
what is the probability of not rejecting the H0 when it is true?
- 1-alpha
- “true negative”
(see picture 6)
what is the sum of the power and beta? (and of alpha and the true negative?)
- 1 in both cases
- each pair consists of conditional probabilities given the same state of H0 (false, resp. true), so each pair sums to 1
Beta
- complement of power (beta = 1 − power)
- incorrectly deciding not to reject the H0 when it is false
- Type II error
- “false negative”
how does changing the value of alpha affect the evaluation of the H0?
- lower alpha
→ harder to reject H0
→ fewer Type I errors
- higher alpha
→ easier to reject H0
→ fewer Type II errors
! alpha used to establish critical region
how do we calculate “power” in a sampling distribution?
- power: rejecting H0 when it is in fact false
- look at the sampling distribution under Ha
→ there are many possible versions, but in the coin example we could set the probability of heads at 0.8
(see picture 7)
- with the new distribution, what is the probability of rejecting H0 now? (what is our power?)
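a minimal R sketch of this computation for the coin example (assuming a one-tailed test at alpha = .05 and p(heads) = 0.8 under Ha):
n <- 10; k <- 0:n; alpha <- .05
crit <- k[pbinom(k - 1, n, .5, lower.tail = FALSE) <= alpha]   # rejection region under H0: k = 9, 10
sum(dbinom(crit, n, .8))   # probability of landing there under Ha → power ≈ 0.38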
summary of this procedure
- we decide when to reject the H0 based on the sampling distribution under the H0
- look at sampling distribution + alpha level = reject H0?
- then switch to the Ha distribution while keeping the “red regions” (rejection regions) the same
- now: what is the probability of rejecting H0 when Ha is true? (power)
→ power is the sum of the probabilities in the red regions
!!! to compute power, we look at the sampling distribution under the Ha
what is the interplay between alpha and power?
- if the alpha level is low, we are stricter when rejecting H0
→ power decreases as well
- balance between power and the Type I error rate
effect size vs Ha
- effect size here: the probability of heads
- increase the effect size (move it further from 0.5)
- more extreme values become more likely → more likely to reject the null hypothesis
! we are conditioning on the effect size
what determines power level?
- alpha (lower alpha → lower power)
- effect size (higher e.s. → higher power)
- sample size (larger sample → higher power)
> greater effect → more likely to reject H0 → more power
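base R’s power.t.test shows all three knobs at once (two-sample t-test, sd = 1; numbers are rough):
power.t.test(n = 20, delta = 0.5, sig.level = .05)$power   # baseline, ≈ .33
power.t.test(n = 20, delta = 0.5, sig.level = .01)$power   # lower alpha → lower power
power.t.test(n = 20, delta = 0.8, sig.level = .05)$power   # bigger effect → higher power
power.t.test(n = 80, delta = 0.5, sig.level = .05)$power   # bigger sample → higher power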
From the book
how can you distinguish a frequency plot from a histogram?
- frequency plot has small gaps between the columns
what is determined by the length of the whiskers?
- if the whiskers have the same length, the distribution is symmetrical
- if the top or bottom whisker is much longer than the opposite one, the distribution is asymmetrical
how can you compare the relative frequencies of scores across groups?
Under frequency plots:
- stack: shows the bars of each group stacked on top of each other
- identity: displays overlapping bars, with a certain level of transparency
- dodge: places the bars side by side within each bin
(see picture 8 & 9)
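these names match ggplot2’s position arguments, so a sketch (assuming the book uses ggplot2; data are made up):
library(ggplot2)
df <- data.frame(score = c(rnorm(100, 5), rnorm(100, 6)),
                 group = rep(c("A", "B"), each = 100))
ggplot(df, aes(x = score, fill = group)) +
  geom_histogram(position = "dodge", bins = 20)   # try "stack", or "identity" with alpha = .5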
Boxplot
- center: median
- box edges: interquartile range
- violin element: adds the density distribution of the data
~ using a split variable, you can visualize the group difference
(see picture 10)
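a quick base-R version (made-up data; the violin element needs an extra package, e.g. ggplot2 or vioplot):
df <- data.frame(score = c(rnorm(100, 5), rnorm(100, 6)),
                 group = rep(c("A", "B"), each = 100))
boxplot(score ~ group, data = df)   # median line, IQR box, whiskers — one box per group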
how can we summarize the relationship between two variables?
- through a regression line
- “correlation plots”
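a minimal base-R sketch of this (made-up data):
x <- rnorm(50); y <- 0.6 * x + rnorm(50, sd = 0.5)
plot(x, y)          # scatterplot of the two variables
abline(lm(y ~ x))   # add the fitted regression line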
Raincloud plots
- display individual data points, boxplots, and the distribution of the data
(see picture 11, 12 & 13)
Gigamega mastermind mind-map of plots
picture 14