L6 Ch5 Null Hypothesis Significance Testing Flashcards
Disclaimer
There are quite a few repetitions in these flashcards, and the ones from the book are not integrated; they are in a separate second part
I am very sorry about this, but I don’t have time to make them look nice, so I hope they are at least clear enough to study from
sampling distribution
- distribution of means (usually)
- how the distribution would look if we were to repeat the sampling (the experiment) over and over again
- relates to null-hypothesis and alternative hypothesis (used to understand and interpret papers, in our case)
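A minimal R sketch of "repeating the experiment over and over", assuming a hypothetical normally distributed population (mean 100, SD 15) and samples of 25. All names and numbers here are illustrative and not from the lecture.

```r
set.seed(1)                      # make the simulation reproducible

pop_mean <- 100                  # assumed population mean (illustrative)
pop_sd   <- 15                   # assumed population SD (illustrative)
n        <- 25                   # size of each sample
n_reps   <- 5000                 # how many times the "experiment" is repeated

# Draw n_reps samples of size n and keep only the mean of each one
sample_means <- replicate(n_reps, mean(rnorm(n, mean = pop_mean, sd = pop_sd)))

# The histogram of those means approximates the sampling distribution
hist(sample_means, main = "Sampling distribution of the mean",
     xlab = "Sample mean")

mean(sample_means)               # close to the population mean
sd(sample_means)                 # close to pop_sd / sqrt(n)
```

The spread of this histogram is what the standard error describes.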
Fisher
- inventor of the p-value & null-hypothesis
- experiment with the lady tasting tea (could she tell whether milk or tea was poured first?)
Neyman-Pearson
- inventors of alternative hypothesis
- null-hypothesis and alternative hypothesis combined in one paradigm with p-value
> tricky to specify what an alternative hypothesis is
Standard Error
= variability in sampling distribution (variability that you can expect when repeating the experiment)
- SE high if lots of variability in variable
- SE low if high sample size
> high sample size → low variability → low SE
Frequentist probability
- considers p-value and sampling distribution
- computes objective probability of an event
- relative frequency (outcomes of event) in the long run (over same test done multiple times)
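The "relative frequency in the long run" idea can be sketched in R with simulated coin flips. This assumes a fair coin and 10,000 flips; both are illustrative choices, as are the variable names.

```r
set.seed(2)                                   # reproducible simulation

n_flips <- 10000                              # length of the "long run"
flips   <- rbinom(n_flips, size = 1, prob = 0.5)   # 1 = heads, 0 = tails

# Proportion of heads after each flip
running_prop <- cumsum(flips) / seq_along(flips)

# The proportion settles near 0.5 as the number of flips grows
running_prop[c(10, 100, 1000, 10000)]
```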
how can confidence intervals be interpreted?
- compute CI for 100 samples, and create sampling distribution for said samples
- a 95% CI means that, in the long run, about 95 out of every 100 CIs computed this way will contain the population mean
~ single CI either contains the true value or it doesn’t
~ wider or narrower based on how certain we are of the inference
~ much better than using point estimate
(see picture 2)
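This interpretation can be checked with a small R simulation. It is a sketch under assumptions: a hypothetical normal population (mean 50, SD 10), 100 samples of 30, and the 1.96 rule for the bounds.

```r
set.seed(3)                      # reproducible simulation

pop_mean  <- 50                  # assumed true population mean (illustrative)
pop_sd    <- 10                  # assumed population SD (illustrative)
n         <- 30                  # size of each sample
n_samples <- 100                 # number of samples, one CI per sample

# For each sample: build a 95% CI and check whether it contains pop_mean
covers <- replicate(n_samples, {
  x     <- rnorm(n, mean = pop_mean, sd = pop_sd)
  se    <- sd(x) / sqrt(n)
  lower <- mean(x) - 1.96 * se
  upper <- mean(x) + 1.96 * se
  lower <= pop_mean && pop_mean <= upper
})

sum(covers)                      # usually near 95, but it varies from run to run
```

Each element of `covers` is either TRUE or FALSE, which mirrors the point that a single CI either contains the true value or it doesn't.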
how can confidence intervals be calculated?
- lower bound: mean - 1.96 x SE (for a 95% CI)
- upper bound: mean + 1.96 x SE (for a 95% CI)
(picture 1)
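A worked example in R, using a made-up sample of eight scores (the data are hypothetical and chosen so the numbers come out round):

```r
scores <- c(4, 8, 6, 5, 3, 7, 9, 6)   # hypothetical sample

n           <- length(scores)         # 8
sample_mean <- mean(scores)           # 6
se          <- sd(scores) / sqrt(n)   # 2 / sqrt(8), about 0.71

lower <- sample_mean - 1.96 * se      # lower bound of the 95% CI
upper <- sample_mean + 1.96 * se      # upper bound of the 95% CI

c(lower = lower, upper = upper)       # roughly 4.61 and 7.39
```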
What is the SE used for? How?
- parameter estimation (for population)
> through confidence intervals (usually 95%)
~ higher SE → higher variability → broader CI (to reach 95% confidence)
~ lower SE → lower variability → narrower CI (to reach 95% confidence)
how can SE be calculated?
standard deviation / square root of sample size
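The formula can be compared with a simulation in R. This sketch assumes a hypothetical population SD of 15, which is known here only because the data are simulated. In practice the sample standard deviation is used instead. The helper names `se_formula` and `simulated_se` are made up for this example.

```r
set.seed(4)                      # reproducible simulation

pop_sd <- 15                     # assumed population SD (illustrative)
n_reps <- 5000                   # repetitions per sample size
sizes  <- c(10, 100)             # two sample sizes to compare

# SE from the formula: standard deviation / square root of sample size
se_formula <- function(sd, n) sd / sqrt(n)

# SE from simulation: SD of many sample means for a given sample size
simulated_se <- function(n) {
  sd(replicate(n_reps, mean(rnorm(n, mean = 0, sd = pop_sd))))
}

formula_se   <- se_formula(pop_sd, sizes)    # about 4.74 and exactly 1.5
empirical_se <- sapply(sizes, simulated_se)  # should land close to formula_se

data.frame(n = sizes, formula_se, empirical_se)
```

The larger sample size gives the smaller SE in both columns.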
sampling distributions under Ha
- different than under H0
- e.g. skewed
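A sketch in R of how the two distributions can differ, using 10 coin flips. The value p = 0.8 under Ha is a hypothetical choice for illustration, and `dbinom` is R's built-in binomial probability function.

```r
n <- 10                                        # number of coin flips
k <- 0:n                                       # possible numbers of heads

prob_h0 <- dbinom(k, size = n, prob = 0.5)     # H0: fair coin, symmetric
prob_ha <- dbinom(k, size = n, prob = 0.8)     # Ha: hypothetical p, skewed

# Side-by-side bars: one pair of bars per possible number of heads
barplot(rbind(prob_h0, prob_ha), beside = TRUE, names.arg = k,
        legend.text = c("H0: p = 0.5", "Ha: p = 0.8"),
        args.legend = list(x = "topleft"),
        xlab = "Number of heads", ylab = "Probability")

k[which.max(prob_h0)]                          # 5
k[which.max(prob_ha)]                          # 8
```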
“R”
- what is it?
- R vs Excel
- in exam
- can be used as simple calculator
- much more extensive than excel
- open source (important as science should be open)
> primarily used as calculator (no extensive programming)
> data simulation
Binomial sampling distribution under H0
- how to compute it in R
“if probability of heads is 0.5, what is the probability of getting 8/10 heads?”
- remember to run all the lines!
1. n <- 10  # sample size
2. k <- 0:n  # discrete probability space
> this means that k holds all the whole numbers from 0 to n (here 0 to 10)
3. p <- .5  # probability of heads
4. coin <- 0:1
5. permutations <- factorial(n) / ( factorial(k) * factorial(n-k) )  # number of ways to get k heads in n flips
- “barplot” function → give values of probabilities to function, and it constructs the plot
(picture 3)
! look at WAs for representations of how R will be in the exam
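The steps on this card stop at the count of arrangements. A possible complete version is sketched below. It assumes the remaining step is the standard binomial formula, so it may not match the course script line for line. The name `prob` is made up for this example, and the final check uses R's built-in `dbinom`.

```r
n <- 10                          # sample size (number of flips)
k <- 0:n                         # possible numbers of heads: 0, 1, ..., 10
p <- 0.5                         # probability of heads under H0

# Number of ways to get k heads in n flips (identical to choose(n, k))
permutations <- factorial(n) / (factorial(k) * factorial(n - k))

# Binomial probability of each possible number of heads
prob <- permutations * p^k * (1 - p)^(n - k)
names(prob) <- k

prob["8"]                        # P(exactly 8 heads) = 45/1024, about 0.044
sum(prob[k >= 8])                # P(8 or more heads) = 56/1024, about 0.055

# Plot the sampling distribution under H0
barplot(prob, xlab = "Number of heads", ylab = "Probability")

# Cross-check against the built-in binomial function
all.equal(unname(prob), dbinom(k, size = n, prob = p))
```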
Type I error
- reject null hypothesis when it is true
- “false positive”
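A sketch in R of how often a type I error occurs, reusing the 10-flip coin example. The decision rule (reject H0 when there are 0, 1, 9 or 10 heads) is a hypothetical choice for illustration, as is the helper name `reject`.

```r
set.seed(5)                      # reproducible simulation

n    <- 10                       # number of coin flips per experiment
p_h0 <- 0.5                      # H0 is true: the coin is fair

# Hypothetical decision rule: reject H0 for very few or very many heads
reject <- function(heads) heads <= 1 | heads >= 9

# Exact type I error rate: probability of rejecting although H0 is true
alpha_exact <- sum(dbinom(c(0, 1, 9, 10), size = n, prob = p_h0))  # 22/1024

# Simulated type I error rate: repeat the experiment under a true H0
heads_sim <- rbinom(10000, size = n, prob = p_h0)
alpha_sim <- mean(reject(heads_sim))   # should land near alpha_exact

c(exact = alpha_exact, simulated = alpha_sim)
```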
what are the possible outcomes if we make a decision in frequentist framework?
(see picture 4)
- rows: do we (not) reject the H0?
- columns: is the H0 actually true/false?
- four squares: two are correct decisions and two are incorrect decisions (type I or type II error)
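The same table can be written out as a small R matrix. This is a sketch of the standard 2 x 2 decision table, not a reproduction of picture 4, and the labels are illustrative.

```r
# Rows: the decision we make; columns: the (unknown) truth about H0
outcomes <- matrix(
  c("Type I error (false positive)", "correct decision",
    "correct decision",              "Type II error (false negative)"),
  nrow = 2, byrow = TRUE,
  dimnames = list(decision = c("reject H0", "do not reject H0"),
                  truth    = c("H0 true", "H0 false"))
)

outcomes                              # print the full table
outcomes["reject H0", "H0 true"]      # the type I error cell
```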