Week 6 Flashcards
Probabilities used to reject hypotheses are called
p-values
A threshold for the p-value, called ___________, needs to be defined prior to the analysis. A usual choice is 0.05
Alpha level
-If p-value < 0.05, you ________ your null hypothesis
reject
-If p-value > 0.05, you ________ your null hypothesis
fail to reject (often loosely called "accept")
Null hypothesis
The null hypothesis (H0) is the hypothesis opposing the research hypothesis: it claims that there is no real difference in the result and that any differences observed are just noise/error
Research/alternative hypothesis
The research/alternative hypothesis (Ha) is the opposite of the null hypothesis: it claims that there is a real difference in the result
Type 1 error
False-positive
Reject the null hypothesis when it is true
The vaccine is not effective but you conclude it is effective
Type 2 error
False-negative
Failing to reject the null hypothesis when it is false
The vaccine is effective, but you conclude it is not effective
3 types of binomial test
Observed proportion < expected proportion
Cumulative probability from 0 to observed
Observed proportion > expected proportion
Cumulative probability from observed to max (equivalently, 1 - cumulative probability from 0 to observed - 1)
Observed proportion ≠ expected proportion
Two-tailed: cumulative probability in both tails, at the same distance from the mean as the observed value or further
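Worked example (a minimal sketch, not a reference implementation): the three tail calculations can be written with the standard library only. The function names and the example numbers (2 successes in 10 trials, expected proportion 0.5) are made up for illustration. The two-tailed branch uses the "same distance from the mean" rule from this card; statistics packages may define the two-tailed p-value differently when the expected proportion is not 0.5.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k successes in n trials with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binomial_test(k, n, p, alternative="two-sided"):
    """Return the binomial test p-value for k observed successes out of n."""
    pmf = [binom_pmf(i, n, p) for i in range(n + 1)]
    if alternative == "less":
        # Observed < expected: cumulative probability from 0 to observed.
        return sum(pmf[: k + 1])
    if alternative == "greater":
        # Observed > expected: cumulative probability from observed to max.
        return sum(pmf[k:])
    if alternative == "two-sided":
        # Both tails: outcomes at least as far from the mean as the observed count.
        mean = n * p
        distance = abs(k - mean)
        tail = sum(q for i, q in enumerate(pmf) if abs(i - mean) >= distance - 1e-12)
        return min(1.0, tail)
    raise ValueError("alternative must be 'less', 'greater' or 'two-sided'")


# Hypothetical data: 2 successes in 10 trials, expected proportion 0.5.
p_less = binomial_test(2, 10, 0.5, "less")        # 56/1024, about 0.055
p_greater = binomial_test(2, 10, 0.5, "greater")  # 1013/1024, about 0.989
p_two = binomial_test(2, 10, 0.5, "two-sided")    # 112/1024, about 0.109
```

With alpha = 0.05, none of these three p-values is below the threshold, so the null hypothesis would not be rejected for this made-up sample.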
Chi-square goodness of fit test
Tests how well the proportions in the data fit fixed (expected) proportions. Can test more than 2 categories
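Worked example (a sketch under stated assumptions): the goodness of fit statistic is the sum of (observed - expected)^2 / expected over the categories, with df = number of categories - 1. The function name and the counts below are hypothetical. The p-value would then be looked up in a chi-square table or obtained from a statistics package, which this sketch does not do.

```python
def chi_square_gof(observed, expected_props):
    """Return (chi-square statistic, degrees of freedom) for a goodness of fit test."""
    total = sum(observed)
    # Expected counts: the fixed proportions applied to the sample size.
    expected = [total * p for p in expected_props]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1
    return stat, df


# Hypothetical data: 3 categories, expected proportions 25% / 50% / 25%.
gof_stat, gof_df = chi_square_gof([30, 50, 20], [0.25, 0.5, 0.25])
# Expected counts are 25, 50, 25, so the statistic is 1 + 0 + 1 = 2.0 with df = 2.
```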
What is Benford’s law?
Benford’s law (or first-digit law): the first digits of naturally occurring numerical data (prices, populations) follow a particular, non-uniform frequency distribution.
Using Benford’s law, which digit is the most common?
1 (30.1%), then 2 (17.6%), then 3 (12.5%), then 4, 5, 6…
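Worked example (a minimal sketch): the percentages above come from the standard Benford formula P(d) = log10(1 + 1/d) for first digit d = 1..9. The formula is not stated on the card itself, and the function name is made up for illustration.

```python
from math import log10


def benford_proportions():
    """Expected proportion of each first digit 1..9 under Benford's law."""
    return {d: log10(1 + 1 / d) for d in range(1, 10)}


benford = benford_proportions()
# Digits 1, 2 and 3 come out at about 0.301, 0.176 and 0.125.
# The nine proportions sum to 1.
```

These proportions can serve as the fixed (expected) proportions in a chi-square goodness of fit test on the first digits of a data set.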
If a set of data does not follow Benford’s law then…
The data set may have been fabricated or manipulated (a red flag worth investigating, not proof)
How to report a chi-square goodness of fit test
χ²(5) = 12.2, p = 0.032
χ²(df) = χ² value, p = p-value
Chi-square test of association
Checks for an association between 2 nominal/ordinal variables
E.g. whether the proportion of Conservative (Tory) vs Labour voters differs depending on the region of the UK, i.e. whether party support and region are associated
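Worked example (a sketch with made-up numbers): for a test of association, the expected count in each cell is row total x column total / grand total, and df = (rows - 1) x (columns - 1). The function name and the 2 x 2 table below (two hypothetical regions by two parties) are illustrative assumptions, not real polling data.

```python
def chi_square_association(table):
    """Return (chi-square statistic, degrees of freedom) for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the two variables were not associated.
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df


# Hypothetical counts: rows = region A, region B; columns = Conservative, Labour.
assoc_stat, assoc_df = chi_square_association([[30, 20], [20, 30]])
# Every expected count is 25, so the statistic is 4 x (5^2 / 25) = 4.0 with df = 1.
```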