Week 6 Flashcards
Probabilities used to reject hypotheses are called
p-values
A threshold for the p-value, called the ___________, needs to be defined before the analysis. A common choice is 0.05
Alpha level
-If p-value < 0.05, you ________ the null hypothesis
reject
-If p-value > 0.05, you ________ the null hypothesis
fail to reject (often loosely called "accept")
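A minimal Python sketch of the decision rule above, assuming alpha = 0.05 is fixed before the analysis (the function name `decide` is hypothetical):

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Compare a p-value with an alpha level chosen before the analysis."""
    if p_value < alpha:
        return "reject H0"
    # A large p-value is not evidence that H0 is true, only a lack of evidence against it
    return "fail to reject H0"

print(decide(0.032))  # reject H0
print(decide(0.490))  # fail to reject H0
```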
Null hypothesis
The null hypothesis (H0) is the hypothesis against the research question: it claims that there is no difference in the result and that any observed differences are just noise/error
Research/alternative hypothesis
The research/alternative hypothesis (Ha) is the opposite of the null hypothesis: it claims that there is a difference in the result
Type 1 error
False-positive
Reject the null hypothesis when it is true
The vaccine is not effective but you conclude it is effective
Type 2 error
False-negative
Failing to reject the null hypothesis when it is false
The vaccine is effective, but you conclude it is not effective
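To see why alpha is also the Type 1 error rate, the sketch below simulates many studies in which H0 is true and counts how often H0 is wrongly rejected. It is an illustration under stated assumptions, not a method from these notes: a normal population with mean 0 and a known sd of 1, tested with a two-sided z test (chosen only because it needs nothing beyond Python's standard library).

```python
import math
import random
from statistics import NormalDist, fmean

def type1_error_rate(n=30, trials=10_000, alpha=0.05, seed=1):
    """Fraction of simulated studies that reject H0 although H0 is true."""
    rng = random.Random(seed)
    std_normal = NormalDist()  # standard normal: mean 0, sd 1
    rejections = 0
    for _ in range(trials):
        # Assumption: population is normal with mean 0 (so H0: mean = 0 is true) and sd 1
        sample = [rng.gauss(0, 1) for _ in range(n)]
        z = fmean(sample) / (1 / math.sqrt(n))  # known sd, so SE = 1 / sqrt(n)
        p = 2 * (1 - std_normal.cdf(abs(z)))    # two-sided p-value
        if p < alpha:
            rejections += 1  # false positive: a Type 1 error
    return rejections / trials

print(type1_error_rate())  # should be close to alpha = 0.05
```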
3 types of binomial test
Observed proportion < expected proportion
Cumulative probability from 0 to observed
Observed proportion > expected proportion
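Cumulative probability from observed to max (upper tail), i.e. 1 - cumulative probability from 0 to (observed - 1)
Observed proportion ≠ expected proportion
Two-tailed: cumulative probability of both tails, i.e. all outcomes at least as far from the mean (expected count) as the observed count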
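A standard-library Python sketch of the three binomial p-values above (the function names are hypothetical). The two-tailed version follows the "same distance from the mean" rule on the card; scipy.stats.binomtest defines the two-sided p-value by outcome probability instead, so results may differ when the expected proportion is not 0.5.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def p_lower(k, n, p):
    """Observed < expected: cumulative probability from 0 to k."""
    return sum(binom_pmf(i, n, p) for i in range(0, k + 1))

def p_upper(k, n, p):
    """Observed > expected: cumulative probability from k to n."""
    return sum(binom_pmf(i, n, p) for i in range(k, n + 1))

def p_two_tailed(k, n, p):
    """Observed != expected: both tails, i.e. every outcome at least as far
    from the mean n*p as the observed count k (small tolerance for rounding)."""
    mean = n * p
    distance = abs(k - mean)
    return sum(binom_pmf(i, n, p) for i in range(n + 1)
               if abs(i - mean) >= distance - 1e-9)

# Example: 8 heads in 10 flips of a coin assumed fair (expected proportion 0.5)
print(p_upper(8, 10, 0.5))       # 0.0546875 (= 56/1024)
print(p_two_tailed(8, 10, 0.5))  # 0.109375 (= 112/1024)
```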
Chi-square goodness of fit test
Tests how well the proportions in the data fit fixed (expected) proportions. Unlike the binomial test, it can handle more than 2 categories.
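A plain-Python sketch of the chi-square goodness of fit test. The statistic is the sum of (observed - expected)² / expected over the categories, with df = number of categories - 1. It assumes integer df, so the chi-square upper tail can use exact closed-form expressions; `chi2_sf` and `chi_square_gof` are hypothetical helpers, and scipy.stats.chisquare does the same job if SciPy is available.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square with integer df."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum over i < df/2 of (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc(sqrt(x/2)) plus terms built from the normal density at sqrt(x)
    s = math.sqrt(x)
    total = math.erfc(s / math.sqrt(2))
    term = math.sqrt(2 / math.pi) * math.exp(-x / 2) * s  # the r = 1 term
    for r in range(1, (df + 1) // 2):
        total += term
        term *= x / (2 * r + 1)
    return total

def chi_square_gof(observed, expected_props):
    """Chi-square goodness of fit: returns (statistic, df, p-value)."""
    n = sum(observed)
    expected = [n * p for p in expected_props]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1
    return stat, df, chi2_sf(stat, df)

# Hypothetical counts for 4 categories, expected in equal proportions (25% each)
stat, df, p = chi_square_gof([10, 20, 30, 40], [0.25] * 4)
print(f"χ²({df}) = {stat:.1f}, p = {p:.3g}")
print(round(chi2_sf(12.2, 5), 3))  # 0.032, as in the χ²(5) = 12.2 example in these notes
```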
What is Benford's law?
Benford's law (or the first-digit law): the first digits of many naturally occurring numerical data sets (prices, populations) follow a particular distribution, in which digit d appears with proportion log10(1 + 1/d).
According to Benford's law, which first digit is the most common?
1 (30.1%), then 2 (17.6%), then 3 (12.5%), then 4, 5, 6 and so on, in decreasing order
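A standard-library Python sketch of the Benford proportions and of how first digits could be tallied from a data set (the helper names are hypothetical). Comparing the tallied counts with these proportions is a chi-square goodness of fit test with 9 categories (df = 8).

```python
import math
from collections import Counter

def benford_expected():
    """Benford's law: proportion of first digit d is log10(1 + 1/d)."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}

def first_digit(x):
    """First non-zero digit of a non-zero number, sign ignored."""
    return int(f"{abs(x):.15e}"[0])  # scientific notation, e.g. '3.14...e+02'

def first_digit_proportions(values):
    """Observed proportion of each first digit 1-9 (zeros skipped)."""
    counts = Counter(first_digit(v) for v in values if v != 0)
    total = sum(counts.values())
    return {d: counts[d] / total for d in range(1, 10)}

for d, prop in benford_expected().items():
    print(d, f"{prop:.1%}")  # 1 30.1%, 2 17.6%, 3 12.5%, ...
```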
If a data set that would be expected to follow Benford's law does not, then…
The data set may have been fabricated (a deviation is a warning sign, not proof)
How to report a chi-square goodness of fit test
χ²(5) = 12.2, p = 0.032
χ²(df) = χ² value, p = p-value
Chi square test of association
Checking association between 2 nominal/ordinal variables
E.g. whether the proportion of Tory/Labour voters differs depending on the region of the UK, i.e. whether party and region are associated
How to report a chi square test of association
χ²(2, N = 27) = 1.43, p = 0.490
χ²(df, N = total sample size) = χ² value, p = p-value
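A plain-Python sketch of the chi-square test of association, reported in the format above. The expected count for each cell is row total × column total / N, and df = (rows - 1) × (columns - 1). The counts are hypothetical, and the p-value line relies on the fact that for df = 2 exactly the chi-square upper tail is exp(-χ²/2); for other df, scipy.stats.chi2.sf(stat, df) (or scipy.stats.chi2_contingency on the whole table) would be needed.

```python
import math

def chi_square_association(table):
    """Chi-square test of association on a contingency table of counts
    (rows = one variable, e.g. region; columns = the other, e.g. party)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n  # count expected if no association
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, n

# Hypothetical counts: 3 regions (rows) x 2 parties (Tory, Labour)
stat, df, n = chi_square_association([[12, 8], [9, 11], [6, 14]])
assert df == 2, "exp(-x/2) is the chi-square upper tail only for df = 2"
p = math.exp(-stat / 2)
print(f"χ²({df}, N = {n}) = {stat:.2f}, p = {p:.3f}")  # χ²(2, N = 60) = 3.64, p = 0.162
```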
Paired samples – McNemar's test
Paired samples means that each data point in one group is paired with a data point in the other group
E.g. the same subjects are measured before and after an intervention, or the 2 sets of data are otherwise related. McNemar's test is used when these paired data are nominal (e.g. yes/no).
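A minimal sketch of McNemar's test, assuming the common chi-square form without continuity correction (a corrected version uses (|b - c| - 1)² instead). Only the discordant pairs, the subjects whose answer changed between the two measurements, enter the statistic; with df = 1 the p-value is erfc(sqrt(χ²/2)). The counts are hypothetical.

```python
import math

def mcnemar(b, c):
    """McNemar's test without continuity correction.
    b = pairs that changed one way (e.g. yes before, no after),
    c = pairs that changed the other way (no before, yes after).
    Pairs that did not change do not affect the statistic."""
    stat = (b - c) ** 2 / (b + c)
    p = math.erfc(math.sqrt(stat / 2))  # chi-square upper tail, df = 1
    return stat, p

stat, p = mcnemar(15, 5)  # hypothetical discordant counts
print(f"χ²(1) = {stat:.2f}, p = {p:.3f}")
```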
t tests
Test for a difference between group means of measured (interval or ratio) variables
Measurable variables, e.g. your 100 m PB or your VO2 max, as opposed to categorical data such as whether you prefer red or black cars.
There are 3 types of t test; each corresponds to a test for nominal/ordinal variables that we already learned:
- One sample t test – binomial test or chi-square goodness of fit test (one group, compared against an expected value)
- Independent (unpaired) samples t test – chi-square test of association (comparing the means of 2 independent groups)
- Paired samples t test – McNemar's test (two related measurements, e.g. before and after)
One sample t test
Compares the mean of one sample group against a fixed value
Tests whether the mean of your data differs from that fixed (expected) value
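A standard-library sketch of the one-sample t statistic, t = (sample mean - fixed value) / (s / sqrt(n)) with df = n - 1. The data and the fixed value are hypothetical. Getting a p-value needs the t distribution, e.g. scipy.stats.ttest_1samp(data, popmean) if SciPy is available.

```python
import math
from statistics import fmean, stdev

def one_sample_t(data, mu0):
    """t statistic and degrees of freedom for H0: population mean = mu0."""
    n = len(data)
    se = stdev(data) / math.sqrt(n)  # standard error; stdev uses n - 1
    return (fmean(data) - mu0) / se, n - 1

# Hypothetical 100 m times (s) tested against a fixed value of 13.0 s
t, df = one_sample_t([12.4, 13.1, 12.8, 12.2, 12.9, 12.6], 13.0)
print(f"t({df}) = {t:.2f}")
```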
Independent samples t test
Compares the observed difference between the means of two independent samples (groups)
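A standard-library sketch of the independent samples (Student's) t statistic. It pools the two sample variances, which relies on the equal-variance assumption, and uses df = n1 + n2 - 2. The temperatures are hypothetical; scipy.stats.ttest_ind(a, b), whose default equal_var=True matches this pooled version, would add a p-value.

```python
import math
from statistics import fmean, variance

def independent_t(a, b):
    """Student's t for two independent groups, pooling the two variances
    (assumes the population variances are equal)."""
    n1, n2 = len(a), len(b)
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (fmean(a) - fmean(b)) / se, n1 + n2 - 2

# Hypothetical body temperatures (°C) one hour after inoculation
vaccine = [37.4, 37.9, 37.6, 38.1, 37.5]
placebo = [36.9, 37.0, 36.8, 37.2, 37.1]
t, df = independent_t(vaccine, placebo)
print(f"t({df}) = {t:.2f}")
```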
Which t test is this:
Comparing weight before and after the COVID lockdown
Paired samples t test, as the same subjects are measured before and after
Which t test is this:
Comparing body temperatures of the vaccine and placebo groups one hour after inoculation
Independent samples t test, as the vaccine and placebo groups are different people
Which t test is this:
Comparing time spent with children between the two partners in married couples
Paired samples t test, as the two partners in each married couple are related (matched pairs)
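A paired samples t test is a one-sample t test of the within-pair differences against 0, with df = number of pairs - 1. A standard-library sketch with hypothetical before/after weights; scipy.stats.ttest_rel(after, before) performs the same test and adds a p-value.

```python
import math
from statistics import fmean, stdev

def paired_t(before, after):
    """Paired samples t: a one-sample t test of the differences against 0."""
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    return fmean(diffs) / (stdev(diffs) / math.sqrt(n)), n - 1

# Hypothetical weights (kg) of the same 5 people before and after lockdown
before = [70.2, 81.5, 65.0, 90.3, 77.8]
after = [71.0, 82.9, 65.4, 91.8, 78.1]
t, df = paired_t(before, after)
print(f"t({df}) = {t:.2f}")
```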
Normality
The sampling distribution of the mean is (approximately) normal: if you repeatedly draw samples of size n from a population and calculate the mean of each sample, those means are approximately normally distributed. This holds when the sample size n is large, even if the population itself is not normal.
Test of normality
Test of normality – the normality assumption can be checked with another statistical test (e.g. Shapiro–Wilk), where a violation of normality is indicated by a low p-value. If the p-value is lower than the alpha level (0.05), normality is violated and you should use a non-parametric test instead.
Which theorem explains why the sampling distribution of the mean is normal?
Central limit theorem
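A small simulation of the central limit theorem, using only Python's standard library. The population is an assumption chosen for illustration: an exponential distribution with mean 1 and sd 1, which is clearly skewed. The means of repeated samples of size n should still have a mean close to 1, an sd close to 1/sqrt(n), and about 95% of them within 1.96 standard errors, as a normal distribution predicts.

```python
import math
import random
from statistics import fmean, stdev

def sample_means(n, reps=5_000, seed=0):
    """Means of `reps` samples of size n from a skewed exponential population
    (mean 1, sd 1), i.e. a simulated sampling distribution of the mean."""
    rng = random.Random(seed)
    return [fmean([rng.expovariate(1.0) for _ in range(n)]) for _ in range(reps)]

n = 50
means = sample_means(n)
se = 1 / math.sqrt(n)  # CLT prediction for the sd of the sample means
within = sum(abs(m - 1) <= 1.96 * se for m in means) / len(means)
print(f"mean of means = {fmean(means):.3f} (population mean 1)")
print(f"sd of means = {stdev(means):.3f} (CLT predicts {se:.3f})")
print(f"share within ±1.96 SE = {within:.3f} (normal predicts 0.95)")
```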
Parametric tests
Statistical tests based on the normality assumption are called parametric tests
An independent samples t test is run when
The standard (Student's) independent samples t test assumes equality of variance: the variances of the two populations are equal. Variance = sd².
Which test do we use to test the equality of variances?
Levene's test
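A standard-library sketch of the Levene statistic W: each value is replaced by its absolute deviation from its group mean, and a one-way ANOVA F ratio is computed on those deviations, with df = (k - 1, N - k). The groups are hypothetical and the helper name is not from any library. If SciPy is available, scipy.stats.levene(a, b, center='mean') should give the same W plus a p-value; its default center='median' is the Brown-Forsythe variant.

```python
from statistics import fmean

def levene_w(*groups, center=fmean):
    """Levene's W statistic and its (df1, df2).
    center=fmean is Levene's original test; center=statistics.median
    gives the Brown-Forsythe variant."""
    z = []
    for g in groups:
        c = center(g)
        z.append([abs(x - c) for x in g])  # absolute deviations from the group center
    k = len(z)
    n = sum(len(g) for g in z)
    grand_mean = fmean([v for g in z for v in g])
    group_means = [fmean(g) for g in z]
    between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(z, group_means))
    within = sum((v - m) ** 2 for g, m in zip(z, group_means) for v in g)
    return (n - k) / (k - 1) * between / within, (k - 1, n - k)

w, dfs = levene_w([1, 2, 3, 4, 5], [0, 5, 10, 15, 20])  # hypothetical groups
print(f"W = {w:.2f}, df = {dfs}")  # W = 6.33, df = (1, 8)
```

A large W (small p-value) suggests the variances differ, which would violate the equal-variance assumption of the independent samples t test.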