lecture 5 - statistical inference and the sign test Flashcards
probability
- Probability is about events, such as:
- head or tails
- card from deck
- marble from bag
- reaction time
- depression score
- Probability of an event
- Number of possible occurrences of the event divided by the total number of possible outcomes
- Reported as proportions (or percentages %)
- Ranges from 0 to 1 (or 0% to 100%)
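A quick sketch of that calculation in Python (the deck-of-cards numbers are just an illustration, not from the lecture):

```python
# Probability of an event = number of possible occurrences of the event
# divided by the total number of possible outcomes.
favourable = 4    # e.g. four aces in a standard 52-card deck
total = 52
p_ace = favourable / total
print(f"p(ace) = {p_ace:.3f} ({p_ace * 100:.1f}%)")   # 0.077, i.e. 7.7%
```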
basic probability rules
Notation for probability of an event happening is p(event)
* Addition Rule (A or B)
p(A or B) = p(A) + p(B)
IF – A & B are mutually exclusive (i.e. if A occurs, B cannot)
* Multiplication Rule (A and B) - probability of two things happening
p(A, B) = p(A) × p(B)
IF – A & B are independent (i.e. A occurring does not affect B occurring)
P(female) = 0.5
P(male) = 0.5
P(m or f) = 0.5 + 0.5 = 1
P(m & f) = 0.5 * 0.5 = 0.25
m&m = 0.25
m&f = 0.25
f&m = 0.25
f&f = 0.25
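The same two rules in code (a minimal Python sketch of the male/female example above):

```python
from itertools import product

# Addition rule: p(A or B) = p(A) + p(B), provided A and B are mutually exclusive.
p_male, p_female = 0.5, 0.5
print("p(m or f) =", p_male + p_female)              # 1.0

# Multiplication rule: p(A, B) = p(A) * p(B), provided A and B are independent.
# Probability of each possible two-person sequence:
for first, second in product("mf", repeat=2):
    print(f"{first}&{second} = {p_male * p_female}")  # each sequence = 0.25
```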
the sign test
take the difference between each before score and its after score, write it in a new column, and mark it with a plus or minus sign, e.g. before = 12, after = 15 gives a difference of +3. When the before value is larger than the after value, the difference is negative.
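A minimal sketch of that step in Python (the before/after scores are made up for illustration):

```python
# Sign test, step 1: compute after - before for each subject and note the sign.
before = [12, 15, 11, 18, 14]   # hypothetical before scores
after  = [15, 14, 11, 21, 16]   # hypothetical after scores

for b, a in zip(before, after):
    diff = a - b
    sign = "+" if diff > 0 else ("-" if diff < 0 else "0")
    print(f"before={b:2d}  after={a:2d}  difference={diff:+d}  sign={sign}")
```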
binomial rule
p(x) = nCx × p^x × q^(n−x), where nCx = n! / (x!(n−x)!) is the number of ways of getting x successes in n trials, p is the probability of a success on one trial, and q = 1 − p
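A minimal sketch of the rule in Python, using math.comb for nCx (the example numbers are illustrative, not from the lecture):

```python
from math import comb

def binomial_probability(x, n, p):
    """p(x) = nCx * p**x * q**(n - x), with q = 1 - p."""
    q = 1 - p
    return comb(n, x) * p**x * q**(n - x)

# e.g. probability of exactly 8 '+' differences out of 10 non-zero differences
# when H0 says + and - are equally likely (p = 0.5):
print(binomial_probability(8, 10, 0.5))   # ~0.044
```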
significance level
you start with the assumption nothing happens = null hypothesis
the unlikely outcomes are in the tails of the distribution
if results lie in the unlikely region, reject the assumption that chance is the only thing going on.
set a criterion probability - statisticians often use p = 0.05 (5%); this is called the significance level
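A minimal sketch of that idea, assuming a fair-coin H0 and 10 observations (the numbers are illustrative, not from the lecture):

```python
from math import comb

# Under H0 each of 10 observations is equally likely to go either way (p = 0.5).
n = 10
probs = {k: comb(n, k) * 0.5**n for k in range(n + 1)}

# The unlikely outcomes sit in the tails of this distribution.
# With a 0.05 criterion, outcomes of 0, 1, 9 or 10 form the 'unlikely' region here:
unlikely_region = [0, 1, 9, 10]
print(sum(probs[k] for k in unlikely_region))   # ~0.021, below the 0.05 criterion
```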
one and two tailed significance regions
two-tailed test - expected direction not known - common
one-tailed test - expected direction known without doubt - rare - directional hypothesis - public preregistration of predictions could/should increase this rarity
2-tailed sign test
M = number of +ve or -ve differences, whichever is smaller.
Call N the total number of non-zero differences
find the critical value for N in the sign test table; the result is significant if M is less than or equal to the critical value
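A minimal sketch of the whole procedure in Python, computing an exact binomial p-value directly rather than looking M up in a printed table (the before/after data are hypothetical):

```python
from math import comb

def sign_test_two_tailed(before, after):
    """Return M, N and an exact two-tailed p-value for the sign test."""
    diffs = [a - b for b, a in zip(before, after) if a != b]   # drop zero differences
    n = len(diffs)                                             # N = non-zero differences
    plus = sum(d > 0 for d in diffs)
    m = min(plus, n - plus)                                    # M = smaller sign count
    # P(M or fewer signs in the minority direction), doubled for the two tails:
    p = 2 * sum(comb(n, k) * 0.5**n for k in range(m + 1))
    return m, n, min(1.0, p)

before = [12, 15, 11, 18, 14, 13, 17, 10]
after  = [15, 14, 11, 21, 16, 15, 19, 13]
m, n, p = sign_test_two_tailed(before, after)
print(f"M = {m}, N = {n}, p = {p:.3f}")   # reject H0 if p < 0.05
```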
limitations of the sign test
ignores size of difference
ignores zero difference between scores
assumes that subjects participate in both conditions - good design but not always possible
only assesses two conditions
hypothesis testing defined - NHST (null hypothesis significance testing)
- Null hypothesis (H0)
- The results occurred by chance
- Experimental hypothesis (H1)
- The results did not occur by chance (there is a real effect)
- Determine possible outcomes according to H0
- Define a significance level
p = 0.05 (5%)
Reject H0 if results in the unlikely region defined by the significance level
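A minimal simulation sketch of what that 5% criterion means in the long run (an assumed sign-test setup with 20 subjects, not from the lecture): when H0 really is true, close to 5% of experiments still come out "significant" - these are the Type I errors discussed below.

```python
import random
from math import comb

def sign_test_p(signs):
    """Exact two-tailed sign test p-value from a list of non-zero difference signs."""
    n = len(signs)
    m = min(sum(s > 0 for s in signs), sum(s < 0 for s in signs))
    return min(1.0, 2 * sum(comb(n, k) * 0.5**n for k in range(m + 1)))

# Simulate many experiments in which H0 really is true (signs are fair coin flips).
# With a 0.05 criterion, close to (at most) 5% still come out 'significant';
# here the rate is slightly below 5% because sign-test p-values are discrete.
random.seed(1)
n_experiments, n_subjects, alpha = 10_000, 20, 0.05
false_positives = sum(
    sign_test_p([random.choice([-1, 1]) for _ in range(n_subjects)]) < alpha
    for _ in range(n_experiments)
)
print(f"Type I error rate ≈ {false_positives / n_experiments:.3f}")   # ≈ 0.04
```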
hypothesis testing critical issues
- Could you reject H0 when it is true?
- YES: Type I error - 5% of the time (at a 0.05 significance level)
- Can you ever prove H0 is true?
- Not with these methods (but “equivalence testing” does offer a way to “reject” the H1 – albeit not “prove” H0)
- CRITICALLY, hypothesis testing only controls the long-run proportion of times that we accept a result as significant when there is no actual underlying effect.
- A low p value for any individual experiment does not mean that this particular result is “real” or “reliable”.
- The p value is not the probability the result occurred “by chance” – it is a statement about how likely the results are WHEN H0 is true (but we don’t know if H0 is true). That is, p = P(E|H0).
- Bayesian stats offer a different approach (considering both P(E|H0) & P(E|H1)).
EXPLICITLY – mention importance of inference in science. NOT just p < 0.05 = happy! Issues of logic (what do tests actually do), design (is the comparison the right one – or are there confounds/limitations), and concept/theory (how does the experiment relate to the conceptual issue under investigation) are all vital to using/interpreting stats.
why P<0.05?
- There is no principled reason for using 0.05 (as opposed to say 0.04) as a criterion for what is “rare” enough to reject H0 (e.g. physics uses far, far, far lower values).
- Indeed RA Fisher (who is largely responsible for all this – along with J Neyman & ES Pearson) initially suggested that 2 SDs from the mean on a normal distribution was a reasonable criterion for what is “significant” enough to merit taking notice (this corresponds to p = 0.0456). He then just “rounded up”, and we have stuck with it ever since.
- But, remember the significance level determines the Type 1 error rate (i.e. mistakenly rejecting H0 when it is true).
- So, it is good practice to consider “conservative” significance levels (e.g. p < 0.01 or lower) if the consequences of a Type 1 error are serious (e.g. recommending a new drug that has serious side effects).
types of hypothesis
The hypothesis or prediction from your theory would normally be that an effect will be present. This hypothesis is called the alternative hypothesis and is denoted by H1. (It is sometimes also called the experimental hypothesis, but because this term relates to a specific type of methodology it’s probably best to use ‘alternative hypothesis’.) There is another type of hypothesis called the null hypothesis, which is denoted by H0. This hypothesis is the opposite of the alternative hypothesis and so usually states that an effect is absent.
type I error
occurs when we believe that there is a genuine effect in our population, when in fact there isn’t.
type II error
occurs when we believe that there is no effect in the population when, in reality, there is. This would occur when we obtain a small test statistic.
confidence intervals and statistical significance
if two 95% confidence intervals don't overlap, then we can conclude that the means come from different populations and, therefore, that they are significantly different.
there is a relationship between statistical significance and confidence intervals. Cumming & Finch (2005) have three guidelines that are shown in Figure 2.16:
95% confidence intervals that just about touch end-to-end (as in the top left panel of Figure 2.16) represent a p-value of approximately 0.01 for testing the null hypothesis of no difference.
If there is a gap between the upper end of one 95% confidence interval and the lower end of another (as in the top right panel of Figure 2.16), then p < 0.01.
A p-value of 0.05 is represented by moderate overlap between the bars (the bottom panels of Figure 2.16).
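A minimal sketch of comparing two 95% confidence intervals (hypothetical data; uses the 1.96 normal approximation for the interval, whereas a t-based multiplier would be more accurate for small samples):

```python
import statistics
from math import sqrt

def ci95(sample):
    """Rough 95% CI for the mean using mean ± 1.96 * SE (normal approximation)."""
    mean = statistics.mean(sample)
    se = statistics.stdev(sample) / sqrt(len(sample))
    return mean - 1.96 * se, mean + 1.96 * se

group_a = [12, 15, 11, 18, 14, 13, 17, 10, 16, 15]   # hypothetical scores
group_b = [19, 22, 18, 25, 21, 20, 24, 17, 23, 22]   # hypothetical scores

lo_a, hi_a = ci95(group_a)
lo_b, hi_b = ci95(group_b)
print(f"Group A 95% CI: ({lo_a:.1f}, {hi_a:.1f})")
print(f"Group B 95% CI: ({lo_b:.1f}, {hi_b:.1f})")
# A clear gap between the intervals suggests p < 0.01 (Cumming & Finch, 2005);
# moderate overlap corresponds to roughly p = 0.05.
print("gap between CIs" if hi_a < lo_b or hi_b < lo_a else "CIs overlap")
```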