lecture 5 - statistical inference and the sign test Flashcards

1
Q

probability

A
  • Probability is about events, such as
    • heads or tails
    • a card drawn from a deck
    • a marble drawn from a bag
    • a reaction time
    • a depression score
  • Probability of an event
    • The number of ways the event can occur divided by the total number of possible outcomes
  • Reported as proportions (or percentages %)
    • Range from 0 to 1 (or 0% to 100%)
2
Q

basic probability rules

A

Notation: the probability of an event happening is written p(event).

* Addition rule (A or B):
  p(A or B) = p(A) + p(B)
  IF A and B are mutually exclusive (i.e. if A occurs, B cannot)
* Multiplication rule (A and B) - the probability of two things both happening:
  p(A, B) = p(A) × p(B)
  IF A and B are independent (i.e. A occurring does not affect whether B occurs)

Example (assuming each birth is equally likely to be male or female; see the sketch below):
p(female) = 0.5
p(male) = 0.5
p(m or f) = 0.5 + 0.5 = 1
p(m and then f, across two births) = 0.5 × 0.5 = 0.25
For two births, each ordering is equally likely:
m&m = 0.25, m&f = 0.25, f&m = 0.25, f&f = 0.25
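A minimal Python sketch of the two rules, assuming the same illustrative 0.5 probabilities as above:

```python
# Sketch of the addition and multiplication rules, using the
# two-births example from the card (p(male) = p(female) = 0.5 assumed).
p_female = 0.5
p_male = 0.5

# Addition rule (mutually exclusive events): p(A or B) = p(A) + p(B)
p_m_or_f = p_male + p_female      # 1.0 - one child must be one or the other

# Multiplication rule (independent events): p(A, B) = p(A) * p(B)
p_m_then_f = p_male * p_female    # 0.25 - first child male AND second female

# All four equally likely orderings of two children sum to 1
outcomes = {(a, b): 0.5 * 0.5 for a in "mf" for b in "mf"}
print(outcomes)                   # each pair has probability 0.25
print(sum(outcomes.values()))     # 1.0
```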

3
Q

the sign test

A

for each participant, take the difference between the before score and the after score (after − before) and write it in a new column with a plus or minus sign, e.g. before = 12, after = 15 gives a difference of +3. If the before value is larger than the after value, the difference is negative.
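A small sketch of this first step in Python (the before/after scores are made-up illustrative data, not from the lecture):

```python
# Compute after - before for each participant and keep only the sign.
before = [12, 14, 10, 16, 11]
after  = [15, 13, 14, 16, 15]

diffs = [a - b for b, a in zip(before, after)]   # e.g. 15 - 12 = +3
signs = ["+" if d > 0 else "-" if d < 0 else "0" for d in diffs]
print(list(zip(diffs, signs)))   # zero differences ("0") are dropped later
```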

4
Q

binomial rule

A

p(x) = C(n, x) · p^x · q^(n − x)

where C(n, x) = n! / (x! (n − x)!) is the number of ways of choosing x successes from n trials, p is the probability of a success, and q = 1 − p.
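A minimal sketch of the rule in Python (the example values n = 10, p = 0.5 are illustrative assumptions):

```python
from math import comb

# Binomial rule: p(x) = C(n, x) * p**x * q**(n - x)
def binomial_p(x: int, n: int, p: float) -> float:
    q = 1 - p
    return comb(n, x) * p**x * q**(n - x)

print(binomial_p(8, 10, 0.5))   # exactly 8 heads in 10 fair flips ≈ 0.0439
```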

5
Q

significance level

A

start with the assumption that nothing is going on (the null hypothesis)
the unlikely outcomes are in the tails of the distribution
if the results lie in the unlikely region, reject the assumption that chance is the only thing going on
set a criterion probability - statisticians often use p = 0.05 (5%); this is called the significance level

6
Q

one and two tailed significance regions

A

two-tailed test - expected direction not known in advance - common

one-tailed test - expected direction known without doubt (a directional hypothesis) - rare - public preregistration of predictions could/should be what justifies using one
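A small sketch of why the two tests differ, assuming n = 10 sign-test trials under H0 (p = 0.5); the numbers are illustrative:

```python
from math import comb

# A two-tailed test splits the 5% rejection region across both tails;
# a one-tailed test puts it all in the predicted tail, so a less
# extreme result can reach significance.
n = 10
pmf = [comb(n, k) * 0.5**n for k in range(n + 1)]

p_two_tailed_at_1 = 2 * sum(pmf[:2])   # P(X <= 1) + P(X >= 9) ≈ 0.021
p_one_tailed_at_2 = sum(pmf[:3])       # P(X <= 2) ≈ 0.055, just misses 0.05
print(p_two_tailed_at_1, p_one_tailed_at_2)
```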

7
Q

2-tailed sign test

A

M = the number of + or − differences, whichever is smaller.

N = the total number of non-zero differences (zero differences are discarded).

Find the critical value for N from the sign-test table; the result is significant if M is less than or equal to the critical value.
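Equivalently, the p-value can be computed directly from the binomial distribution under H0 (p = 0.5 for a + vs − difference). A sketch using scipy's binomial test, with illustrative counts assumed:

```python
from scipy.stats import binomtest

# Two-tailed sign test: under H0, + and - differences are equally likely.
n_plus, n_minus = 9, 2            # non-zero differences only; zeros discarded
M = min(n_plus, n_minus)          # M = 2, the smaller count
N = n_plus + n_minus              # N = 11

result = binomtest(M, N, p=0.5, alternative='two-sided')
print(result.pvalue)              # ≈ 0.065 - not significant at p < 0.05
```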

8
Q

limitations of the sign test

A

ignores the size of each difference
ignores zero differences between scores (ties are discarded)
assumes that subjects participate in both conditions - good design, but not always possible
only assesses two conditions

9
Q

Hypothesis testing defined NHST (null hypothesis significance testing)

A
  • Null hypothesis (H0)
    • The results occurred by chance
  • Experimental hypothesis (H1)
    • The results did not occur by chance
  • Determine the possible outcomes according to H0
  • Define a significance level
    • p = 0.05 (5%)
    • Reject H0 if the results fall in the unlikely region defined by the significance level
10
Q

hypothesis testing critical issues

A
  • Could you reject H0 when it is true?
    • YES: a Type I error - 5% of the time at the 0.05 level (see the simulation sketch after this list)
  • Can you ever prove H0 is true?
    • Not with these methods (although “equivalence testing” does offer a way to “reject” H1 - albeit not to “prove” H0)
  • CRITICALLY, hypothesis testing only controls the long-run proportion of times that we accept a result as significant when there is no actual underlying effect.
    • A low p value for any individual experiment does not mean that this particular result is “real” or “reliable”.
    • The p value is not the probability that the result occurred “by chance” - it is a statement about how likely the results are WHEN H0 is true (but we don’t know whether H0 is true). That is, p = P(E|H0).
    • Bayesian statistics offer a different approach (considering both P(E|H0) and P(E|H1)).
    • EXPLICITLY - note the importance of inference in science. NOT just p < 0.05 = happy! Issues of logic (what do tests actually do), design (is the comparison the right one, or are there confounds/limitations), and concept/theory (how does the experiment relate to the conceptual issue under investigation) are all vital to using and interpreting stats.
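A small simulation sketch of the Type I error rate (the setup of 20 trials per "experiment" is an illustrative assumption): when H0 is true, roughly 5% of tests still come out "significant" at p < 0.05.

```python
import random
from scipy.stats import binomtest

# Each "experiment" is 20 fair coin flips, so H0 is true by construction.
random.seed(1)
n_experiments, n_trials, alpha = 10_000, 20, 0.05
false_positives = 0
for _ in range(n_experiments):
    heads = sum(random.random() < 0.5 for _ in range(n_trials))
    if binomtest(heads, n_trials, 0.5).pvalue < alpha:
        false_positives += 1

# ≈ 0.04: slightly under alpha because the binomial is discrete
print(false_positives / n_experiments)
```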
11
Q

why P<0.05?

A
  • There is no principled reason for using 0.05 (as opposed to, say, 0.04) as the criterion for what is “rare” enough to reject H0 (physics, for example, uses far, far lower values).
    • Indeed, RA Fisher (who is largely responsible for all this, along with J Neyman & ES Pearson) initially suggested that 2 SDs from the mean of a normal distribution was a reasonable criterion for what is “significant” enough to merit taking notice (this corresponds to p = 0.0456). He then just “rounded up”, and we have stuck with it ever since.
  • But remember: the significance level determines the Type I error rate (i.e. the rate of mistakenly rejecting H0 when it is true).
  • So it is good practice to consider more “conservative” significance levels (e.g. p < 0.01 or lower) if the consequences of a Type I error are serious (e.g. recommending a new drug that has serious side effects).
    • EXPLICITLY - note the importance of inference in science. NOT just p < 0.05 = happy! Issues of logic (what do tests actually do), design (is the comparison the right one, or are there confounds/limitations), and concept/theory (how does the experiment relate to the conceptual issue under investigation) are all vital to using and interpreting stats.
12
Q

types of hypothesis

A

The hypothesis or prediction from your theory would normally be that an effect will be present. This hypothesis is called the alternative hypothesis and is denoted by H1. (It is sometimes also called the experimental hypothesis, but because this term relates to a specific type of methodology it’s probably best to use ‘alternative hypothesis’.) There is another type of hypothesis called the null hypothesis, which is denoted by H0. This hypothesis is the opposite of the alternative hypothesis and so usually states that an effect is absent.

13
Q

type I error

A

occurs when we believe that there is a genuine effect in our population, when in fact there isn’t.

14
Q

type II error

A

occurs when we believe that there is no effect in the population when, in reality, there is. This would occur when we obtain a small test statistic.

15
Q

confidence intervals and statistical significance

A

if two 95% confidence intervals do not overlap, we can conclude that the means come from different populations and, therefore, that they are significantly different.

More precisely, there is a relationship between statistical significance and confidence intervals. Cumming & Finch (2005) give three guidelines, illustrated in Figure 2.16:

1. 95% confidence intervals that just about touch end-to-end (top left panel of Figure 2.16) represent a p-value of approximately 0.01 for testing the null hypothesis of no difference.
2. If there is a gap between the upper end of one 95% confidence interval and the lower end of the other (top right panel), then p < 0.01.
3. A p-value of 0.05 is represented by moderate overlap between the bars (bottom panels).
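A sketch of the heuristic for two independent means (the sample data below are illustrative assumptions, generated at random):

```python
import numpy as np
from scipy import stats

# Two made-up groups: means ~100 and ~110, SD 15, n = 30 each.
rng = np.random.default_rng(0)
a = rng.normal(100, 15, 30)
b = rng.normal(110, 15, 30)

ci_a = stats.t.interval(0.95, len(a) - 1, loc=a.mean(), scale=stats.sem(a))
ci_b = stats.t.interval(0.95, len(b) - 1, loc=b.mean(), scale=stats.sem(b))
t, p = stats.ttest_ind(a, b)

# A positive gap between the intervals suggests p < 0.01 under the heuristic.
gap = ci_b[0] - ci_a[1]
print(ci_a, ci_b, f"gap = {gap:.2f}", f"p = {p:.4f}")
```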

16
Q

average margin of error (MOE)

A

The MOE is half the length of the confidence interval (assuming it is symmetric), so it’s the length of the bar sticking out in one direction from the mean.
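A minimal sketch in Python (the sample scores are illustrative assumptions):

```python
import numpy as np
from scipy import stats

# The MOE is half the width of a symmetric 95% confidence interval.
x = np.array([12, 15, 14, 10, 16, 11, 13, 15])
lo, hi = stats.t.interval(0.95, len(x) - 1, loc=x.mean(), scale=stats.sem(x))
moe = (hi - lo) / 2
print(f"mean = {x.mean():.2f}, 95% CI [{lo:.2f}, {hi:.2f}], MOE = {moe:.2f}")
```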

17
Q

sample size and statistical significance

A

power is intrinsically linked to sample size. Power is the ability of a test to find an effect that genuinely exists, and we ‘find’ an effect by obtaining a statistically significant result (i.e. p < 0.05), so there is also a connection between the sample size and the p-value associated with a test statistic: the same effect will have different p-values in different-sized samples. Small differences can be deemed ‘significant’ in large samples, and large effects might be deemed ‘non-significant’ in small samples. The power of a statistical test is the probability that it will find an effect when one exists.
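A simulation sketch of this point (the effect size of 0.3 SD and the sample sizes are illustrative assumptions): the same underlying effect yields very different p-values at different n.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
for n in (10, 50, 500):
    sample = rng.normal(0.3, 1.0, n)          # true effect: mean 0.3, SD 1
    t, p = stats.ttest_1samp(sample, 0.0)     # test against "no effect"
    print(f"n = {n:4d}  p = {p:.4f}")         # p shrinks as n grows
```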

18
Q

how confidence intervals are reported

A

in square brackets after the point estimate, e.g. M = 30, 95% CI [27.5, 32.5]

19
Q

misconceptions about statistical significance

A

1 - “A significant result means that the effect is important.” Statistical significance is not the same thing as importance, because the p-value from which we determine significance is affected by sample size. Very small and unimportant effects will be statistically significant if sufficiently large amounts of data are collected (Figure 2.18), and very large and important effects will be missed if the sample size is too small.

2 - “A non-significant result means that the null hypothesis is true.” If the p-value is greater than 0.05 then you could decide to reject the alternative hypothesis, but this is not the same as the null hypothesis being true. A non-significant result tells us only that the effect is not big enough to be found (given our sample size); it doesn’t tell us that the effect size is zero.

3 - “A significant result means that the null hypothesis is false.” A significant test statistic is based on probabilistic reasoning, which limits what we can conclude. Cohen (1994) points out that formal reasoning relies on an initial statement of fact, followed by a statement about the current state of affairs, and an inferred conclusion; making these statements probabilistic breaks the logic of the inference.

Although NHST is the result of trying to find a system that can test which of two competing hypotheses (the null or the alternative) is likely to be correct, it fails because the significance of the test provides no evidence about either hypothesis.

20
Q

biggest problems of NHST

A
  • It encourages all-or-nothing thinking: if p < 0.05 then an effect is significant, but if p > 0.05, it is not.
  • The conclusions from NHST depend on what the researcher intended to do before collecting the data.
  • The consequences of the misconceptions of NHST are that scientists overestimate the importance of their effects (misconception 1); ignore effects that they falsely believe don’t exist because of ‘accepting the null’ (misconception 2); and pursue effects that they falsely believe exist because of ‘rejecting the null’ (misconception 3). Given that a lot of science is directed at informing policy and practice, the practical implications could include developing treatments that, in reality, have trivial efficacy, or failing to develop ones that have potential. NHST also plays a role in some wider issues in science.
  • Science should be objective, and it should be driven, above all else, by a desire to find out truths about the world. It should not be self-serving, at least not if that gets in the way of the truth. Unfortunately, scientists compete for scarce resources to do their work: research funding, jobs, lab space, participant time and so on. It is easier to get these scarce resources if you are ‘successful’, and being ‘successful’ is tied up with NHST.