Hypothesis Testing Flashcards

1
Q

What is the null hypothesis?

A

There is no statistically significant difference between the specified populations; any observed difference is due to sampling or experimental error (chance).

2
Q

What do we ask when we conduct a hypothesis test?

A

1) Is the value of the test statistic extreme enough for us to reject the null hypothesis?
2) What is our test statistic?
3) What should the distribution of the test statistic be if the null hypothesis is true?

3
Q

What is the p-value?

A

The probability of obtaining a particular value of the test statistic or a more extreme value if the null hypothesis is true.
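As a minimal sketch of this definition, the snippet below computes a two-sided p-value for a test statistic whose null distribution is assumed to be standard normal. That choice of null distribution and the function name `two_sided_p_value` are assumptions made for illustration; a t-test or ANOVA would use a t or F distribution instead.

```python
from statistics import NormalDist


def two_sided_p_value(z: float) -> float:
    """Probability of a statistic at least as extreme as z,
    assuming the null distribution is standard normal."""
    # P(|Z| >= |z|) = 2 * (1 - Phi(|z|))
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


# Hypothetical test statistic, for illustration only.
p = two_sided_p_value(1.96)
print(round(p, 3))
```

The more extreme the statistic, the smaller the p-value returned.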

4
Q

What is alpha?

A

This is a predetermined level of significance which enables us to decide whether to reject the null hypothesis.

If the p-value is <= alpha we regard it as significant and reject the null hypothesis.

If the p-value is > alpha we fail to reject the null hypothesis.
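The decision rule can be sketched in a few lines. The function name `decide` and the default alpha of 0.05 are illustrative assumptions, not something the card prescribes.

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Apply the significance threshold to a p-value."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must lie between 0 and 1")
    # A p-value at or below alpha counts as significant.
    return "reject H0" if p_value <= alpha else "fail to reject H0"


print(decide(0.03))
print(decide(0.20))
```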

5
Q

What is a Type I error?

A

This is when the null hypothesis is actually true but we reject it.

6
Q

What is a Type II error?

A

This is when the null hypothesis is false but we fail to reject it.

7
Q

What is the probability of making a Type I error?

A

This is equal to the value of alpha (the significance threshold).
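A small simulation can illustrate why the Type I error rate matches alpha. This is only a sketch under stated assumptions: it uses a z-test with known standard deviation 1 (not a t-test) so that it runs on the standard library alone, and the function name, sample size, trial count, and seed are all arbitrary choices.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def type_one_error_rate(alpha=0.05, n=20, trials=5000, seed=0):
    """Fraction of tests that reject H0 when H0 is actually true."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    rejections = 0
    for _ in range(trials):
        # The null hypothesis (mean = 0) is true for every simulated sample.
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        z = fmean(sample) * sqrt(n)  # known sigma = 1, hypothesised mean = 0
        p = 2.0 * (1.0 - std_normal.cdf(abs(z)))
        if p <= alpha:
            rejections += 1  # a Type I error
    return rejections / trials


print(type_one_error_rate())
```

The printed fraction should land close to 0.05, with some simulation noise.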

8
Q

Write a null hypothesis and a hypothesis for a one sample T-test?

A

Null hyp: the mean of the population this sample was drawn from is equal to the hypothesised value.

Hyp: the mean of the population this sample was drawn from is different from the hypothesised value.

9
Q

In a T-test in which the null hypothesis is true where should the value of t come from?

A

The value of t should come from a t-distribution with n-1 degrees of freedom.
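A minimal sketch of the one-sample t statistic and its degrees of freedom, using the standard formula t = (sample mean - hypothesised value) / (s / sqrt(n)). The function name and the data are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(sample, mu0):
    """Return (t, degrees of freedom) for a one-sample t-test."""
    n = len(sample)
    standard_error = stdev(sample) / sqrt(n)  # stdev uses the n-1 denominator
    t = (mean(sample) - mu0) / standard_error
    return t, n - 1


# Hypothetical measurements tested against a hypothesised value of 5.0.
t, df = one_sample_t([5.1, 4.9, 5.3, 5.2, 5.0], mu0=5.0)
print(round(t, 3), df)
```

The resulting t would then be compared against a t-distribution with `df` degrees of freedom to obtain a p-value.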

10
Q

When would you use a Paired T-test?

A

When there is a pair of measurements per subject and you are measuring a before and after change.

(We still only have n independent data points even if we have 2n numbers)

11
Q

What distribution is tested in a paired T-test?

A

The differences between pairs of values are calculated, and the distribution of differences is then tested to see whether its mean is equal to 0.

(It’s a one sample T-test where the hypothesised value is always zero)
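The reduction to a one-sample test on the differences can be sketched as follows. The function name `paired_t` and the before/after values are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(before, after):
    """Return (t, degrees of freedom) for a paired t-test."""
    if len(before) != len(after):
        raise ValueError("each subject needs one before and one after value")
    # 2n numbers collapse to n independent differences.
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    # One-sample t-test on the differences with hypothesised value 0.
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1


t, df = paired_t(before=[10, 12, 9, 11], after=[12, 13, 11, 12])
print(round(t, 3), df)
```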

12
Q

What is a general null hypothesis and hypothesis for a 2 sample T-test?

A

Null hyp: the means of the populations these two samples come from are the same

Hyp: the means of the populations these two samples come from are different

(We are looking at the difference between two means)

(Difference between two means also has a sampling distribution)

13
Q

What does a 2 sample T-test assume?

A

1) The data is normally distributed

2) The variances of the two groups are equal

14
Q

If the null hypothesis is true where should the t value come from?

1) For a single sample T-test
2) For a 2 sample T-test

A

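1) The t-value should come from a t-distribution with n-1 degrees of freedom (where n is the sample size).
2) The t-value should come from a t-distribution with n-2 degrees of freedom (where n is the total number of data points from both groups).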
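For the two-sample case, a sketch of the equal-variance (pooled) t statistic shows where the n-2 degrees of freedom come from. The function name and data are hypothetical, and the pooled-variance form is assumed because the cards state that the test assumes equal variances.

```python
from math import sqrt
from statistics import mean, variance


def pooled_two_sample_t(a, b):
    """Return (t, degrees of freedom) for an equal-variance two-sample t-test."""
    na, nb = len(a), len(b)
    df = na + nb - 2  # total data points minus one estimated mean per group
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    standard_error = sqrt(pooled_var * (1 / na + 1 / nb))
    t = (mean(a) - mean(b)) / standard_error
    return t, df


t, df = pooled_two_sample_t([1, 2, 3, 4, 5], [3, 4, 5, 6, 7])
print(round(t, 3), df)
```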

15
Q

1) When is a 2 sample T-test robust?

2) What are the exceptions?

A

1) When the sample sizes of both groups are >= 30 (provided the groups show equal variances).
2) If the groups being compared have similar sample sizes, it is acceptable to use the test even when the standard deviations of the two groups differ by up to 3-fold.

16
Q

What is a non parametric substitute for the 2 sample T-test?

A

Mann Whitney Wilcoxon test
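As a rough sketch of what this test computes, the U statistic counts, over every cross-group pair, how often a value from one group exceeds a value from the other (ties count as half). The function name and data are hypothetical, and turning U into a p-value is left out.

```python
def mann_whitney_u(a, b):
    """Return (U for group a, U for group b) by direct pair counting."""
    u_a = 0.0
    for x in a:
        for y in b:
            if x > y:
                u_a += 1.0
            elif x == y:
                u_a += 0.5  # ties are shared between the groups
    # The two U values always sum to the number of cross-group pairs.
    u_b = len(a) * len(b) - u_a
    return u_a, u_b


print(mann_whitney_u([1, 2, 3], [4, 5, 6]))
```

Because only the ordering of values matters, the statistic does not depend on the data following a normal distribution.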

17
Q

State general null hypothesis and hypothesis for Analysis of Variance (ANOVA)

A

Null hyp: All of the means of these groups are the same

Hyp: at least one of these groups has a different mean from the others

18
Q

When is an Analysis of Variance (ANOVA) used?

A

Used to test the null hypothesis that all group means are equal on a data set with 2 or more groups.

19
Q

What is the test statistic of an ANOVA?

A

F ratio

20
Q

What is the test statistic for T-tests?

A

T value

21
Q

What is the F ratio?

A

F = Treatment mean square / Error mean square

It is therefore a ratio of two variances.

22
Q

What is the Treatment Mean square?

A

This is the explained variation (signal)

23
Q

What is the Error mean square?

A

This is the unexplained variation (noise)

24
Q

What is the total sum of squares (in ANOVA)?

A

This is the treatment sum of squares + the error sum of squares

25
Q

What is the total degrees of freedom (in ANOVA)?

A

Treatment degrees of freedom + error degrees of freedom

26
Q

What determines which F distribution to use for the null distribution?

A

The treatment degrees of freedom (numerator) and the error degrees of freedom (denominator).

27
Q

How are mean squares constructed?

A

These are constructed by dividing the sum of squares by the degrees of freedom.

28
Q

How do you calculate the Treatment mean square?

A

Treatment sum of squares/Treatment degrees of freedom

29
Q

How do you calculate the Error mean square?

A

Error sum of squares/Error degrees of freedom

30
Q

What are F ratios defined by?

A

1) The number of degrees of freedom associated with the top of the F ratio (the treatment degrees of freedom).
2) The number of degrees of freedom associated with the bottom of the F ratio (the error degrees of freedom).
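The ANOVA quantities from the preceding cards can be tied together in one short sketch of a one-way ANOVA table. It assumes the usual one-way definitions (treatment degrees of freedom = number of groups - 1, error degrees of freedom = number of data points - number of groups); the function name and data are hypothetical.

```python
from statistics import fmean


def one_way_anova(groups):
    """Return the pieces of a one-way ANOVA table as a dict."""
    all_values = [x for g in groups for x in g]
    grand_mean = fmean(all_values)
    # Explained variation (signal): group means around the grand mean.
    ss_treatment = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # Unexplained variation (noise): values around their own group mean.
    ss_error = sum((x - fmean(g)) ** 2 for g in groups for x in g)
    df_treatment = len(groups) - 1
    df_error = len(all_values) - len(groups)
    ms_treatment = ss_treatment / df_treatment
    ms_error = ss_error / df_error
    return {
        "ss_total": ss_treatment + ss_error,
        "df_total": df_treatment + df_error,
        "df": (df_treatment, df_error),  # selects the F distribution
        "f_ratio": ms_treatment / ms_error,
    }


table = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(table)
```

The pair in `df` is what picks out the F distribution that the F ratio is compared against.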

31
Q

When aren’t F ratios significant?

A

F ratios aren’t significant when they are <= 1.

An F ratio of 1 means the amount of explained variation (signal) is equal to the amount of unexplained variation (noise).

32
Q

What are Chi squared tests used for?

A

Chi squared tests are used to check for associations in count data.

You test whether the distribution of counts across the levels of one factor is consistent across the levels of another factor, or whether it differs between them.

33
Q

1) What are the assumptions of a Chi squared test?

2) What is generally accepted when this is not met?

A

1) All expected counts must be >= 5.

2) If not all expected counts are >= 5, the test is generally accepted as long as all of the expected counts are >= 1 and 80% of them are >= 5.
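A sketch of how expected counts and the chi-squared statistic are computed for a table of counts, together with a check of both rules above. The function names and the example tables are hypothetical; expected counts use the standard row total x column total / grand total formula.

```python
def chi_squared(table):
    """Return (statistic, degrees of freedom, expected counts) for a table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    # Expected count under independence of the two factors.
    expected = [[r * c / total for c in col_totals] for r in row_totals]
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df, expected


def expected_counts_ok(expected):
    """Return (strict rule met, relaxed rule met) for the expected counts."""
    flat = [e for row in expected for e in row]
    strict = all(e >= 5 for e in flat)
    share_at_least_5 = sum(e >= 5 for e in flat) / len(flat)
    relaxed = all(e >= 1 for e in flat) and share_at_least_5 >= 0.8
    return strict, relaxed


stat, df, expected = chi_squared([[10, 20], [20, 10]])
print(round(stat, 3), df, expected_counts_ok(expected))
```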

34
Q

Why is >= 5 important in Chi squared?

A

The Chi squared test is constructed on the assumption that the data follow a normal distribution, and a Poisson distribution (count data) can only be approximated by a normal distribution if its mean is at least 5.

35
Q

What is a common correction that is used when testing multiple hypotheses on the same data?

A

Bonferroni correction

36
Q

What is the Bonferroni correction?

A

Suppose you have an overall significance threshold of alpha and you carry out m hypothesis tests. To reject the null hypothesis for any one of the tests, its p-value must be <= alpha/m.
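The correction amounts to one division. A minimal sketch, with a hypothetical function name and made-up p-values:

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Return one True/False per test: reject H0 only if p <= alpha / m."""
    m = len(p_values)
    threshold = alpha / m  # stricter per-test threshold
    return [p <= threshold for p in p_values]


# Three hypothetical tests on the same data: threshold is 0.05 / 3.
print(bonferroni_reject([0.001, 0.02, 0.04]))
```

Note that 0.02 and 0.04 would each be significant on their own at alpha = 0.05 but are not after the correction.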

37
Q

What are the assumptions of ANOVA?

A

1) Assumes residuals are normally distributed.

2) Assumes the variance of the unexplained variation is constant throughout the data set (homogeneity of variance of the residuals).

38
Q

How would you correct for right skew?

A

1) Take the square root of the data

Or

2) Take the log of the data
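To see the effect of these transformations, the sketch below measures skewness before and after each one. The moment-based skewness formula (third central moment divided by the second central moment to the power 1.5) and the sample values are assumptions chosen for illustration.

```python
from math import log, sqrt
from statistics import fmean


def skewness(values):
    """Moment-based skewness: positive means a longer right tail."""
    m = fmean(values)
    m2 = fmean([(x - m) ** 2 for x in values])
    m3 = fmean([(x - m) ** 3 for x in values])
    return m3 / m2 ** 1.5


# Hypothetical right-skewed data (all values positive, so log is defined).
raw = [1, 2, 2, 3, 3, 4, 10, 50]
print(round(skewness(raw), 2))
print(round(skewness([sqrt(x) for x in raw]), 2))
print(round(skewness([log(x) for x in raw]), 2))
```

For this data the log transform pulls in the right tail more strongly than the square root does.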

39
Q

How can you correct for left skew?

A

1) Square the data

40
Q

What are parametric tests?

A

Tests that rely on predefined distributions, e.g. the normal distribution

41
Q

What are non parametric tests?

A

Tests that do not rely on specific predefined distributions

However they still make some assumptions

42
Q

What is a general null hypothesis for the Mann Whitney Wilcoxon Test?

A

The two groups being compared come from the same distribution, with the same median, but it does not matter what the shape of the distribution is.

Therefore the 2 groups being compared must have similar skews.