Hypothesis Testing Flashcards

1
Q

In hypothesis testing, what do we always assume is true?

A

The null hypothesis

2
Q

What is the null hypothesis?

A

The hypothesis that there is no difference (or no association) between the variables of interest

3
Q

What value is used to calculate the likelihood or probability that the difference observed happened by chance?

A

The p-value

4
Q

What does a p-value of 0.02 signify?

A

That, if the null hypothesis were true, a result at least as extreme as the one observed would occur by chance only 2 times in 100

5
Q

When is the null hypothesis rejected?

A

When the p-value is below the significance threshold

6
Q

What does it mean if the p-value is large/above significance threshold?

A

You fail to reject the null hypothesis: there is not enough evidence for a difference, and the observed difference is consistent with chance

7
Q

What is the commonly used cut-off for p-value? Why is this not a universal figure?

A

0.05

For studies such as GWAS, which run a very large number of tests at once, a much lower p-value threshold is required (commonly 5 × 10⁻⁸)

8
Q

What is a type I error also known as and when does this happen?

A

This is a false positive and occurs when you reject the null hypothesis even though it is actually true

9
Q

What is the frequency of having a type I error/false positive?

A

It equals the significance cut-off you choose: with a cut-off of 0.05, about 5% of tests in which the null hypothesis is true will give a false positive

10
Q

What is a type II error also known as and when does it occur?

A

This is a false negative and occurs when you fail to reject the null hypothesis even though it is actually false

11
Q

What is type II error or false negative dependent on?

A

Sample size (the smaller the sample, the lower the power and the more likely a false negative)

12
Q

The choice of statistical test used to determine your p-value depends on what three key factors?

A
  1. Study design (paired or independent)
  2. Outcome variable (continuous or categorical)
  3. Distribution (normal or non-normal)
13
Q

How is a t-statistic calculated?

A

For independent data, it is calculated by dividing the observed difference between the two group means by the standard error of that difference
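
The calculation above can be sketched in a few lines of Python. This is a minimal illustration, not a replacement for a stats package: it assumes equal variances in the two groups (so a pooled variance is used), and the function name and sample values are made up for the example.

```python
import math
import statistics


def independent_t_statistic(group_a, group_b):
    """t = (difference in means) / (standard error of the difference)."""
    n_a, n_b = len(group_a), len(group_b)
    mean_diff = statistics.mean(group_a) - statistics.mean(group_b)
    # Assumption: equal variances, so the two sample variances are pooled.
    pooled_var = (
        (n_a - 1) * statistics.variance(group_a)
        + (n_b - 1) * statistics.variance(group_b)
    ) / (n_a + n_b - 2)
    se_diff = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return mean_diff / se_diff


# Hypothetical measurements for two independent groups
group_a = [2, 4, 6]
group_b = [1, 2, 3]
print(round(independent_t_statistic(group_a, group_b), 3))
```

The resulting t-statistic is then compared against a t-distribution with n_a + n_b - 2 degrees of freedom to obtain the p-value.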

14
Q

What three assumptions does a t-test make?

A
  1. Data is continuous
  2. Data is normally distributed
  3. Variance in the two groups is equal (checked with Levene’s test)
15
Q

What does Levene’s test do and why is it important?

A

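Levene’s test assesses whether the variances of the two groups are equal. Its result determines which set of t-test results to interpret:

  • If the p-value from Levene’s test is >0.05, we do not reject the null hypothesis of equal variances and interpret the results labelled ‘equal variances assumed’
  • If it is ≤0.05, we interpret the results labelled ‘equal variances not assumed’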
16
Q

What are your options if the assumptions for a parametric test are not met?

A
  1. Transform the data
  2. Check the normality again. If ok - use a parametric test
  3. If not ok, use a non-parametric test
17
Q

What transformations can you attempt if your data is:

  • moderately positively skewed
  • strongly positively skewed
  • weakly positively skewed
A
  • log transform (log x)
  • reciprocal (1/x)
  • square root (√x)
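
As a rough illustration of the transformations listed above, the sketch below applies each one to a small, made-up, positively skewed sample and prints the skewness before and after. The data and the `skewness` helper are assumptions for the example; it uses the simple moment-based skewness, and all values must be positive for the log and reciprocal to be defined.

```python
import math


def skewness(values):
    """Moment-based sample skewness: m3 / m2^1.5 (0 for symmetric data)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical positively skewed data (long right tail)
raw = [1, 2, 2, 3, 3, 4, 10, 50]

transformed = {
    "square root": [math.sqrt(v) for v in raw],
    "log": [math.log(v) for v in raw],
    "reciprocal": [1 / v for v in raw],
}

print(f"raw: skewness = {skewness(raw):.2f}")
for name, values in transformed.items():
    print(f"{name}: skewness = {skewness(values):.2f}")
```

After transforming, normality should be checked again before choosing between a parametric and a non-parametric test.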
18
Q

What transformation method would you use if your data was:

  • moderately negatively skewed
  • strongly negatively skewed
  • unequal variation
A
  • square (x²)
  • cube (x³)
  • log, reciprocal or square root
19
Q

What are the advantages of a non-parametric test? What are the disadvantages?

A
  • Advantage: makes no assumption about the underlying distribution of the data
  • Disadvantage: less powerful than a parametric test
  • Disadvantage: difficult to obtain confidence intervals (CIs)
20
Q

What is the non-parametric equivalent of a t-test?

A

Wilcoxon rank sum test, also known as the Mann-Whitney U test

21
Q

Describe how a Wilcoxon rank sum test works

A
  • take two independent groups, group 1 and group 2, where group 1 is the smaller group
  • rank all observations from both groups together in ascending order
  • sum the ranks for group 1 to get the test statistic T
  • look up T in a Wilcoxon rank sum table of critical values to get the p-value
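
The ranking and summing steps above can be sketched as follows. This is only an illustration: the function names and data are hypothetical, tied values are given their average rank (an assumption, since the card does not say how ties are handled), and the final table lookup for the p-value is not included.

```python
def average_ranks(values):
    """Rank values in ascending order (1-based); ties share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # i and j are 0-based positions
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_sum_statistic(group_a, group_b):
    """Test statistic T: the sum of ranks in the smaller of the two groups."""
    if len(group_a) <= len(group_b):
        small, large = group_a, group_b
    else:
        small, large = group_b, group_a
    ranks = average_ranks(list(small) + list(large))
    return sum(ranks[: len(small)])


# Hypothetical observations from two independent groups
group_1 = [1.1, 2.3, 4.0]
group_2 = [0.5, 3.2, 5.1, 6.0]
print(rank_sum_statistic(group_1, group_2))
```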
22
Q

What non-parametric test is used for skewed data with more than two independent exposure groups? What is its parametric equivalent?

A

Kruskal-Wallis test

Parametric equivalent = ANOVA

23
Q

What test is used to compare two binary categorical variables and obtain a p-value?

A

Chi squared test

24
Q

What does the p-value of a chi-squared test tell us?

A

How likely the differences between our variables would have occurred by chance if there was truly no association

25
Q

How do you calculate the chi-square test statistic?

A

This involves working out how close the observed values in your table are to the values expected if there was no true association

You first have to work out the expected number (E) for each cell of your table. General formula: E = (row total × column total) / overall total

Next, calculate (O − E)² / E for each cell, where O is the observed number, then total these across all cells to get the chi-square statistic
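
The two steps above translate directly into code. The sketch below is a minimal version with no continuity correction; the function name and the 2x2 counts are made up for illustration.

```python
def chi_square_statistic(table):
    """Sum of (observed - expected)^2 / expected over every cell of the table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    overall_total = sum(row_totals)
    chi_square = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if there were no true association
            expected = row_totals[i] * col_totals[j] / overall_total
            chi_square += (observed - expected) ** 2 / expected
    return chi_square


# Hypothetical 2x2 table: rows = exposed / unexposed, columns = disease yes / no
observed_counts = [[30, 10], [20, 40]]
print(round(chi_square_statistic(observed_counts), 2))
```

In this example the expected counts are 20, 20, 30 and 30, so every cell differs from its expected value by 10.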

26
Q

How do you interpret your chi-square statistic?

A

The larger the chi-square value, the less consistent the data are with the null hypothesis

A stats package is usually used to obtain the p-value, but you can also use statistical tables based on the degrees of freedom

27
Q

How do you calculate your degrees of freedom?

A

Degrees of freedom = (Rows-1) x (Columns-1)

E.g. For a 2x2 table it would be (2-1) x (2-1) = 1 d.f
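
A short sketch of the degrees-of-freedom formula, with one extra step labelled as an assumption: for the special case of 1 d.f. only, the p-value can be obtained from the standard library as erfc(√(χ²/2)). The function names are hypothetical, and tables with more than 1 d.f. need a stats package or statistical tables.

```python
import math


def degrees_of_freedom(n_rows, n_cols):
    """d.f. for a contingency table = (rows - 1) x (columns - 1)."""
    return (n_rows - 1) * (n_cols - 1)


def p_value_one_df(chi_square):
    """Upper-tail p-value of a chi-square statistic; valid for 1 d.f. only."""
    return math.erfc(math.sqrt(chi_square / 2))


print(degrees_of_freedom(2, 2))
print(round(p_value_one_df(3.841), 3))
```

The value 3.841 is the familiar critical value for 1 d.f. at the 0.05 level, so the second line prints a p-value very close to 0.05.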

28
Q

Why do we need both the odds ratio (OR) AND the p-value?

A

The OR tells us the magnitude of an association whilst the p-value tells us the significance of this
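
To make the "magnitude" part concrete, here is a minimal sketch of an odds ratio from a 2x2 table. The cell layout (a, b, c, d), the function name and the counts are assumptions for illustration; the card itself does not give the formula.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table.

    a = exposed with disease,   b = exposed without disease
    c = unexposed with disease, d = unexposed without disease
    """
    odds_exposed = a / b
    odds_unexposed = c / d
    return odds_exposed / odds_unexposed


# Hypothetical counts
print(odds_ratio(30, 10, 20, 40))
```

An OR of 1 would mean the odds of disease are the same in both groups; the p-value from the chi-squared test then indicates how compatible the observed table is with that null value.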

29
Q

What are the assumptions of chi-squared?

A
  • Each subject contributes data to only one cell (i.e. you can’t be a smoker AND a non-smoker)
  • The expected count in each cell should be at least 5 (SPSS will give you a warning)
30
Q

If the expected counts in your table are not all >5, should you use a chi-squared test? If not, what should you do instead?

A
  1. Yes - but you must then use Yates’ continuity correction
  2. No - you should use Fisher’s exact test instead
31
Q

When using chi-squared test for tables bigger than 2x2, what could you do if you don’t meet the assumptions?

A

Combine the rows or columns with small numbers, if biologically plausible

32
Q

What is chi-squared test for trend?

A

This is a special test for when the exposure variable is ordered (not nominal) and the outcome is binary

For example, looking at disease (yes/no) across ordered age groups (

33
Q

How do you visualise correlation?

A

Scatter diagram

34
Q

On a scatter diagram, on which axis is the outcome plotted?

A

Vertical/y-axis

35
Q

What does correlation measure? What do the correlation coefficients (r) mean?

A

Measures the closeness or degree of association between two continuous variables

+1 = perfect positive association
-1 = perfect negative association
0 = no association
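
The three reference values above can be reproduced with a small sketch of Pearson's correlation coefficient. The function name and data are made up for illustration, and the formula used is the standard one (covariance divided by the product of the standard deviations).

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


x = [1, 2, 3, 4]
print(pearson_r(x, [2, 4, 6, 8]))    # y rises in a straight line with x
print(pearson_r(x, [8, 6, 4, 2]))    # y falls in a straight line with x
print(pearson_r(x, [1, -1, -1, 1]))  # no linear association
```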
36
Q

What are the two main types of correlation coefficient and when would you use each of them?

A

Pearson’s correlation coefficient and Spearman’s correlation coefficient

Pearson’s is used when the variables are normally distributed. If they are not, you can either transform them and then use Pearson’s, or use Spearman’s instead
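
Spearman's coefficient works on ranks rather than raw values, which is why it does not need normally distributed data. The sketch below is a simplified illustration that assumes there are no tied values, so the shortcut formula 1 − 6Σd² / (n(n² − 1)) applies; the function names and data are hypothetical.

```python
def ranks_without_ties(values):
    """1-based ascending ranks; assumes all values are distinct."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation using the no-ties shortcut formula."""
    n = len(x)
    rank_x = ranks_without_ties(x)
    rank_y = ranks_without_ties(y)
    sum_d_squared = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))
    return 1 - 6 * sum_d_squared / (n * (n ** 2 - 1))


# A curved but always-increasing relationship
x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]
print(spearman_rho(x, y))
```

Because y always increases with x here, the ranks agree exactly even though the relationship is not a straight line.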

37
Q

What does correlation NOT take into account?

A

The gradient/steepness of slope

38
Q

What does the r-squared value do? Please give an example

A

This is the proportion of the variance of the outcome variable which is explained by the exposure variable

E.g. An r-squared of 0.64 for correlation between BP and stress means 64% of the variation in BP is explained by stress
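
As a quick arithmetic check of the example above, r-squared is simply the correlation coefficient squared. The value r = 0.8 below is an assumption chosen so that r-squared matches the 0.64 in the example; r = -0.8 would give the same result.

```python
def variance_explained(r):
    """Proportion of outcome variance explained, given correlation coefficient r."""
    return r ** 2


# Assumed correlation coefficient; squaring it gives the 0.64 from the example
r_squared = variance_explained(0.8)
print(f"{r_squared:.0%} of the variation is explained")
```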

39
Q

Does correlation equal causation?

A

No. Possibilities are:

  1. X influences or causes Y
  2. Y influences X
  3. Both X and Y are influenced by one or more other variables (confounders)
40
Q

Why should you interpret correlation results from large cohorts with caution?

A

Because you can get a significant result for a very weak correlation

41
Q

What is the difference between correlation and regression?

A

Correlation is used to assess if two variables are related and how closely

Regression is used to describe/model the relationship or make predictions

42
Q

What does linear regression do?

A

States how much Y (outcome) increases/decreases as X (exposure) increases

Estimates a best-fit straight line through the data

43
Q

What is the equation used in linear regression?

A

Y = a + bX

a=the intercept, i.e. the value of Y when X=0

b=the slope of the line that tells us on average how much Y increases/decreases for each unit increase in X. It is an estimate of the magnitude of effect
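
One common way to estimate a and b is ordinary least squares; the card does not name the fitting method, so treat that choice as an assumption. A minimal sketch with made-up data and a hypothetical function name:

```python
def fit_line(x, y):
    """Least-squares estimates of intercept a and slope b for Y = a + bX."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx            # slope: average change in Y per unit increase in X
    a = mean_y - b * mean_x  # intercept: value of Y when X = 0
    return a, b


# Hypothetical data lying exactly on the line Y = 1 + 2X
a, b = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(a, b)
```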

44
Q

What do the values of regression coefficient ‘b’ mean?

A

Positive b = outcome increases as exposure increases
Negative b = outcome decreases as exposure increases
b = 0: outcome and exposure not related

45
Q

Consider two variables weight and systolic BP. Your values of a and b are 98.5 and 0.43 (95% CI 0.34-0.51), respectively. Your p-value is

A

The model estimates that for every 1 kg increase in weight, systolic BP increases on average by 0.43 mmHg

You are 95% confident this increase is between 0.34 and 0.51 mmHg

This is highly statistically significant
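
Plugging the values from this card into Y = a + bX gives a predicted systolic BP for any weight. The weights of 70 kg and 80 kg below are hypothetical inputs chosen for illustration, and the function name is made up.

```python
INTERCEPT_A = 98.5  # a from the card (mmHg)
SLOPE_B = 0.43      # b from the card (mmHg per kg)


def predicted_systolic_bp(weight_kg):
    """Predicted systolic BP (mmHg) from weight, using Y = a + bX."""
    return INTERCEPT_A + SLOPE_B * weight_kg


# Hypothetical weights
bp_70 = predicted_systolic_bp(70)
bp_80 = predicted_systolic_bp(80)
print(round(bp_70, 1), round(bp_80, 1), round(bp_80 - bp_70, 1))
```

The difference between the two predictions is 10 × 0.43 mmHg, which is the slope applied over a 10 kg increase in weight.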

46
Q

When is multiple linear regression used?

A

When you want to include two or more exposure variables

For example, to look at age and weight in reference to BP and obtain an age-adjusted regression coefficient