Data Analysis I Flashcards

Question 1

Q

Give 3 factors that determine how much weight we should place on the results of a particular study.

Answer

A

How well characterised are the reagents/equipment/methods?
How well is the experiment designed - good controls, possible biases?
How many times has the result been successfully reproduced? (statistical confidence)

Question 2

Q

What is the main reason for calculating a mean of multiple observations?

Answer

A

Random errors tend to cancel each other out when a mean is taken
Any effect we observe is less likely to be random error and more likely to be real
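A small simulation makes this concrete. The sketch below assumes normally distributed random error around a hypothetical true value of 10 with standard deviation 2 (arbitrary illustration numbers, and `spread_of_means` is a hypothetical helper). It compares the spread of single observations with the spread of means of 25 observations. The means should scatter roughly √25 = 5 times less.

```python
import random
import statistics

def spread_of_means(n_obs, n_repeats=2000, true_value=10.0, noise_sd=2.0, seed=1):
    """Standard deviation of the mean of n_obs noisy observations,
    estimated over n_repeats simulated experiments (illustrative numbers)."""
    rng = random.Random(seed)
    means = [
        statistics.fmean(rng.gauss(true_value, noise_sd) for _ in range(n_obs))
        for _ in range(n_repeats)
    ]
    return statistics.stdev(means)

single = spread_of_means(1)    # spread of individual observations
pooled = spread_of_means(25)   # spread of means of 25 observations
print(f"single: {single:.2f}, mean of 25: {pooled:.2f}")
```

The random errors partly cancel inside each mean, so the means cluster much more tightly around the true value than single readings do.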

Question 3

Q

Describe 3 approaches used to assess the reliability of experimental results.

Answer

A

Visual approach - s.e.m. error bars
Numerical approach - p values and statistical significance
Quantitative approach - confidence intervals
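As a minimal sketch, the s.e.m. behind error bars and an approximate 95% confidence interval can be computed from a list of measurements. This assumes the large-sample normal multiplier 1.96. For small samples a Student's t critical value (about 2.365 for 8 observations) would give a wider interval. The data and the helper name `sem_and_ci` are hypothetical.

```python
import math
import statistics

def sem_and_ci(values, z=1.96):
    """Return (mean, s.e.m., (lower, upper)) for a list of measurements.

    Assumption: uses the large-sample normal multiplier z = 1.96 for a
    95% interval; a t critical value is more appropriate for small n.
    """
    mean = statistics.fmean(values)
    sem = statistics.stdev(values) / math.sqrt(len(values))  # s.e.m. = s / sqrt(n)
    return mean, sem, (mean - z * sem, mean + z * sem)

mean, sem, (low, high) = sem_and_ci([2, 4, 4, 4, 5, 5, 7, 9])
print(f"mean = {mean:.2f}, s.e.m. = {sem:.2f}, 95% CI = {low:.2f} to {high:.2f}")
```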

Question 4

Q

In clinical trials, what is the current preferred approach for assessing whether an effect seen in a study is real or due to chance?

Answer

A

Confidence intervals

Question 5

Q

Explain the meaning of the 0-1 p-value scale.

Answer

A

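1 means the data look entirely consistent with random variation (no evidence of a real effect)
0 means the data would be very unlikely to arise from random variation alone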
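One way to see what the scale means is an exact permutation test. This is one of several ways to compute a p-value and not necessarily the test used in the course. It asks how often, if the group labels were meaningless, re-labelling the data would produce a difference in means at least as large as the one observed. The helper `permutation_p_value` and the data are hypothetical illustrations.

```python
from itertools import combinations
from statistics import fmean

def permutation_p_value(a, b):
    """Exact two-sided permutation p-value for a difference in means.

    Enumerates every way of splitting the pooled data into groups of the
    original sizes and returns the fraction of splits whose absolute
    difference in means is at least as large as the observed one.
    """
    pooled = list(a) + list(b)
    observed = abs(fmean(a) - fmean(b))
    extreme = total = 0
    for idx in combinations(range(len(pooled)), len(a)):
        chosen = set(idx)
        group_a = [pooled[i] for i in chosen]
        group_b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        if abs(fmean(group_a) - fmean(group_b)) >= observed - 1e-12:
            extreme += 1
    return extreme / total

print(permutation_p_value([1, 2, 3], [1, 2, 3]))  # identical groups: 1.0
print(permutation_p_value([1, 2, 3], [4, 5, 6]))  # fully separated: 0.1
```

With only three values per group there are just 20 possible splits, so even perfect separation cannot give a p-value below 2/20 = 0.1.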

Question 6

Q

Briefly outline how we should interpret p-values of above and below 0.05.

Answer

A

P > 0.05 - result is statistically unreliable
P = 0.01-0.05 - effect is worth considering but may still be due to random chance
P = 0.01 or less - fairly convincing effect - it seems unlikely that this is due to random error

Question 7

Q

What is the disadvantage of using p-values to assess the significance of an effect?

Answer

A

Doesn’t tell you the effect size
Doesn’t give a confidence interval (the range of plausible effect sizes)

P-values should only be used in combination with visual data and/or confidence intervals
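The sketch below illustrates the effect-size point with a two-sample z-test on hypothetical summary numbers. It assumes a known standard deviation of 1 and equal group sizes, which simplifies the t-test. A tiny effect measured in very large groups gives a far smaller p-value than a large effect measured in tiny groups. The helper `z_test_p` is hypothetical.

```python
import math
from statistics import NormalDist

def z_test_p(mean_diff, sd, n_per_group):
    """Two-sided p-value for a difference in means, assuming a known common
    standard deviation sd and n_per_group observations in each group."""
    se = sd * math.sqrt(2 / n_per_group)   # standard error of the difference
    z = abs(mean_diff) / se
    return 2 * (1 - NormalDist().cdf(z))

tiny_effect = z_test_p(mean_diff=0.1, sd=1.0, n_per_group=5000)  # p well below 0.001
large_effect = z_test_p(mean_diff=1.0, sd=1.0, n_per_group=3)    # p well above 0.05
print(f"tiny effect, n=5000: p = {tiny_effect:.2g}")
print(f"large effect, n=3:   p = {large_effect:.2f}")
```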

Question 8

Q

Give functional definitions for “statistically significant” and “not statistically significant”.

Answer

A

Statistically significant - worth considering; more likely to be a real effect than random error
Not statistically significant - result is statistically unreliable; it may be real, but may well just be due to random error

Statistically significant does not mean biologically significant!

Question 9

Q

Biological significance is determined by what?

Answer

A

The size of the effect, i.e. the difference between the experimental group and the control.

This is quantified by confidence intervals and can be seen in error bars.
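As a minimal sketch, the effect size can be reported as the difference between the means with an approximate 95% confidence interval. This assumes a Welch-style standard error (separate variances for each group) and the large-sample multiplier 1.96. With only six values per group, a Student's t multiplier would widen the interval somewhat. The control and treated values and the helper `difference_with_ci` are hypothetical.

```python
import math
import statistics

def difference_with_ci(control, treated, z=1.96):
    """Effect size (treated mean minus control mean) with an approximate
    95% confidence interval from a large-sample normal approximation."""
    diff = statistics.fmean(treated) - statistics.fmean(control)
    # Standard error of the difference between two independent means
    se = math.sqrt(statistics.variance(control) / len(control)
                   + statistics.variance(treated) / len(treated))
    return diff, (diff - z * se, diff + z * se)

control = [10.1, 9.8, 10.4, 10.0, 9.7, 10.2]
treated = [11.0, 11.4, 10.9, 11.3, 11.1, 10.8]
effect, (low, high) = difference_with_ci(control, treated)
print(f"effect = {effect:.2f}, 95% CI = {low:.2f} to {high:.2f}")
```

Whether an effect of this size matters biologically is a judgement about the experiment, not something the interval decides.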

Question 10

Q

Give the advantages and disadvantages of using statistical significance.

Answer

A

Advantages:

Useful in quality control applications

Disadvantages:

Can be very misleading
Statistically significant results may still be random noise (the null hypothesis may actually be correct)
Effects that are not statistically significant may still be real
Statistical significance does not prove biological significance
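With the usual 0.05 cut-off, roughly 1 in 20 experiments in which nothing is really happening will still come out "statistically significant". The simulation below sketches this. Both groups are drawn from the same normal distribution, so the null hypothesis is true in every trial. A z-test with the known standard deviation of 1 stands in for the t-test. The helper `false_positive_rate` is hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean

def false_positive_rate(n_trials=2000, n_per_group=10, alpha=0.05, seed=42):
    """Fraction of experiments with NO real effect that still reach p < alpha.

    Both groups come from a normal distribution with mean 0 and sd 1, so
    the null hypothesis is true in every trial.
    """
    rng = random.Random(seed)
    se = math.sqrt(2 / n_per_group)   # standard error of the difference, sd = 1
    hits = 0
    for _ in range(n_trials):
        a = [rng.gauss(0, 1) for _ in range(n_per_group)]
        b = [rng.gauss(0, 1) for _ in range(n_per_group)]
        z = abs(fmean(a) - fmean(b)) / se
        p = 2 * (1 - NormalDist().cdf(z))
        if p < alpha:
            hits += 1
    return hits / n_trials

print(f"'significant' results with no real effect: {false_positive_rate():.1%}")
```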

Question 11

Q

Which 2 tests are most commonly used to determine whether the difference between means is due to random error?

Answer

A

Student’s t-test, Mann-Whitney U-test

Question 12

Q

When is it appropriate to use Student’s t-test?

Answer

A

When the data is normally distributed. Slight variations from normal distribution are not a problem, but highly skewed data should be assessed by a different test.
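For illustration, here is a self-contained sketch of the classic equal-variance Student's t-test using only the standard library. The p-value comes from numerically integrating the t density with Simpson's rule. This is a hand-rolled stand-in; in practice a library routine such as SciPy's `scipy.stats.ttest_ind` would normally be used. The helpers `t_two_sided_p` and `students_t_test` and the data are hypothetical.

```python
import math
import statistics

def t_two_sided_p(t, df, steps=10_000):
    """Two-sided p-value from Student's t distribution with df degrees of
    freedom, via Simpson's rule on the density from 0 to |t|."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def density(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    t = abs(t)
    h = t / steps  # steps must be even for Simpson's rule
    area = density(0) + density(t)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * density(i * h)
    area *= h / 3  # probability of falling between 0 and |t|
    return max(0.0, 1 - 2 * area)

def students_t_test(a, b):
    """Equal-variance Student's t-test for two independent groups.
    Returns (t statistic, degrees of freedom, two-sided p-value)."""
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * statistics.variance(a)
                  + (n2 - 1) * statistics.variance(b)) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (statistics.fmean(a) - statistics.fmean(b)) / se
    return t, df, t_two_sided_p(t, df)

t, df, p = students_t_test([1, 2, 3, 4, 5], [3, 4, 5, 6, 7])
print(f"t = {t:.2f}, df = {df}, p = {p:.3f}")
```

The test relies on the group means being roughly normally distributed, which is why strongly skewed data call for a different test.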

Question 13

Q

When should the Mann-Whitney U-test be used?

Answer

A

If the data definitely aren’t normally distributed. This test is “non-parametric”: it compares the ranks (order) of the values rather than assuming a particular distribution.
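Because the test only uses the order of the values, U can be sketched by counting pairs. The example computes U and an exact two-sided p-value by enumerating every possible re-labelling of the pooled data. This is only practical for small samples. Statistical packages (e.g. SciPy's `scipy.stats.mannwhitneyu`) typically fall back on a normal approximation for larger groups. The helpers `u_statistic` and `mann_whitney_exact_p` and the data are hypothetical.

```python
from itertools import combinations

def u_statistic(a, b):
    """Mann-Whitney U for group a: the number of (a, b) pairs in which the
    a value is larger, counting ties as half."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)

def mann_whitney_exact_p(a, b):
    """Exact two-sided p-value: the fraction of all re-labellings of the pooled
    data whose U is at least as far from its null mean (n1*n2/2) as observed."""
    pooled = list(a) + list(b)
    n1 = len(a)
    centre = n1 * len(b) / 2
    observed = abs(u_statistic(a, b) - centre)
    extreme = total = 0
    for idx in combinations(range(len(pooled)), n1):
        chosen = set(idx)
        group_a = [pooled[i] for i in chosen]
        group_b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        if abs(u_statistic(group_a, group_b) - centre) >= observed:
            extreme += 1
    return extreme / total

a, b = [1, 2, 3], [4, 5, 6]
print(u_statistic(a, b), mann_whitney_exact_p(a, b))  # 0.0 0.1
```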