Data Analysis Flashcards

1
Q

Why should you graph original data and not averages?

A

You can see for yourself where the average is
You can see for yourself what the spread looks like
You can see if there is anything odd/interesting about the data, e.g. outliers or evidence of skewing
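As an illustration, a minimal Python sketch (with made-up readings, not from the cards) of plotting the raw data points alongside the group means:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
# Hypothetical raw readings for a control and a treated group
groups = {"control": rng.normal(10, 1.5, size=12),
          "treated": rng.normal(12, 1.5, size=12)}

fig, ax = plt.subplots()
for i, (label, data) in enumerate(groups.items()):
    x = np.full(data.size, i) + rng.uniform(-0.05, 0.05, data.size)  # jitter so points don't overlap
    ax.plot(x, data, "o", alpha=0.7)                                 # every raw observation
    ax.hlines(data.mean(), i - 0.15, i + 0.15, color="black")        # the mean, for reference
ax.set_xticks([0, 1])
ax.set_xticklabels(list(groups))
ax.set_ylabel("measurement")
plt.show()
```

Plotting every point like this makes outliers and skew visible in a way a bar of averages cannot.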

2
Q

How can you find true confidence intervals?

A

Using a T test

3
Q

Why do many sets of experimental observations approximate to a normal distribution?

A

The mathematical explanation is the central limit theorem
What data may result in a normal distribution:
If a variable is affected by a lot of different random factors
Each factor has a small effect
The effects are additive
Then the distribution will approximate to a normal distribution
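A minimal simulation sketch of that idea (hypothetical numbers, not from the cards): each observation is built as the sum of many small, additive, non-normal random effects, and the resulting distribution comes out approximately normal.

```python
import numpy as np

rng = np.random.default_rng(0)

# Each "observation" is the sum of 50 small, independent random effects.
n_observations = 10_000
n_factors = 50
effects = rng.uniform(-1, 1, size=(n_observations, n_factors))  # each factor is small and non-normal
observations = effects.sum(axis=1)                               # additive effects

print(f"mean = {observations.mean():.2f}, SD = {observations.std(ddof=1):.2f}")
# A histogram of `observations` is close to bell-shaped even though each
# individual factor was uniformly (not normally) distributed.
```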

4
Q

SD (standard deviation)

A

A statistical measure that tells you how tightly (how close together) the individual values in a set of data are clustered around the mean
It is a more sophisticated indicator of the precision of a given set of measurements
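For concreteness, a small sketch (hypothetical readings) computing the sample SD:

```python
import numpy as np

measurements = np.array([9.8, 10.1, 10.0, 9.7, 10.4])  # hypothetical repeated readings
mean = measurements.mean()
sd = measurements.std(ddof=1)  # sample standard deviation (n - 1 in the denominator)
print(f"mean = {mean:.2f}, SD = {sd:.2f}")
```

A small SD relative to the mean indicates tightly clustered, precise measurements.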

5
Q

What does it mean if effect size is not large enough compared to variation?

A

It means that random variation in the readings might account for the difference between the treated and control samples

6
Q

What do statistical tests such as T tests do?

A

They ask how the variation compares with the effect size

7
Q

Why should error bars be treated as a rough indication?

A

With small sample numbers, they underestimate the uncertainty

Error bars are not additive

All statistics only give a rough indication of confidence

8
Q

Why is it wrong to think that if error bars don't overlap the result is significant, and vice versa?

A

Biological significance is best shown by effect size, not by statistical significance

The idea that error bars just touching is equivalent to p = 0.05 does not apply to small sample numbers

The idea is an oversimplification

9
Q

Confidence intervals

A

These get round the problem that SEM error bars are not additive.

By using a t test, we can get the best estimate even when we have a small number of observations

The t test can calculate the 95% CI for the difference between the means
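A minimal sketch of that calculation on hypothetical treated/control readings, using the pooled-variance formula that underlies the Student's t test:

```python
import numpy as np
from scipy import stats

treated = np.array([12.1, 13.4, 11.8, 12.9, 13.1, 12.5])  # hypothetical readings
control = np.array([10.2, 11.0, 10.8, 10.5, 11.3, 10.1])

diff = treated.mean() - control.mean()                      # effect size: difference between the means
n1, n2 = len(treated), len(control)
# Pooled standard deviation and the standard error of the difference
sp = np.sqrt(((n1 - 1) * treated.var(ddof=1) + (n2 - 1) * control.var(ddof=1)) / (n1 + n2 - 2))
se_diff = sp * np.sqrt(1 / n1 + 1 / n2)
t_crit = stats.t.ppf(0.975, df=n1 + n2 - 2)                 # two-tailed 95% critical value

ci_low, ci_high = diff - t_crit * se_diff, diff + t_crit * se_diff
print(f"difference = {diff:.2f}, 95% CI = ({ci_low:.2f}, {ci_high:.2f})")
```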

10
Q

What is a 95% CI?

A

The CI gives us an idea of how much larger or smaller the real answer could be, and tells us that 95% of the time the real answer should lie within this interval.

11
Q

What does a small p value <0.05 mean?

A

It indicates strong evidence against the null hypothesis, so you reject the null hypothesis. It means that the probability of the data arising from chance/random variation alone is less than 5%

12
Q

What does a large p value >0.05 indicate?

A

It indicates weak evidence against the null hypothesis so you fail to reject the null hypothesis

13
Q

Advantages and disadvantages of confidence intervals

A

Advantages:
- combine numerical information on effect size, statistical confidence and the possible variation in the 'real' effect size
- ideal for simple comparisons such as treated vs control
- now the preferred approach in clinical research and epidemiology
Disadvantages:
- harder to apply to more complex experiments, e.g. more than one control or more than one treatment

14
Q

How can the effect size be compared with the variation, to assess statistical reliability?

A

a) by eye, from the raw data
b) using SEM error bars
c) more precisely, using a 95% confidence interval

The CI is more precise than error bars and more informative than a p value
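A small sketch (hypothetical readings; scipy.stats.sem is a standard helper) of comparing the effect size with SEM-based variation:

```python
import numpy as np
from scipy import stats

treated = np.array([8.4, 9.1, 8.8, 9.5, 8.9])   # hypothetical readings
control = np.array([7.2, 7.8, 7.5, 7.1, 7.6])

effect = treated.mean() - control.mean()
sem_treated, sem_control = stats.sem(treated), stats.sem(control)

print(f"effect size = {effect:.2f}")
print(f"SEM (treated) = {sem_treated:.2f}, SEM (control) = {sem_control:.2f}")
# If the SEMs are small compared with the effect size, statistical confidence
# is strong; the 95% CI (previous card) gives a more precise answer.
```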

15
Q

For sample sizes above 20, how do we treat the confidence intervals?

A

We treat them as weak to moderate as long as the error bars don’t overlap

16
Q

SEM is small compared to effect size

A

The real answer may be a little larger or smaller than our estimate, but not much

Strong statistical confidence

17
Q

SEMs are moderate compared to effect size

A

The real answer may be quite a bit larger or smaller than our estimate, but it is unlikely to be zero

Moderate statistical confidence

Rule of thumb (sample size over 20): doubling the larger error bar does not make it overlap with the other one

18
Q

SEMs are substantial compared to effect size

A

The real answer may be quite a lot larger or smaller than our own estimate. It could even be negative or zero

Weak statistical confidence

The error bars are close but not overlapping

19
Q

SEMs are large compared to effect size

A

The real answer may be a lot larger or smaller than our estimate. It could even be zero or negative

Weak to very weak statistical confidence

The error bars are overlapping

20
Q

What do significance tests do?

A

They mathematically compare the variation with the effect size. They calculate a number called the p value. This is the probability that the observed effect size could be due to random variation alone

21
Q
How strong is the statistical evidence for the following p values?
p = 0.001
p = 0.01
p = 0.05
p > 0.05
A

p = 0.001: statistically very strong evidence
p = 0.01: moderately strong evidence
p = 0.05: statistically weak evidence
p > 0.05: weak to very weak evidence

22
Q

Type 1 error vs type 2 error

A

Type 1 error is rejecting the null hypothesis, stating that the effect is significant when in fact there is no real effect

Type 2 error is accepting H0 and declaring that a result is not statistically significant, when in fact there is a real effect.

23
Q

Give 3 reasons why null hypothesis significance testing has proved extremely popular. For each reason, explain why this may cause problems with the correct interpretations of the data

A

People don't like uncertainty, and NHST appears to give a definite answer. This can lead to type 1 and type 2 errors, because the apparently definite answer is often wrong.

People don't like making decisions, so NHST lets the computer make the decision. The problem is that you should make the decision yourself, based on all the evidence.

Ambition: NHST lets you publish more papers, even if some of the conclusions are actually wrong.

Laziness: being told a result is significant implies that you don't need to do more experiments; this is a problem when there is only weak evidence.

24
Q

When do we use a Student's t test?

A

When we are studying the difference between two groups of observations

- measurements on treated and control samples
- measurements on patients and normal controls

It is designed for cases where you have small numbers of observations and cannot tell the actual distribution of the data; it assumes a normal distribution

It compares the effect size with the variation, and uses this to calculate a 95% confidence interval and a p value for the null hypothesis

A one-tailed t test gives a smaller p value
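For illustration, a minimal sketch of an unpaired two-sample t test on hypothetical treated/control data (scipy.stats.ttest_ind is the standard call):

```python
import numpy as np
from scipy import stats

treated = np.array([5.9, 6.4, 6.1, 6.8, 6.3, 6.0])   # hypothetical measurements
control = np.array([5.1, 5.5, 5.0, 5.7, 5.3, 5.2])

# Unpaired Student's t test: assumes roughly normal data and equal variances
t_stat, p_two_tailed = stats.ttest_ind(treated, control)
print(f"t = {t_stat:.2f}, two-tailed p = {p_two_tailed:.4f}")

# For a directional hypothesis, the one-tailed p value is half the two-tailed one
print(f"one-tailed p = {p_two_tailed / 2:.4f}")
```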

25
Q

When do you use a paired t test?

A

By looking at before and after results in single individuals, we can ignore the variation between the individuals and study the effect and the variability of the effect itself
Paired data can be analysed by a paired t test
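A minimal sketch with hypothetical before/after readings on the same individuals (scipy.stats.ttest_rel performs the paired test):

```python
import numpy as np
from scipy import stats

before = np.array([140, 152, 138, 145, 160, 149])   # hypothetical readings, same six individuals
after = np.array([135, 147, 136, 139, 151, 144])

# The paired t test works on the within-individual differences, so the
# variation between individuals drops out of the comparison.
t_stat, p_value = stats.ttest_rel(before, after)
print(f"mean change = {(after - before).mean():.1f}, t = {t_stat:.2f}, p = {p_value:.4f}")
```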

26
Q

Why are non-parametric tests not always useful?

A

Non-parametric tests don't work well with small numbers of observations. They do not provide a 95% CI

27
Q

How to draw a confidence interval

A

Draw an x axis with an appropriate scale
Draw a line at the effect size to represent the mean
The CI says that our mean could be as small or as large as []
Draw lines at these points
Then join the lines together
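A minimal matplotlib sketch of drawing an effect size with its 95% CI (the numbers are hypothetical):

```python
import matplotlib.pyplot as plt

effect = 2.1                 # hypothetical effect size (e.g. treated minus control)
ci_low, ci_high = 0.8, 3.4   # hypothetical 95% CI limits

fig, ax = plt.subplots(figsize=(5, 1.5))
# Point at the mean, with a horizontal bar spanning the CI
ax.errorbar(effect, 0, xerr=[[effect - ci_low], [ci_high - effect]],
            fmt="o", capsize=5, color="black")
ax.axvline(0, linestyle="--", color="grey")   # zero line = "no effect"
ax.set_yticks([])
ax.set_xlabel("effect size")
plt.tight_layout()
plt.show()
```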

28
Q

When is linear regression used?

A

Linear regression is used when one variable is precisely fixed in advance, e.g. time, dose etc.

Correlation is used when both variables are measurements which may have random errors or random variation
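A small sketch contrasting the two on hypothetical data (scipy.stats.linregress and scipy.stats.pearsonr are standard calls):

```python
import numpy as np
from scipy import stats

# Regression: dose is fixed in advance, response is the measurement
dose = np.array([0, 2, 4, 6, 8, 10])
response = np.array([1.1, 2.8, 5.2, 6.9, 9.1, 10.8])
fit = stats.linregress(dose, response)
print(f"slope = {fit.slope:.2f}, intercept = {fit.intercept:.2f}, r^2 = {fit.rvalue ** 2:.3f}")

# Correlation: both variables are measurements with their own random variation
height = np.array([158, 162, 170, 175, 181, 169])
weight = np.array([55, 59, 68, 74, 80, 66])
r, p = stats.pearsonr(height, weight)
print(f"Pearson r = {r:.2f}, p = {p:.4f}")
```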

29
Q

How many litres in one decilitre?

A

0.1 L in 1 dL

30
Q

What can you do to reduce the probability of making a type 1 error?

A

Reduce the significance level, e.g. from 0.05 to 0.01