DATA ANALYSIS 1 Flashcards by UNKNOWN T

What should a hypothesis include?

o What key endpoint, dependent variable, or parameter is being measured and compared between groups?
o What specific groups or experimental conditions are being compared to assess whether a difference exists?

How can you test whether results happened by chance?

Repeat the experiment: results that happened by chance won’t be repeatable

Null hypothesis

states that there is no relationship between the two variables being studied and that any apparent effects are just due to random variation

Alternative hypothesis

states there is a relationship between the two variables being studied and that any apparent effects are NOT due to random variation

If the null hypothesis is true…

there is no real effect

If the null hypothesis is false…

the alternative hypothesis is true
there is a relationship, which:
could be because the experimental hypothesis is true
could be because of biases in the experimental design
may have another cause

E.g. the question: “what effect does treatment with drug A have on the probability that an individual suffers a stroke?”,

Create a null and alternative hypothesis

H0: There is no statistically significant difference between the percentage of individuals
who suffer a stroke following treatment with drug A compared to treatment with
placebo.

H1: There is a statistically significant difference between the percentage of individuals
who suffer a stroke following treatment with drug A compared to treatment with
placebo.

To generate an appropriate hypothesis for a study, the researcher therefore needs to consider what meaningful difference would provide evidence relevant to the question they are investigating. Specifically:

• which endpoint(s) and dependent variables are the researchers interested in comparing between groups (e.g. blood pressure, height, rate of reaction)?
• how will the endpoint(s) be expressed (e.g. mean, median, maximum, % etc.)?
• which groups’ data is the hypothesis comparing (e.g. drug vs. placebo, males vs. females, treated vs. untreated)?

For the question to be investigated and a testable hypothesis generated, specific
details are required. For example:

• What research approach or setting are they using (e.g. clinical study, in vivo,
cells)?
• What specifically is meant by, or constitutes, “an impact on inflammation”? When? In
what circumstances? How could it be measured? How would the data be
expressed?
• Which groups or experimental conditions would the researchers need to demonstrate a meaningful difference between in order to answer their
question? (e.g. between cells treated with PGD2 vs. cells treated with vehicle)

However, in order to draw meaningful conclusions, the researcher needs to evaluate whether the difference observed between the groups is:

(a) Meaningful – i.e. does the difference constitute an effect-size that is biologically or clinically important?
(b) Genuine – i.e. was the difference observed because the groups are genuinely different, or was it merely the result of natural variation within the data?

What do significance tests do?

compare the variation with the effect size

calculate an estimate of how often the effect size observed between the groups would occur due to natural/random variation within the data.
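One way to make "how often would this effect size occur through random variation" concrete is a permutation test. The sketch below is an illustration only, not necessarily the test used in the course: it shuffles the group labels many times and counts how often the shuffled difference in means is at least as large as the observed one. The function name and the blood-pressure readings are hypothetical.

```python
import random
from statistics import mean

def permutation_p_value(group_a, group_b, n_shuffles=5000, seed=0):
    """Estimate how often random relabelling of the data gives a difference
    in means at least as large as the observed one (two-tailed)."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))  # observed effect size
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_large = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)  # pretend the group labels carry no information
        shuffled_diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if shuffled_diff >= observed:
            at_least_as_large += 1
    return at_least_as_large / n_shuffles

# Hypothetical systolic blood-pressure readings (made-up numbers)
drug = [118, 121, 115, 119, 117, 120]
placebo = [128, 131, 126, 130, 127, 129]
p = permutation_p_value(drug, placebo)
print(f"estimated p-value: {p}")
```

Because the two made-up groups do not overlap at all, very few shuffles reproduce a difference this large, so the estimated p-value is small.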

What is the p-value?

probability of seeing an effect size at least as large as the one observed if only random variation were at work and there were no real effect

*tells you about the strength of your data

p=0.001
statistically very strong data

p<0.01
statistically moderate to strong

p=0.01
moderately strong data

p=0.02-0.04
statistically weak to moderately weak data

p=0.05
statistically weak data

p>0.05
weak to very weak data

If p=0.05, the probability that the null hypothesis is true is… and the probability that the alternative hypothesis is true is…

50% the null hypothesis is true

50% the alternative hypothesis is true

Type 1 Error

rejecting the null hypothesis when in fact it is true and there is no real effect (a false positive)

Type 2 Error

retaining (failing to reject) the null hypothesis when in fact it is false and there is a real effect (a false negative)
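The two error types can be illustrated by simulating many experiments in which the truth is known. This is a rough sketch under stated assumptions: it uses a z-test with a known population standard deviation (simpler than a t test, and chosen only so the example needs nothing beyond the standard library), and the group size, true difference, and function names are all hypothetical.

```python
import random
from statistics import NormalDist, fmean

def z_test_p_value(sample_a, sample_b, sigma):
    """Two-tailed p-value for a difference in means, assuming the
    population standard deviation sigma is known (a z-test)."""
    standard_error = sigma * (1 / len(sample_a) + 1 / len(sample_b)) ** 0.5
    z = (fmean(sample_a) - fmean(sample_b)) / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))

def rejection_rate(true_difference, alpha=0.05, n=20, sigma=1.0, trials=2000, seed=1):
    """Fraction of simulated experiments in which the null hypothesis is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        control = [rng.gauss(0.0, sigma) for _ in range(n)]
        treated = [rng.gauss(true_difference, sigma) for _ in range(n)]
        if z_test_p_value(treated, control, sigma) < alpha:
            rejections += 1
    return rejections / trials

# Null hypothesis true (no real effect): every rejection is a Type 1 error.
type_1_rate = rejection_rate(true_difference=0.0)
# Null hypothesis false (real effect of 0.5): every non-rejection is a Type 2 error.
type_2_rate = 1 - rejection_rate(true_difference=0.5)
print(f"Type 1 error rate: {type_1_rate:.3f}, Type 2 error rate: {type_2_rate:.3f}")
```

With no real effect, the Type 1 error rate should land close to the chosen alpha. With a modest real effect and small groups, the Type 2 error rate in this setup is well above one half.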

Why is null hypothesis significance testing so popular?

People don’t like uncertainty and the NHST appears to give a definite answer (however often wrong due to type 1 and 2 errors)

People don’t like making decisions, therefore the computer makes the decisions for them based on the p-value (however only makes decision off one piece of evidence, the p-value, when really you should make the decision based on all the evidence)

People are often lazy:

tells you a result is significant, implying that you don’t need to do more experiments (when there is still only weak evidence)
tells you result is not significant implying that you don’t need to do more experiments (when really the p-value by itself gives no direct evidence that the null hypothesis is true)
possible to use it without really understanding it

People are ambitious, and the NHST allows you to publish more papers for the minimum work (even if some of the conclusions are actually wrong)

Effect-size

the difference observed in the relevant endpoint between the two groups. E.g. the mean value for one group compared to the mean value of the other group.

alpha value

p-value < α → the difference is statistically significant → null hypothesis is rejected

p-value ≥ α → the difference is not statistically significant → null hypothesis is retained

Interpret alpha values

o A smaller alpha (e.g. 0.001) is more likely to result in a false negative as very strong evidence of a statistically significant difference is required to reject the null hypothesis.

o A greater alpha (e.g. 0.05) is more likely to result in a false positive as only weak evidence of a statistically significant difference is required to reject the null hypothesis.
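As a small illustration of the decision rule and of why the choice of alpha matters, the sketch below applies the same p-value to three different thresholds. The function name and the p-value of 0.03 are hypothetical.

```python
def decide(p_value, alpha):
    """Apply the rule above: reject the null hypothesis only when p < alpha."""
    if p_value < alpha:
        return "statistically significant: reject the null hypothesis"
    return "not statistically significant: retain the null hypothesis"

p_value = 0.03  # hypothetical p-value, chosen only for illustration
for alpha in (0.05, 0.01, 0.001):
    print(f"alpha = {alpha}: {decide(p_value, alpha)}")
```

The same data is called significant at alpha = 0.05 but not at 0.01 or 0.001, which is why a stricter alpha trades false positives for false negatives.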

T test is used to…

Compare two population means

Paired t test

Measuring the same individuals under different conditions (paired data)
E.g. measuring the same individuals when they have taken a placebo and the actual drugs

Independent t test

Uses different groups of people (Unpaired data)

Uses one set of people on a placebo and another group of people on a drug
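The difference between the two designs shows up in how the t statistic is calculated. The following is a minimal sketch, assuming the standard paired formula and the equal-variance (pooled) form of the independent t test; the course may use a different variant. The readings are made up, and the same numbers are run through both formulas purely for comparison.

```python
from math import sqrt
from statistics import mean, stdev, variance

def paired_t(condition_1, condition_2):
    """t statistic for paired data: work on each individual's difference."""
    differences = [x - y for x, y in zip(condition_1, condition_2)]
    return mean(differences) / (stdev(differences) / sqrt(len(differences)))

def independent_t(group_a, group_b):
    """t statistic for unpaired data, assuming equal variances (pooled)."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_variance = ((n_a - 1) * variance(group_a)
                       + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    standard_error = sqrt(pooled_variance * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / standard_error

# Hypothetical blood-pressure readings (made-up numbers)
on_placebo = [140, 135, 150, 145, 138]
on_drug = [132, 130, 141, 139, 131]

t_paired = paired_t(on_placebo, on_drug)            # same five people, both conditions
t_independent = independent_t(on_placebo, on_drug)  # treated as two separate groups
print(f"paired t = {t_paired:.2f}, independent t = {t_independent:.2f}")
```

When the same individuals are measured twice, the paired calculation removes the variation between individuals, so the same numbers give a much larger t statistic than the independent calculation.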

What does a t test assume?

Normal distribution

How can you see whether data is normally distributed?

Plot a histogram

What happens if the data isn’t normally distributed?

Do a non parametric test
But this is the last resort
There are ways to normalise your data

Factors associated with relatively large p-values

↓ difference between group means (x̄1 - x̄2)
↑ standard deviation (s)
↓ sample size (n)

Factors associated with relatively small p-values

↑ difference between group means (x̄1 - x̄2)
↓ standard deviation (s)
↑ sample size (n)
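The three factors can be seen by changing one at a time in a simple calculation. This sketch uses a normal (z) approximation rather than the t distribution, and assumes two groups of equal size n with the same standard deviation s; the function name and all numbers are hypothetical.

```python
from statistics import NormalDist

def approx_p_value(mean_difference, s, n):
    """Approximate two-tailed p-value for two groups of size n with common
    standard deviation s (normal approximation, not an exact t test)."""
    standard_error = s * (2 / n) ** 0.5
    z = mean_difference / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))

baseline = approx_p_value(mean_difference=2.0, s=5.0, n=20)
larger_difference = approx_p_value(mean_difference=4.0, s=5.0, n=20)
smaller_sd = approx_p_value(mean_difference=2.0, s=2.5, n=20)
larger_sample = approx_p_value(mean_difference=2.0, s=5.0, n=80)

print(f"baseline:          {baseline:.3f}")
print(f"larger difference: {larger_difference:.3f}")
print(f"smaller s:         {smaller_sd:.3f}")
print(f"larger n:          {larger_sample:.3f}")
```

Each change (a bigger difference, a smaller standard deviation, a larger sample) gives a smaller p-value than the baseline.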

When to use a two tailed/one tailed test

o A two-tailed test should be used when the hypothesis refers to X being different to Y.
o A one-tailed test should be used when the hypothesis refers to X being greater than/less than Y.
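To show how the choice of tails changes the p-value, the sketch below converts a single test statistic into two-tailed and one-tailed p-values. It assumes a standard normal (z) statistic for simplicity; the same idea applies to a t statistic. The function name and the value z = 1.8 are hypothetical.

```python
from statistics import NormalDist

def tailed_p_values(z):
    """Return p-values for the same z statistic under three hypotheses."""
    standard_normal = NormalDist()
    return {
        "two-tailed (X different to Y)": 2 * (1 - standard_normal.cdf(abs(z))),
        "one-tailed (X greater than Y)": 1 - standard_normal.cdf(z),
        "one-tailed (X less than Y)": standard_normal.cdf(z),
    }

results = tailed_p_values(1.8)  # hypothetical z statistic for X minus Y
for hypothesis, p in results.items():
    print(f"{hypothesis}: p = {p:.3f}")
```

When the effect lies in the predicted direction, the one-tailed p-value is half the two-tailed one, so the tails should be chosen from the hypothesis before looking at the data.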

Difference between parametric and non-parametric data

i. The data (when expressed as a histogram) can be visually inspected to subjectively judge whether the shape is approximately normal, or whether there is substantial kurtosis (flattening/peaking) or skewing (asymmetry).
ii. A test of normality can be used. For example, the Shapiro-Wilk test evaluates the null hypothesis that the data is normally distributed. If the p-value calculated is less than the chosen alpha (threshold value, e.g. 0.05) there is sufficient evidence to indicate that the data is non-parametric; otherwise the data is assumed to be parametric due to there being insufficient evidence to reject the null hypothesis.

Difference between independent and dependent samples

o Samples are independent if each group contains different samples; each experimental condition involves a different set of participants, animals, cells, etc. (e.g. one group of participants is treated with a drug, a different group of participants is treated with placebo).
o Samples are dependent if each group contains the same samples; each experimental condition involves the same set of participants, animals, cells, etc. (e.g. the same group of participants have their blood pressure measured before and after taking a drug).

When to use each statistical test?