Data analysis introduction Flashcards
What is biostatistics?
What are its limitations, and what types of problems does it address?
Biostatistics is the application of mathematical statistics to problems of data analysis and experimental design.
It is not
- any help in choosing what to study
- a silver bullet for poorly designed experiments
- any insurance against drawing the wrong conclusion
Types of problems
- descriptive statistics
- inference about populations from samples
- testing hypotheses (e.g. do samples come from the same population?)
What are the two areas of statistics?
Descriptive statistics
- The first step
- Reduce data to some form of summary to work with
- Include graphs, tables and summary statistics (e.g. standard deviations)
Inferential statistics
- attempts to draw conclusions from the data
- how reliable are the results?
- how probable is it that the differences are real rather than due to chance?
- hypothesis testing
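As a rough sketch of the descriptive step, the snippet below reduces a set of measurements to a few summary statistics using only Python's standard library. The biomass values and the name `biomass` are made up for illustration and do not come from these notes.

```python
import statistics

# Hypothetical algal biomass measurements (g per quadrat); illustrative only.
biomass = [12.1, 9.8, 11.4, 10.6, 13.0, 8.9]

summary = {
    "n": len(biomass),
    "mean": statistics.mean(biomass),
    "median": statistics.median(biomass),
    "sd": statistics.stdev(biomass),  # sample standard deviation (n - 1 denominator)
    "min": min(biomass),
    "max": max(biomass),
}

for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

Inferential statistics would then ask how reliable a difference between two such summaries is.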
Define the three kinds of hypothesis used in hypothesis testing
Research hypothesis
- a statement of belief which is to be the subject of the investigation
- e.g. grazing by limpets influences algal biomass
Alternative hypothesis
- A formal statement of the research hypothesis
- e.g. algal biomass will differ in areas with (+) and without (-) limpets
Null hypothesis -> no change
- a formal statement of no difference
- e.g. algal biomass in areas with and without limpets will be the same
How are appropriate statistical tests selected?
- depends on the experimental design and on what was measured
- distribution of the data
- the choice of test should be considered when designing the experiment
What is the most commonly used type of statistical test?
Parametric test
- measurements on an interval scale (e.g. biomass)
- conventional tests, e.g. t-test, regression analysis
- assumptions:
data are normally distributed
homogeneity of variances (the spread of the data is the same in each group)
What test is used when the data are non-normal (e.g. scored, high variance)?
Non-parametric test
- similar types of tests
- measurements made on rank scale (e.g. biomass scored visually as low, medium, or high)
- data can be transformed into a rank scale when it does not meet the parametric test assumptions
- decreased sensitivity (i.e. a loss of power to detect differences)
- used less often, and only when the parametric test assumptions are not met
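A minimal sketch of transforming measurements to a rank scale is shown below. The helper `average_ranks` and the sample values are hypothetical; tied values share the mean of the ranks they occupy, which is a common convention in rank-based tests but is an assumption here, since the notes do not say how ties are handled.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values get the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


# Hypothetical skewed biomass data with one extreme value and one tie.
biomass = [4.2, 15.8, 4.2, 97.0, 8.1]
print(average_ranks(biomass))  # [1.5, 4.0, 1.5, 5.0, 3.0]
```

The extreme value 97.0 simply becomes rank 5, which shows why ranks are robust to outliers and also why information (and so power) is lost.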
When is a null hypothesis (Ho) rejected?
Specify a level of significance (α) for rejection of Ho
- specified before the test is conducted
- usually set at 0.05 (5%) in most biological studies
- specifies the risk of rejecting Ho when it is in fact true
- nothing is ever proven!
What is the p-value?
- All statistical tests produce an estimate of the probability (p-value) of observing results at least as extreme as those in the sample, given that Ho is TRUE
- if the p-value < α then Ho is rejected and Ha is accepted
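One way to make the p-value concrete is an exact permutation test, sketched below. This is a swapped-in illustration, not a test named in these notes: if Ho is true, group labels are interchangeable, so the p-value is the fraction of all possible relabellings that give a difference in means at least as extreme as the observed one. The data, the function name `permutation_p_value` and the variable names are hypothetical.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(group_a, group_b):
    """Two-sided exact permutation p-value for a difference in means."""
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    n_b = len(pooled) - n_a
    observed = abs(mean(group_a) - mean(group_b))
    total = sum(pooled)
    as_extreme = 0
    n_splits = 0
    # Enumerate every way of assigning n_a of the pooled values to group A.
    for idx in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in idx)
        diff = abs(sum_a / n_a - (total - sum_a) / n_b)
        n_splits += 1
        if diff >= observed - 1e-12:  # small tolerance for float rounding
            as_extreme += 1
    return as_extreme / n_splits


# Hypothetical algal biomass in plots with and without limpets.
with_limpets = [2.0, 3.0, 4.0, 5.0]
without_limpets = [8.0, 9.0, 10.0, 11.0]

alpha = 0.05  # significance level, fixed before the test
p = permutation_p_value(with_limpets, without_limpets)
print(f"p = {p:.4f}; reject Ho: {p < alpha}")
```

Here only 2 of the 70 possible splits are as extreme as the observed one, so p is 2/70 and Ho is rejected at the 0.05 level.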
What are type I and type II errors, and how are they reduced?
Type I and type II errors
Reject Ho when it is false: GOOD
Reject Ho when it is true: BAD (type I error)
Accept Ho when it is true: GOOD
Accept Ho when it is false: BAD (type II error)
Probability of a type I error = α
Probability of a type II error = β
β is a function of the difference between sample means, the sample size, and α
To reduce error, increase the sample size
Reduces the variability of the estimates
Increases power (1 - β)
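The link between sample size and power can be sketched numerically. The snippet assumes a two-sided z-test for a difference between two means with a known, common standard deviation, which is a simplification of the t-test used in practice. The function `z_test_power` and the values chosen for the true difference and the standard deviation are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist


def z_test_power(delta, sigma, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test (known sigma)."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # True difference expressed in standard errors of the difference in means.
    shift = delta / (sigma * sqrt(2 / n_per_group))
    # Probability that the test statistic falls in either rejection region.
    return (1 - std_normal.cdf(z_crit - shift)) + std_normal.cdf(-z_crit - shift)


# Hypothetical true difference of 2 units with a standard deviation of 3.
for n in (5, 10, 20, 40):
    power = z_test_power(delta=2.0, sigma=3.0, n_per_group=n)
    print(f"n = {n:2d}  power = {power:.3f}  beta = {1 - power:.3f}")
```

With no true difference (`delta=0`) the function returns α itself, which is the type I error rate; as n grows, β shrinks and power rises.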
How is the difference between two means tested?
T-test
e.g. effects of limpets on algal biomass
- Ho: algal biomass will not differ in areas with and without limpets
- Ha: algal biomass will differ in areas with and without limpets
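A minimal sketch of the test statistic follows, assuming the classic two-sample t-test with pooled variance (equal variances in both groups). The data and the function name `pooled_t_statistic` are hypothetical, and the critical value 2.447 is the standard two-sided 5% value for 6 degrees of freedom, taken from a t table rather than computed.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t_statistic(group_a, group_b):
    """Return (t, degrees of freedom) for a two-sample pooled-variance t-test."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    # Weighted average of the two sample variances.
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    std_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / std_error, df


# Hypothetical algal biomass in plots with and without limpets.
with_limpets = [2.0, 3.0, 4.0, 5.0]
without_limpets = [8.0, 9.0, 10.0, 11.0]

t, df = pooled_t_statistic(with_limpets, without_limpets)
T_CRIT_DF6 = 2.447  # two-sided critical value for alpha = 0.05, df = 6 (from a t table)
print(f"t = {t:.3f}, df = {df}, reject Ho: {abs(t) > T_CRIT_DF6}")
```

In practice a statistics package would report the p-value directly; comparing |t| with a tabulated critical value gives the same decision.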
How is the difference between several means tested?
One-way ANOVA test
e.g. effects of limpet density on algal biomass
- Ho: algal biomass will not differ with limpet density
- Ha: algal biomass will differ with limpet density
tests Ho: all means equal
Partition the variance
- among treatment means
- within treatments (variation among replicates, i.e. error)
When do we accept the Ho?
If our statistical analysis gives a p-value below the significance level (0.05), we reject the null hypothesis and accept the alternative hypothesis; otherwise Ho is accepted.
if F ~ 1 then Ho is accepted (variation among treatments is no larger than variation within treatments)
if F is much greater than 1 then most of the variation is among treatments
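The partition of variance behind the F ratio can be sketched as below. The three groups of biomass values and the function name `one_way_anova` are hypothetical; the code computes only F and its degrees of freedom, not a p-value, which would need the F distribution from a statistics package.

```python
from statistics import mean


def one_way_anova(groups):
    """Return (F, df_among, df_within) for a list of treatment groups."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    # Variation among treatment means, weighted by the replicates per treatment.
    ss_among = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation among replicates within each treatment (the error term).
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_among = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_ratio = (ss_among / df_among) / (ss_within / df_within)
    return f_ratio, df_among, df_within


# Hypothetical algal biomass at low, medium and high limpet density.
low, medium, high = [10.0, 12.0, 11.0], [7.0, 8.0, 9.0], [3.0, 4.0, 5.0]

f_ratio, df_among, df_within = one_way_anova([low, medium, high])
print(f"F = {f_ratio:.2f} on {df_among} and {df_within} df")  # F = 37.00 on 2 and 6 df
```

Here the mean square among treatments is 37 times the mean square within treatments, so most of the variation lies among treatments.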
What is crucial when using a one-way ANOVA test?
The design should be balanced (i.e. an equal number of replicates in each treatment)
What test is used when two independent factors are statistically compared? e.g. effect of limpet and chiton density on algal biomass
Two-way ANOVA
- Ho tested:
- Limpet density has no effect on biomass
- Chiton density has no effect on biomass
- No interaction between chitons and limpets on biomass
interaction term:
- not significant: the two factors act independently of each other
- significant: the factors are said to interact
- the main effects (limpets, chitons) cannot be interpreted on their own when the interaction term is significant
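A sketch of how the three F ratios of a two-way ANOVA can be obtained is given below, assuming a balanced design with replication and fixed factors. The function `two_way_anova`, the dictionary layout and the data are hypothetical; the data were constructed to be purely additive, so the interaction F comes out as zero.

```python
from statistics import mean


def two_way_anova(cells):
    """F ratios for a balanced two-way design.

    cells maps (level_a, level_b) to a list of replicate values.
    """
    levels_a = sorted({a for a, _ in cells})
    levels_b = sorted({b for _, b in cells})
    n = len(next(iter(cells.values())))
    if n < 2 or any(len(v) != n for v in cells.values()):
        raise ValueError("design must be balanced with at least 2 replicates")
    grand = mean([x for v in cells.values() for x in v])
    cell_mean = {key: mean(v) for key, v in cells.items()}
    mean_a = {a: mean([cell_mean[(a, b)] for b in levels_b]) for a in levels_a}
    mean_b = {b: mean([cell_mean[(a, b)] for a in levels_a]) for b in levels_b}
    # Sums of squares for each main effect, the interaction, and the error.
    ss_a = len(levels_b) * n * sum((mean_a[a] - grand) ** 2 for a in levels_a)
    ss_b = len(levels_a) * n * sum((mean_b[b] - grand) ** 2 for b in levels_b)
    ss_ab = n * sum(
        (cell_mean[(a, b)] - mean_a[a] - mean_b[b] + grand) ** 2
        for a in levels_a
        for b in levels_b
    )
    ss_error = sum((x - cell_mean[key]) ** 2 for key, v in cells.items() for x in v)
    df_a, df_b = len(levels_a) - 1, len(levels_b) - 1
    ms_error = ss_error / (len(levels_a) * len(levels_b) * (n - 1))
    return {
        "factor_a": (ss_a / df_a) / ms_error,
        "factor_b": (ss_b / df_b) / ms_error,
        "interaction": (ss_ab / (df_a * df_b)) / ms_error,
    }


# Hypothetical biomass; keys are (limpet density, chiton density).
biomass = {
    ("low", "low"): [10.0, 12.0],
    ("low", "high"): [8.0, 10.0],
    ("high", "low"): [6.0, 8.0],
    ("high", "high"): [4.0, 6.0],
}
f_values = two_way_anova(biomass)
print(f_values)
```

Each F ratio would still need to be compared with the F distribution to decide significance; the interaction term is checked first, as noted above.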