Data analysis introduction Flashcards

1
Q

What is biostatistics?

What types of problems does it address?

A

Biostatistics is the application of mathematical statistics to problems of data analysis and experimental design.

It is not

  • any help in choosing what to study
  • a silver bullet for poorly designed experiments
  • any insurance against drawing the wrong conclusion

Types of problems

  • descriptive statistics
  • inference about populations from samples
  • testing hypotheses (e.g. do samples come from the same population?)
2
Q

What are the two areas of statistics?

A

Descriptive statistics

  • The first step
  • Reduce data to some form of summary to work with
  • Includes graphs, tables and summary statistics (e.g. standard deviations)
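
As an illustration of the summary statistics mentioned above, here is a minimal sketch using only Python's standard library. The biomass values are hypothetical and exist only to make the example runnable.

```python
import math
import statistics

# Hypothetical algal biomass measurements (g per quadrat); illustrative only.
biomass = [12.1, 9.8, 11.4, 10.9, 13.2, 10.1]

n = len(biomass)
mean = statistics.mean(biomass)
sd = statistics.stdev(biomass)   # sample standard deviation (n - 1 denominator)
se = sd / math.sqrt(n)           # standard error of the mean

print(f"n = {n}, mean = {mean:.2f}, SD = {sd:.2f}, SE = {se:.2f}")
```

Reducing the raw values to n, mean, SD and SE is the kind of summary a table or error-bar graph would then be built from.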

Inferential statistics

  • attempts to draw conclusions from the data
  • how reliable are the results?
  • how probable is it that the differences are real rather than due to chance?
  • hypothesis testing
3
Q

Define the three kinds of hypothesis used in hypothesis testing

A

Research hypothesis

  • a statement of belief which is to be the subject of the investigation
  • e.g. grazing by limpets influences algal biomass

Alternative hypothesis

  • A formal statement of the research hypothesis
  • e.g. algal biomass will differ in areas with (+) and without (-) limpets

Null hypothesis -> no change

  • a formal statement of no difference
  • e.g. algal biomass in areas with and without limpets will be the same
4
Q

How are appropriate statistical tests selected?

A
  • depends on the experimental design and on what was measured
  • distribution of the data
  • how to design the experiment
5
Q

What is the most common statistical test used?

A

Parametric test

  • measurements on an interval scale (e.g. biomass)
  • conventional tests, e.g. t-test, regression analysis
  • assumptions:
      - data are normally distributed
      - homogeneity of variances (the spread of the data is the same in each group)
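
A rough way to look at the homogeneity-of-variances assumption is to compare the sample variances of the groups. The sketch below computes the ratio of the largest to the smallest variance. The function name `variance_ratio` and the data are hypothetical, and the sketch does not check normality or replace a formal test.

```python
import statistics

def variance_ratio(*groups):
    """Largest sample variance divided by the smallest across the groups."""
    variances = [statistics.variance(g) for g in groups]
    return max(variances) / min(variances)

# Hypothetical algal biomass values; illustrative only.
with_limpets = [4.1, 5.0, 3.8, 4.6, 4.4]
without_limpets = [9.2, 10.1, 8.7, 9.9, 9.5]

ratio = variance_ratio(with_limpets, without_limpets)
print(f"variance ratio = {ratio:.2f}")
```

A ratio close to 1 suggests a similar spread in each group; a large ratio is a warning that the assumption may not hold.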

6
Q

What test is used when the data are non-normal (e.g. scored, high variance)?

A

Non-parametric test

  • similar types of tests
  • measurements made on rank scale (e.g. biomass scored visually as low, medium, or high)
  • data can be transformed into a rank scale when it does not meet the parametric test assumptions
  • decreased sensitivity (i.e. a loss of power to detect differences)
  • rarely used, and only when the parametric test assumptions are not met
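
The transformation of measurements into a rank scale can be sketched as follows. This is a minimal illustration, assuming tied values share the mean of the ranks they occupy (a common convention); the helper name `average_ranks` is hypothetical.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1   # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

# Hypothetical biomass values with one tie; illustrative only.
print(average_ranks([10, 20, 20, 5]))
```

Only the ordering of the values survives the transformation, which is why rank-based tests lose some power to detect differences.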
7
Q

When is a null hypothesis (Ho) rejected?

A

Specify a level of significance (α) for rejection of Ho

  • specified before the test is conducted
  • usually set at 0.05 (5%) in most biological studies
  • specifies the risk of rejecting Ho when it in fact is true
  • nothing is ever proven!
8
Q

What is the p-value?

A
  • All statistical tests produce an estimate of the probability (p-value) of observing results at least as extreme as those in the sample, given that Ho is TRUE
  • if the p-value < α then Ho is rejected and Ha is accepted
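
To make the definition concrete, the sketch below estimates a p-value with a permutation test: group labels are shuffled many times, which mimics Ho being true, and the p-value is the proportion of shuffles giving a difference in means at least as extreme as the observed one. The permutation test is used here only because it shows the definition directly; it is not the test named on these cards. The function name, the data and the choice of 5000 shuffles are all assumptions.

```python
import random

def permutation_p_value(a, b, n_perm=5000, seed=0):
    """Two-sided permutation p-value for a difference between two means."""
    rng = random.Random(seed)
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)              # reassign values to groups at random
        new_a, new_b = pooled[:len(a)], pooled[len(a):]
        diff = abs(sum(new_a) / len(new_a) - sum(new_b) / len(new_b))
        if diff >= observed - 1e-12:     # small tolerance for float rounding
            extreme += 1
    # Count the observed arrangement itself so the estimate is never zero.
    return (extreme + 1) / (n_perm + 1)

# Hypothetical algal biomass values; illustrative only.
with_limpets = [4.1, 5.0, 3.8, 4.6, 4.4]
without_limpets = [9.2, 10.1, 8.7, 9.9, 9.5]

alpha = 0.05
p = permutation_p_value(with_limpets, without_limpets)
print("reject Ho" if p < alpha else "do not reject Ho")
```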
9
Q

What are type I and type II errors, and how are they reduced?

A

Type I and type II errors

  • Reject Ho when it is false: GOOD
  • Reject Ho when it is true: BAD – Type I error
  • Accept Ho when it is true: GOOD
  • Accept Ho when it is false: BAD – Type II error

Probability of a type I error = α

Probability of a type II error = β

  • β is a function of the difference between sample means, the sample size and α

To reduce error, increase the sample size

  • reduces variability
  • increases power
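
The link between sample size, β and power (1 - β) can be sketched numerically. The example below assumes a two-sided, two-sample z-test with a known common standard deviation, which is a simplification chosen because it needs only the standard library. The function name and the values of `delta` (true difference between means) and `sigma` are hypothetical.

```python
from statistics import NormalDist

def z_test_power(delta, sigma, n, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test with n per group."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)           # critical value for the chosen alpha
    shift = delta / (sigma * (2 / n) ** 0.5)    # true difference in standard-error units
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)

for n in (5, 10, 20, 40):
    power = z_test_power(delta=2.0, sigma=3.0, n=n)
    print(f"n = {n:2d}  power = {power:.2f}  beta = {1 - power:.2f}")
```

With `delta=0` (Ho true) the function returns α, the type I error rate; for a fixed non-zero difference, power rises and β falls as n increases.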

10
Q

How is the difference between two means tested?

A

T-test

e.g. effects of limpets on algal biomass
- Ho: algal biomass will not differ in areas with and without limpets
- Ha: algal biomass will differ in areas with and without limpets
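
A minimal sketch of the limpet example as a two-sample t-test, assuming equal variances (the pooled, Student's form) and five hypothetical replicates per area. The critical value 2.306 is the two-tailed value for α = 0.05 with 8 degrees of freedom from standard t tables; the function name `pooled_t` is hypothetical.

```python
import math
import statistics

def pooled_t(a, b):
    """Student's two-sample t statistic (pooled variance) and its degrees of freedom."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / df
    se = math.sqrt(pooled_var * (1 / na + 1 / nb))   # standard error of the difference
    return (statistics.mean(a) - statistics.mean(b)) / se, df

# Hypothetical algal biomass values; illustrative only.
with_limpets = [4.1, 5.0, 3.8, 4.6, 4.4]
without_limpets = [9.2, 10.1, 8.7, 9.9, 9.5]

T_CRIT_DF8 = 2.306   # two-tailed critical t, alpha = 0.05, df = 8 (from t tables)

t, df = pooled_t(with_limpets, without_limpets)
print(f"t = {t:.2f}, df = {df}")
print("reject Ho" if abs(t) > T_CRIT_DF8 else "do not reject Ho")
```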

11
Q

How is the difference between several means tested?

A

One-way ANOVA test

e.g. effects of limpet density on algal biomass
- Ho: algal biomass will not differ with limpet density
- Ha: algal biomass will differ with limpet density

Tests Ho: all means are equal

Partitions the variance:

- among treatment means
- within treatments (variation among replicates, i.e. error)
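
The partitioning described above can be sketched directly: the F ratio is the mean square among treatments divided by the mean square within treatments. The function name `one_way_anova` and the data are hypothetical.

```python
def one_way_anova(groups):
    """Return (F, df_among, df_within) for a one-way ANOVA."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(g) / len(g) for g in groups]

    # Variation among treatment means
    ss_among = sum(len(g) * (m - grand_mean) ** 2
                   for g, m in zip(groups, group_means))
    # Variation within treatments (among replicates, i.e. error)
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)

    df_among = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f = (ss_among / df_among) / (ss_within / df_within)
    return f, df_among, df_within

# Hypothetical algal biomass at low, medium and high limpet density.
low = [9.2, 10.1, 8.7, 9.9]
medium = [6.8, 7.5, 7.1, 6.4]
high = [4.1, 5.0, 3.8, 4.6]

f, df_among, df_within = one_way_anova([low, medium, high])
print(f"F({df_among}, {df_within}) = {f:.1f}")
```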

12
Q

When do we accept the Ho?

A

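If our statistical analysis shows that the p-value is below the significance level (0.05), we reject the null hypothesis and accept the alternative hypothesis; otherwise Ho is accepted.

In ANOVA, F is the ratio of the variation among treatments to the variation within treatments:

if F ≈ 1 then Ho is accepted

if F is much greater than 1 then most of the variation is among treatments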

13
Q

What is crucial when using a one-way ANOVA test?

A

The design must be balanced (i.e. an equal number of replicates in each treatment)

14
Q

What test is used when two independent factors are statistically compared? e.g. effect of limpet and chiton density on algal biomass

A

Two-way ANOVA

Ho tested:

  • Limpet density has no effect on biomass
  • Chiton density has no effect on biomass
  • No interaction between chitons and limpets on biomass

Interaction term:

  • not significant: the two factors act independently of each other
  • significant: the factors are said to interact

The main effects (limpets, chitons) cannot be interpreted on their own when the interaction term is significant
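
The three F ratios of a two-way ANOVA can be sketched for a balanced design as below. This is a minimal illustration, assuming fixed factors and an equal number of replicates in every cell; the function name, the dictionary layout and the data are hypothetical.

```python
def mean(xs):
    return sum(xs) / len(xs)

def two_way_anova(cells):
    """F ratios for factor A, factor B and their interaction (balanced design).

    cells maps (level_a, level_b) -> list of replicate values.
    """
    a_levels = sorted({a for a, _ in cells})
    b_levels = sorted({b for _, b in cells})
    n = len(next(iter(cells.values())))            # replicates per cell
    if any(len(v) != n for v in cells.values()):
        raise ValueError("design must be balanced")

    grand = mean([x for v in cells.values() for x in v])
    a_mean = {a: mean([x for b in b_levels for x in cells[(a, b)]]) for a in a_levels}
    b_mean = {b: mean([x for a in a_levels for x in cells[(a, b)]]) for b in b_levels}

    ss_a = n * len(b_levels) * sum((a_mean[a] - grand) ** 2 for a in a_levels)
    ss_b = n * len(a_levels) * sum((b_mean[b] - grand) ** 2 for b in b_levels)
    ss_cells = n * sum((mean(v) - grand) ** 2 for v in cells.values())
    ss_ab = ss_cells - ss_a - ss_b                 # interaction sum of squares
    ss_error = sum((x - mean(v)) ** 2 for v in cells.values() for x in v)

    df_a, df_b = len(a_levels) - 1, len(b_levels) - 1
    df_ab = df_a * df_b
    df_error = len(a_levels) * len(b_levels) * (n - 1)
    ms_error = ss_error / df_error
    return {
        "A": (ss_a / df_a) / ms_error,
        "B": (ss_b / df_b) / ms_error,
        "AxB": (ss_ab / df_ab) / ms_error,
    }

# Hypothetical biomass: factor A = limpet density, factor B = chiton density.
cells = {
    ("high", "high"): [3.9, 4.3],
    ("high", "low"): [5.8, 6.4],
    ("low", "high"): [7.7, 8.1],
    ("low", "low"): [9.6, 10.2],
}
for term, f_ratio in two_way_anova(cells).items():
    print(f"F for {term}: {f_ratio:.2f}")
```

Each F ratio would then be compared with its critical value (or converted to a p-value), starting with the interaction term, since a significant interaction changes how the main effects can be read.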
