How not to lie with statistics Flashcards

1
Q

How not to lie overview

A

o Experimental design
o Data visualization – exploratory data analysis
o Statistical modelling
o Rejecting hypotheses
o Updating beliefs based on appropriate priors

2
Q

Experimental design

A

Must design the experiment effectively to ensure that the conclusions are valid.
-> reduce bias and sampling error (maximise precision and power)

Reduce sampling error by:
- replication -> must avoid pseudoreplication (artificial inflation of sample size)
- balance
- blocking

Reduce bias by:
- Blinding
- Control groups
- Randomization

3
Q

Replication requires…
Replicates are not…

A

Replication requires
o Independence
o an appropriate spatial scale

Replicates are not
o Replicates from each treatment grouped into blocks
o treatments repeated in different blocks
o repeated measures on the same individual sampling unit

4
Q

Visualisation and exploratory data analyses

A

Visualise and explore data before carrying out any analyses.

The objectives of exploratory data analysis are to:
- Suggest hypotheses about the causes of observed phenomena.
- Assess assumptions on which statistical inference will be based.
- Support the selection of appropriate statistical tools and techniques.
- Provide a basis for further data collection through experiments and/or observations.
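One of these checks can be sketched in a few lines (the sample values below are invented for illustration): a large gap between mean and median flags skew, which bears on the assumptions that later inference will rely on.

```python
import statistics

# Invented sample with two large outlying values.
sample = [1.1, 1.3, 1.2, 1.4, 1.2, 5.8, 6.2, 1.3, 1.1, 1.2]

mean = statistics.mean(sample)
median = statistics.median(sample)

# Mean pulled well above the median -> right skew; consider a
# transformation before fitting a model that assumes normal errors.
print(round(mean, 2), median)
```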

5
Q

John Tukey

A
  • Criticised the over-emphasis on statistical hypothesis testing before the data are understood
  • Advocated exploratory data analysis using graphs and visualisations before formal statistical analyses are undertaken
6
Q

Modelling (simplification and overfitting)

A

Best model: the simplest model that adequately explains the data

(1) Formulate a maximal model that is commensurate with the design/data

e.g. height = constant x food intake x parents x siblings

(2) Fit model to data; evaluate hypotheses

(3) Remove non-significant terms to get to a simplified model -> test to ensure terms can be removed and not overfitting

e.g. height = constant x food intake x parents

(4) Critique the model – using residuals/outlier

(5) Consider transformations

(6) Repeat 2-4 to get to minimal adequate model (MAM)
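Steps (1)-(3) can be sketched numerically. A minimal sketch using ordinary least squares, with invented data and predictor names (`food`, `parents`, `siblings` follow the height example above; `siblings` is deliberately given no real effect):

```python
import numpy as np

# Invented data: 40 observations; only food and parents truly affect height.
rng = np.random.default_rng(0)
n = 40
food, parents, siblings = rng.normal(size=(3, n))
height = 170 + 2.0 * food + 1.5 * parents + rng.normal(scale=0.5, size=n)

def fit_sse(X, y):
    """Least-squares fit; return the residual sum of squares (SSE)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

intercept = np.ones(n)
X_max = np.column_stack([intercept, food, parents, siblings])  # maximal model
X_simpl = np.column_stack([intercept, food, parents])          # siblings removed

sse_max = fit_sse(X_max, height)
sse_simpl = fit_sse(X_simpl, height)

# Dropping a genuinely uninformative term increases SSE only slightly;
# a model-comparison F-test (next card) decides whether the increase
# is significant.
print(sse_simpl >= sse_max)
```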

7
Q

Model simplification

A
  • Remove the least significant term first (highest-order interactions before main effects)
  • Use a model-comparison test to check whether leaving out the term causes a significant increase in SSE (error variance) – avoids overfitting/underfitting
  • If it does cause a significant increase in SSE, put the term back into the model

F = ( (SSE_R - SSE_F) / (df_R - df_F) ) / ( SSE_F / df_F )

SSE_R = SSE of the restricted (simplified) model
SSE_F = SSE of the full model

The statistic is compared to an F distribution on (df_R - df_F, df_F) degrees of freedom, i.e. F(df_R - df_F, df_F): the difference between the two df, then the df of the more complicated model.

-> If an interaction term is significant, the main effects it involves cannot be removed, so the model cannot be simplified further in that direction.

  • Keep repeating until the model contains only significant terms
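The F-ratio above can be computed directly; a minimal sketch with invented SSE and df values:

```python
# Partial F-test comparing a full model against a restricted (simplified) one.
# The SSE and df values below are invented for illustration.

def partial_f(sse_r, df_r, sse_f, df_f):
    """F = ((SSE_R - SSE_F) / (df_R - df_F)) / (SSE_F / df_F)."""
    return ((sse_r - sse_f) / (df_r - df_f)) / (sse_f / df_f)

# Restricted model: SSE = 120 on 18 df; full model: SSE = 100 on 16 df.
f_stat = partial_f(120, 18, 100, 16)
print(round(f_stat, 2))  # tested on F(18 - 16, 16) = F(2, 16)
```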
8
Q

Possible types of model

A

Full model
- a parameter for every data point
- model fits data perfectly
- no residual df and no explanatory power

Maximal model
- contains p factors, covariates and interactions of interest
- Terms may be insignificant
- N-P-1 df

Minimal adequate model
- fewer terms, all of which are significant
- N-P’-1 df; R^2 used to assess explanatory power

Null model
- single parameter (grand mean of the response variable)
- No fit to the data
- N-1 df and no explanatory power

Remember when calculating df to count every fitted parameter, including the intercept (the extra −1 in N−P−1).
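The df bookkeeping can be sanity-checked in a few lines (N = 30 and the term counts are invented; here P counts the non-intercept terms, with the −1 covering the intercept):

```python
# Residual df = N - P - 1, where P counts fitted non-intercept terms
# and the -1 accounts for the intercept.
N = 30

def df_residual(n, p_terms):
    return n - p_terms - 1

print(df_residual(N, 0))      # null model (grand mean only): 29 = N - 1
print(df_residual(N, 5))      # maximal model with 5 terms: 24
print(df_residual(N, N - 1))  # full model: a parameter per data point -> 0 df
```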

9
Q

Parameter

A

Parameters = quantities describing populations, e.g. averages, proportions, measures of variation, measures of relationship

10
Q

Overfitting

A

Overfitted models fit the data better because they have more parameters, but they have low predictive power.

Overfitted models have fewer residual df, since more parameters are estimated when fitting the model.

11
Q

Accepting/ rejecting null hypothesis

A

must minimise type I and type II errors while maximising power

Type I error = rejecting a true null hypothesis (determined by α)

Type II error = failing to reject a false null hypothesis (represented by β)

Power of a test = probability that random sample leads to rejection of a false null hypothesis (1-β)

Power ranges from 0 to 1; the closer to 1, the better the test is at detecting a false null.

Study = more powerful if …
* Sample size is large
* True discrepancy from null hypothesis is large
* Variability in the population is low
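These three bullets can be seen numerically. A minimal sketch using the approximate power of a two-sided one-sample z-test at α = 0.05 (the hard-coded 1.96 critical value and all parameter values are assumptions for illustration; the negligible lower rejection tail is ignored):

```python
import math

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def z_test_power(effect, sigma, n, z_crit=1.96):
    """Approximate power of a two-sided one-sample z-test at alpha = 0.05:
    P(reject | true effect), ignoring the lower rejection tail."""
    return norm_cdf(effect * math.sqrt(n) / sigma - z_crit)

print(z_test_power(0.5, 1.0, 20) < z_test_power(0.5, 1.0, 50))  # larger sample -> more power
print(z_test_power(0.5, 1.0, 20) < z_test_power(1.0, 1.0, 20))  # larger effect -> more power
print(z_test_power(0.5, 2.0, 20) < z_test_power(0.5, 1.0, 20))  # less variability -> more power
```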

12
Q

Updating beliefs based on appropriate priors

A

Bayes’ theorem
- Updating prior beliefs (prior probabilities) based on observed data to obtain posterior beliefs (posterior probabilities)
- Formula used to update priors: P(A|B) = P(B|A) P(A) / P(B)

Base rate fallacy
- Base rate is the prior probability that something was true before the new evidence occurred
- If prior is incorrect then conclusion will be incorrect

Example: sensitivity (probability of a correct positive) and specificity (probability of a correct negative) -> one must know the infection rate (base rate) before interpreting the test’s result.
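A minimal numeric sketch of the base-rate point (the 1% prevalence and the test accuracy figures are invented): even an accurate test yields a low posterior when the prior (base rate) is low.

```python
# Bayes' theorem with an explicit base rate (prior).
# Invented numbers: 1% infection rate, 95% sensitivity, 90% specificity.

def posterior_positive(prior, sensitivity, specificity):
    """P(infected | positive test) via Bayes' theorem."""
    p_pos = sensitivity * prior + (1 - specificity) * (1 - prior)
    return sensitivity * prior / p_pos

# Despite the 95%-sensitive test, the posterior stays under 10% because
# false positives from the large uninfected group dominate.
print(round(posterior_positive(0.01, 0.95, 0.90), 3))
```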
