09 - Statistics and Reproducibility Flashcards
1
Q
Scientific studies and statistics
A
- correlation = statistical relationship between A & B, relatively easy to show
- causation = implies an existing mechanism linking A & B, very difficult to prove
- observational study = large datasets, random population sample, not necessarily hypothesis-driven, rarely establishes causation
- controlled study = most studies, groups vs control, can lead to causation
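A minimal simulation of the correlation vs causation distinction, written in Python (an assumption; the cards name no language). The confounder Z, the noise levels and the seed are all hypothetical. In the observational setting A and B are both driven by Z, so they correlate strongly even though A does not cause B. In the controlled setting A is assigned at random, independently of Z, and the correlation disappears.

```python
import random
import statistics


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 1000

# Observational setting (hypothetical): a hidden confounder Z drives both
# A and B, while A has no causal effect on B.
z = [rng.gauss(0, 1) for _ in range(n)]
a_obs = [zi + rng.gauss(0, 0.3) for zi in z]
b = [zi + rng.gauss(0, 0.3) for zi in z]

# Controlled setting: A is assigned at random, independently of Z, so a
# correlation with B could only come from a causal effect (or chance).
a_ctrl = [rng.gauss(0, 1) for _ in range(n)]

print(f"observational r(A, B) = {pearson_r(a_obs, b):.2f}")  # strong
print(f"controlled    r(A, B) = {pearson_r(a_ctrl, b):.2f}")  # near zero
```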
2
Q
Comparing groups
A
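- intraclass correlation coefficient ICC = $\sigma^2_{\text{among}} / (\sigma^2_{\text{among}} + \sigma^2_{\text{within}})$, with $\sigma^2_{\text{among}}$ the variance among groups and $\sigma^2_{\text{within}}$ the variance within groups
- Student's t-distribution = Gaussian-like distribution with heavier tails, used when the standard deviation is estimated from the sample
- t-test assumes unbiased (independent) observations + normally distributed data
- testing too many variables increases the number that come out significant by chance (false positives)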
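A sketch of the ICC formula for balanced groups, estimating the two variance components from one-way ANOVA mean squares (one common estimator among several; the equal-group-size restriction is an assumption of this sketch). It also quantifies the multiple-testing problem: with m independent tests at level α, P(at least one false positive) = 1 − (1 − α)^m, and dividing α by m (Bonferroni correction) is one standard remedy. The function names `icc_oneway` and `familywise_error` are hypothetical.

```python
import statistics


def icc_oneway(groups):
    """ICC = var_among / (var_among + var_within) for equal-sized groups.

    Variance components come from one-way ANOVA mean squares
    (assumption of this sketch: every group has the same size k >= 2).
    """
    g = len(groups)
    k = len(groups[0])
    if any(len(grp) != k for grp in groups):
        raise ValueError("sketch assumes equal group sizes")
    means = [statistics.mean(grp) for grp in groups]
    grand = statistics.mean(means)
    # Mean square among groups and mean square within groups
    msb = k * sum((m - grand) ** 2 for m in means) / (g - 1)
    msw = sum(
        (x - m) ** 2 for grp, m in zip(groups, means) for x in grp
    ) / (g * (k - 1))
    var_within = msw
    # E[MSB] = var_within + k * var_among; negative estimates truncated to 0
    var_among = max((msb - msw) / k, 0.0)
    return var_among / (var_among + var_within)


def familywise_error(alpha, m):
    """P(at least one false positive) over m independent tests at level alpha."""
    return 1 - (1 - alpha) ** m


print(icc_oneway([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))  # 26/29 ≈ 0.897
print(familywise_error(0.05, 20))  # ≈ 0.64
print(0.05 / 20)  # Bonferroni-corrected per-test threshold
```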
3
Q
Predicting and validating
A
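- learn a mapping from inputs to outputs
- watch out for overfitting (the model fits noise in the training data and fails to generalize)
- dataset divided into training + validation + testing sets
- quantitative assessment:
  – parameter sweep (vary one parameter within reasonable bounds)
  – sensitivity $S = \Delta\text{metric} / \Delta\text{param}$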
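A sketch of the train/validation/test split and of the sensitivity S computed by finite differences over a parameter sweep. The 60/20/20 proportions, the seed, the helper names and the quadratic metric are illustrative assumptions, not part of the cards.

```python
import random


def split_indices(n, train=0.6, val=0.2, seed=0):
    """Shuffle 0..n-1 and split into disjoint train/validation/test index lists."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = round(n * train)
    n_val = round(n * val)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


def sensitivity_sweep(metric, params):
    """Finite-difference sensitivity S = Δmetric / Δparam between consecutive sweep points."""
    values = [metric(p) for p in params]
    return [
        (values[i + 1] - values[i]) / (params[i + 1] - params[i])
        for i in range(len(params) - 1)
    ]


train_idx, val_idx, test_idx = split_indices(100)
print(len(train_idx), len(val_idx), len(test_idx))  # 60 20 20

# Hypothetical metric that depends quadratically on the swept parameter
print(sensitivity_sweep(lambda p: p ** 2, [0, 1, 2, 3]))  # [1.0, 3.0, 5.0]
```

The test set is only touched once, after model and parameter choices have been made on the training and validation sets; a large |S| flags a parameter whose value strongly drives the result.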
4
Q
Reproducibility
A
- measurement: do the results change when the measurement is repeated?
- analysis: another person, in another place, gets the same results from the same data
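One way to make an analysis rerunnable by another person is to fix every source of randomness and record the environment alongside the result. The sketch below assumes a hypothetical `run_analysis` that depends only on a seeded random number generator; in practice library versions and input data also need to be pinned, which a seed alone does not guarantee.

```python
import platform
import random
import statistics
import sys


def run_analysis(seed):
    """Hypothetical analysis: mean of a simulated sample, driven by a seeded RNG."""
    rng = random.Random(seed)  # local RNG: no hidden global state
    sample = [rng.gauss(10.0, 2.0) for _ in range(500)]
    return statistics.mean(sample)


# Record what another person would need to rerun the analysis.
provenance = {
    "seed": 42,
    "python": sys.version.split()[0],
    "platform": platform.platform(),
}

first = run_analysis(provenance["seed"])
second = run_analysis(provenance["seed"])
print(provenance)
print(first == second)  # True: same seed, same code -> identical result
```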
5
Q
Unit testing
A
- individual units of source code are tested to assess whether they are fit for use
- test the happy path (expected input) and provoke failures (invalid input, edge cases)
- test-driven development (tests written before the code) or continuous integration (tests run every time something changes)
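A minimal unit test with Python's standard `unittest` module, covering both the happy path and provoked failures. `safe_mean` is a hypothetical unit under test, not taken from the cards.

```python
import unittest


def safe_mean(values):
    """Hypothetical unit under test: arithmetic mean that rejects empty input."""
    values = list(values)
    if not values:
        raise ValueError("mean of empty sequence")
    return sum(values) / len(values)


class TestSafeMean(unittest.TestCase):
    # Happy path: typical, valid input gives the expected answer
    def test_typical_input(self):
        self.assertEqual(safe_mean([1, 2, 3]), 2)

    def test_float_input(self):
        self.assertAlmostEqual(safe_mean([0.1, 0.2]), 0.15)

    # Provoke: invalid input must fail loudly instead of returning nonsense
    def test_empty_input_raises(self):
        with self.assertRaises(ValueError):
            safe_mean([])

    def test_non_numeric_input_raises(self):
        with self.assertRaises(TypeError):
            safe_mean(["a", "b"])
```

Saved as, for example, `test_safe_mean.py` (hypothetical name), the suite runs with `python -m unittest test_safe_mean`. Under continuous integration the same command would run on every change; under test-driven development these tests would be written before `safe_mean` itself.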
6
Q
Presenting the results
A
- visualization is important
- grammar of graphics