Data Analysis in Real Life Flashcards
What are some methods for giving reports a clearer narrative?
- eliminating jargon and focusing on interpretability
- critiquing significant effects by coming up with potential alternate explanations
- focusing on simpler models and parsimony
What is version control software used for?
Keeping track of checked-in versions of code, data, and reports
What are some easy things to double-check in reports?
- verifying that the signs of effects are in the expected direction
- checking magnitude of effects by comparison with other known effects
- putting units on graphs and coefficients and generally keeping track of units
What do reproducible report writing tools like knitr and IPython help with?
- automating the report writing process
- organizing one's thinking by blending the code and the narrative into a single document
- documenting the analysis code with the project narrative
- advancing the goal of reproducibility
What are two components of good final data products that are ubiquitous across all settings?
making the report reproducible and
making the report and code version-controlled.
A null result may be due to
low power
the null hypothesis actually being true
A study with a very low sample size will likely have
low power
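A minimal sketch of how power grows with sample size, assuming a two-sided one-sample z-test with a known standard deviation. The function name `z_test_power` and the effect size of 0.3 are illustrative assumptions, not part of the cards.

```python
from statistics import NormalDist


def z_test_power(effect, sd, n, alpha=0.05):
    """Power of a two-sided one-sample z-test with known sd.

    `effect` is the true shift of the mean away from the null value.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)  # two-sided critical value
    shift = effect * n ** 0.5 / sd              # standardized true shift
    # Probability that the test statistic lands in either rejection region.
    return std_normal.cdf(-z_crit + shift) + std_normal.cdf(-z_crit - shift)


# Same hypothetical effect, increasing sample sizes.
for n in (5, 20, 80, 320):
    print(n, round(z_test_power(effect=0.3, sd=1.0, n=n), 3))
```

With the effect held fixed, the smallest sample gives power only slightly above the significance level, while the largest gives power close to 1.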
Calculating power after the study has been done and analyzed is
problematic and should only be done by people well versed in the issues
Ideally, a surrogate variable for a variable of interest will
- have a known or estimable variance around the desired measurement variable
- be unbiased
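When paired calibration measurements are available, both properties can be checked roughly as follows. This is a sketch under assumptions: the function name `calibrate` and the numbers are hypothetical, and the "gold standard" values stand in for the desired measurement variable.

```python
from statistics import mean, variance


def calibrate(surrogate, gold_standard):
    """Estimate surrogate bias and error variance from paired measurements."""
    errors = [s - g for s, g in zip(surrogate, gold_standard)]
    return {
        "bias": mean(errors),                # near 0 for an unbiased surrogate
        "error_variance": variance(errors),  # sample variance of the errors
    }


# Hypothetical paired measurements on the same units.
surrogate = [2.0, 3.0, 4.5]
gold_standard = [1.0, 2.0, 3.0]
print(calibrate(surrogate, gold_standard))
```

A bias estimate far from zero suggests the surrogate is systematically off, and the error variance indicates how noisy it is around the desired measurement.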
What is the idea of power?
Power is the probability of rejecting the null hypothesis when it’s false.
What can you do in the absence of any calibration data to evaluate your surrogate?
Rely on either modeling via assumptions or sensitivity analyses.
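One simple form of sensitivity analysis is to recompute an estimate under a range of assumed surrogate biases and see whether the conclusion changes. The sketch below assumes an additive bias; the function name `bias_sensitivity`, the bias grid, and the threshold are hypothetical.

```python
def bias_sensitivity(observed_estimate, assumed_biases, threshold=0.0):
    """Recompute an estimate under each assumed additive surrogate bias.

    Returns (assumed_bias, corrected_estimate, still_above_threshold) tuples.
    """
    results = []
    for bias in assumed_biases:
        corrected = observed_estimate - bias
        results.append((bias, corrected, corrected > threshold))
    return results


# Hypothetical numbers: an observed estimate of 2.0 and a grid of biases.
for row in bias_sensitivity(2.0, [-1.0, 0.0, 1.0, 3.0]):
    print(row)
```

If the conclusion only flips for biases that seem implausibly large, the finding is reasonably robust to the unknown surrogate error.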
What should you do if your surrogate variable is too unreliable an estimate of your actual outcome?
Conclude that it’s better not to conduct the study at all.
Potential problems with testing lots of hypotheses until a significant one is found include
Declaring effects significant when they arose only by chance
Misrepresenting the strength of the findings
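The chance of at least one spurious "significant" result grows quickly with the number of tests. The sketch below assumes independent tests in which every null hypothesis is true, so each p-value is uniform on [0, 1]; the function names are illustrative.

```python
import random


def prob_any_false_positive(n_tests, alpha=0.05):
    """Chance of at least one p < alpha among independent true-null tests."""
    return 1 - (1 - alpha) ** n_tests


def simulate_any_false_positive(n_tests, alpha=0.05, n_sim=20000, seed=0):
    """Monte Carlo check: draw uniform p-values and count runs with any hit."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sim):
        if any(rng.random() < alpha for _ in range(n_tests)):
            hits += 1
    return hits / n_sim


for n_tests in (1, 5, 20):
    print(n_tests,
          round(prob_any_false_positive(n_tests), 3),
          round(simulate_any_false_positive(n_tests), 3))
```

Under these assumptions, testing 20 hypotheses at the 0.05 level yields at least one false positive in roughly 64% of studies.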
What is comparing your effects to familiar ones useful for?
Mentally calibrating the size of an effect or its significance when a variable under study is not well understood
Negative control analyses are useful
for evaluating processes to see if spurious effects are obtained
as a validity check of an effect of interest, by looking to see if similar effects occur when the same analysis is run on variables where an effect is known not to be present
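A negative control analysis can be sketched with simulated data. Everything here is an assumption made for illustration: the data are generated, not real, a single confounder drives both exposure and outcomes, and the helper `mean_difference` stands in for whatever analysis is being checked.

```python
import random
from statistics import mean


def mean_difference(exposed, values):
    """Difference in mean `values` between exposed and unexposed units."""
    treated = [v for e, v in zip(exposed, values) if e]
    control = [v for e, v in zip(exposed, values) if not e]
    return mean(treated) - mean(control)


rng = random.Random(1)
n = 5000
confounder = [rng.gauss(0, 1) for _ in range(n)]
# Exposure is more likely when the confounder is high.
exposed = [c + rng.gauss(0, 1) > 0 for c in confounder]
# Outcome of interest: a true exposure effect of 0.5 plus confounding.
outcome = [0.5 * e + c + rng.gauss(0, 1) for e, c in zip(exposed, confounder)]
# Negative control outcome: exposure has no effect by construction.
negative_control = [c + rng.gauss(0, 1) for c in confounder]

print("outcome of interest:", mean_difference(exposed, outcome))
print("negative control:   ", mean_difference(exposed, negative_control))
```

The same analysis reports a clear "effect" on the negative control, where none exists by construction, which signals that the estimate for the outcome of interest is inflated by confounding.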