Midterm 1 Flashcards
What does it mean if a study is reproducible?
- One can repeat the original study using THE SAME data, materials, and methods
- The reproduction of the study confirms the soundness and reliability of the original study's conclusions.
What does it mean if a study is replicable?
- One can repeat the original study using the same materials and methods but DIFFERENT data
- A study is deemed replicated if the replication study reaches the same statistical conclusions
What is computational reproducibility?
- All aspects of data processing, analysis, visualization, and presentation are entirely and independently reproducible, yielding the exact same outputs.
What is the problem surrounding replication?
The typical research ecosystem promotes questionable research practices (QRPs) rather than rigorous, replicable research, ultimately yielding too many studies that cannot be replicated. It can also perpetuate inequities, injustices, and biases in science.
What is the “Reproducibility Crisis”?
Replicability is not the norm; the vast majority of studies could not be repeated without extensive consultation with the original authors (only 46% of studies could be repeated).
What does low statistical power lead to?
Lower power = more false negatives
A true effect was there, but it wasn’t detected.
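The link between power and false negatives can be illustrated with a small simulation. This is only a sketch under stated assumptions: the function name `estimate_power`, the effect size of 0.5, and the use of a simple z-test with a known standard deviation of 1 are all illustrative choices, not part of the course material.

```python
import random
import statistics

def estimate_power(n, effect=0.5, trials=2000, seed=1):
    """Fraction of simulated studies that detect a true effect.

    Assumes a one-sample z-test with known sd = 1 (an illustrative simplification).
    """
    rng = random.Random(seed)
    crit = 1.96  # two-sided 5% critical value of the standard normal
    detected = 0
    for _ in range(trials):
        sample = [rng.gauss(effect, 1.0) for _ in range(n)]
        z = statistics.fmean(sample) * n ** 0.5  # mean / (sd / sqrt(n)) with sd = 1
        if abs(z) > crit:
            detected += 1
    return detected / trials

power_small = estimate_power(n=10)   # small sample: low power
power_large = estimate_power(n=50)   # larger sample: higher power

# The false negative rate is 1 - power: the true effect exists but is missed.
print("n=10  power:", power_small, " false negatives:", 1 - power_small)
print("n=50  power:", power_large, " false negatives:", 1 - power_large)
```

The effect is real in every simulated study, so every non-significant result is a false negative; the smaller sample misses it far more often.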
What can cognitive biases lead to?
Cognitive biases can hinder objectivity and can lead to false positives.
What is the problem with poor or inaccessible documentation?
It prevents the experiment from being replicated or reproduced.
What is P-hacking?
Data processing and analytical choices made after seeing and interacting with your data. Results become data dependent, and no longer adhere to the original hypothesis testing model.
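One common form of p-hacking is trying several outcomes or analyses and reporting whichever reaches significance. The sketch below simulates that under assumed conditions: there is no true effect at all, the test is a simple z-test with known sd = 1, and the function name `false_positive_rate` and the choice of 10 outcomes are hypothetical.

```python
import random
import statistics

def false_positive_rate(n_outcomes, n=30, trials=2000, seed=2):
    """Share of null studies reporting at least one 'significant' outcome."""
    rng = random.Random(seed)
    crit = 1.96  # two-sided 5% critical value of the standard normal
    hits = 0
    for _ in range(trials):
        found_significant = False
        for _ in range(n_outcomes):
            # No true effect: every outcome is pure noise around 0
            sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
            z = statistics.fmean(sample) * n ** 0.5
            if abs(z) > crit:
                found_significant = True
        if found_significant:
            hits += 1
    return hits / trials

planned = false_positive_rate(n_outcomes=1)    # one pre-specified test
flexible = false_positive_rate(n_outcomes=10)  # pick the best of ten

print("one planned test:", planned)
print("best of ten tests:", flexible)
```

With one planned test the false positive rate stays near the nominal 5%; picking the best of ten independent tests pushes it toward 1 - 0.95^10, roughly 40%.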
What is HARKing?
Hypothesizing After the Results are Known.
Ex. a researcher who finds patterns through exploratory research presents the findings as though they were part of confirmatory research.
Hypothesis presented as if it was determined beforehand.
False positive rates are higher.
What is confirmation bias?
Seeking out or favoring statistics and information that confirm the results the researcher expected to find.
What is the file drawer problem?
The tendency for research results, especially negative ones, to remain unpublished.
What is the solution to all the reproducibility problems?
Open Science
Scientific research conducted and communicated in an honest, accessible, and transparent way, such that, at a minimum, a study can be reproduced and, ideally, replicated. Replication builds strength of evidence.
What are the benefits of open science?
- Saves time and money by pursuing best leads and avoiding poor ones (because poor ones were documented!)
- Re-use methods / code that work, avoid using ones already found to be ineffective
- Avoids duplication while enabling replication
- Facilitates meta-analyses
- Promotes more rapid discovery*
- Democratizes science and promotes equitable access and relevance to all stakeholders
*As witnessed with COVID-19 research
What are registered reports?
The study design is peer reviewed first, then the final study is also peer reviewed, and it is published no matter what the results are.
What are the two goals of statistics?
Goal 1: To estimate the values of important quantities in a population of interest.
Goal 2: To test specific claims, or statistical hypotheses, about those quantities in the population.
Why do we need statistics?
Measuring everyone in the population is almost always infeasible. Statistics provides the tools necessary to reliably describe populations and draw inferences about them.
Population
All the individual units of interest
Sample
A subset of units from the population that we measure and analyze.
Estimation
The process of inferring an unknown quantity of a population using sample data.
Parameter
A quantity describing a population.
How do populations, parameters, samples, and estimates relate?
Populations <—> Parameters
Samples <—> Estimates
What are some characteristics of parameters?
They are constant, fixed, the truth (which we almost never know)
What are some characteristics of estimates?
They are random variables; they change from one sample to the next from the same population.
What is a sample of convenience?
A collection of individuals that happen to be available at the time.
What is sampling bias?
Sampling Bias is a systematic difference between a parameter and its estimate.
Sampling Bias arises when samples aren’t representative of the population.
Sampling Bias is typically difficult to deal with.
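A short simulation can make the "systematic difference" concrete. This is a sketch with made-up data: the population of heights, the helper `mean_of_estimates`, and the idea that only the tallest 20% are reachable are all hypothetical assumptions chosen for illustration.

```python
import random
import statistics

rng = random.Random(4)

# Hypothetical population of 10,000 heights (cm); the parameter is its mean
population = [rng.gauss(170, 10) for _ in range(10000)]
parameter = statistics.fmean(population)

# Assumed non-representative sampling frame: only the tallest 20% can be reached
tallest = sorted(population)[-2000:]

def mean_of_estimates(frame, n=25, repeats=200):
    """Average of many sample means drawn from the given sampling frame."""
    estimates = [statistics.fmean(rng.sample(frame, n)) for _ in range(repeats)]
    return statistics.fmean(estimates)

random_avg = mean_of_estimates(population)  # random samples of the population
biased_avg = mean_of_estimates(tallest)     # samples from the biased frame

print("parameter:            ", round(parameter, 2))
print("random samples average:", round(random_avg, 2))
print("biased samples average:", round(biased_avg, 2))
```

Averaged over many samples, the random-sample estimates land close to the parameter, while the estimates from the non-representative frame stay well above it no matter how many samples are taken.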
What is volunteer bias?
Volunteers for a study are likely to be different from the population.
What are the properties of a good sample?
A good sample is a random, independent selection of a large number of individuals. In a random sample, each member of a population has an independent and equal chance of being selected.
What is sampling error?
- Discrepancy between the population parameter and the sample estimate caused by chance.
- It is inevitable and expected, and can be managed and dealt with
- Because an estimate is a random variable, the value of an estimate is influenced by chance.
- Therefore, estimates will differ among random samples from the same population
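The points above can be sketched by drawing several random samples from one population and comparing each estimate with the parameter. The population of heights and the sample size of 25 are made-up values for illustration only.

```python
import random
import statistics

rng = random.Random(3)

# Hypothetical population; in real studies the parameter is almost never known
population = [rng.gauss(170, 10) for _ in range(10000)]
parameter = statistics.fmean(population)

# Five independent random samples of 25 individuals each
estimates = [statistics.fmean(rng.sample(population, 25)) for _ in range(5)]

# Sampling error: the chance discrepancy between each estimate and the parameter
errors = [estimate - parameter for estimate in estimates]

for estimate, error in zip(estimates, errors):
    print(round(estimate, 2), round(error, 2))
```

Each sample gives a different estimate, and the errors scatter on both sides of zero rather than leaning one way, which is what distinguishes sampling error from sampling bias.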
What is sampling bias (shortened version)?
Systematic difference between estimates and parameters.
What is sampling error (shortened version)?
Discrepancy between estimates and parameters caused by chance.
What is a good random sample in terms of sampling error and sampling bias?
Good random samples minimize bias and make it possible to measure the amount of sampling error.
Accurate and precise versus inaccurate and imprecise
Accuracy is how close estimates are to the target (the true value); precision is how closely repeated estimates agree with each other (how consistently they hit the same spot).
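As a rough numerical illustration, accuracy can be summarized as the average distance from the target and precision as the spread of repeated estimates. The numbers, the `TRUE_VALUE` constant, and the `describe` helper below are invented for this sketch.

```python
import statistics

TRUE_VALUE = 10.0  # the "target"; known here only because the data are made up

def describe(estimates):
    """Return (bias, spread): bias near 0 = accurate, small spread = precise."""
    bias = statistics.fmean(estimates) - TRUE_VALUE
    spread = statistics.stdev(estimates)
    return bias, spread

accurate_precise = [9.9, 10.1, 10.0, 10.2, 9.8]
inaccurate_precise = [12.9, 13.1, 13.0, 13.2, 12.8]
accurate_imprecise = [7.0, 13.0, 10.0, 12.0, 8.0]

for name, data in [
    ("accurate, precise", accurate_precise),
    ("inaccurate, precise", inaccurate_precise),
    ("accurate, imprecise", accurate_imprecise),
]:
    bias, spread = describe(data)
    print(name, "bias:", round(bias, 2), "spread:", round(spread, 2))
```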
What is a variable?
A variable is a characteristic that differs among individuals or other sampling units.
What is data?
Data are measurements of one or more variables made on a sample of individuals.
What are nominal categorical variables? What are some examples?
No natural order to categories
- sex
- genotype
- drug treatment (e.g., aspirin vs. ibuprofen)
- province
- survival (i.e., live or die)
What are ordinal categorical variables? What are some examples?
Natural ordering to categories
- severity (mild, moderate, severe)
- light intensity (dim, moderate, bright)
What are discrete numerical variables? What are some examples?
They can be counted.
- Number of limbs
- Number of offspring
What are continuous numerical variables? What are some examples?
They can be measured
- Arm length
- Height
- Salt concentration (mg/L)
How are response (dependent) and explanatory (independent) variables connected?
We aim to predict response variables using explanatory variables.
What is a frequency distribution?
Describes the number of times each value of a variable occurs in a sample (categorical or numerical).
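A frequency distribution for a small sample can be tabulated in a few lines. The severity values below are a made-up sample of eight individuals, used only to show the counting.

```python
from collections import Counter

# Hypothetical sample: severity recorded for eight individuals
severity = ["mild", "severe", "mild", "moderate",
            "mild", "moderate", "severe", "mild"]

# Counter maps each observed value to the number of times it occurs
frequency = Counter(severity)

# Print in the natural order of this ordinal variable
for category in ["mild", "moderate", "severe"]:
    print(category, frequency[category])
```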
What is a probability distribution?
The distribution of the variable in the entire population (rarely known)
What is an observational study?
- Treatments are NOT assigned by researcher
- Can only evaluate associations between variables
- Cause and effect CANNOT be assessed
What is an experimental study?
- Treatments assigned randomly to individuals
- CAN assess cause and effect relationships between variables (given good experimental design)
What is a confounding variable?
A confounding variable is an unmeasured variable that changes in tandem with one or more of the measured variables in a study.
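The simulation below sketches how a confounder can create an association between two variables that do not affect each other. Everything here is hypothetical: `x` and `y` are each generated from the confounder plus independent noise, and the `pearson` helper is written out by hand for this example. In a real study the confounder is unmeasured, so the adjustment in the last step would not be available.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

rng = random.Random(5)

# The confounder drives both measured variables; x has no effect on y
confounder = [rng.gauss(0, 1) for _ in range(2000)]
x = [c + rng.gauss(0, 1) for c in confounder]
y = [c + rng.gauss(0, 1) for c in confounder]

r_observed = pearson(x, y)

# Removing the confounder's contribution (possible only in a simulation)
x_resid = [xi - c for xi, c in zip(x, confounder)]
y_resid = [yi - c for yi, c in zip(y, confounder)]
r_adjusted = pearson(x_resid, y_resid)

print("observed correlation:", round(r_observed, 2))
print("after removing confounder:", round(r_adjusted, 2))
```

The observed correlation is clearly positive even though neither variable influences the other, and it largely disappears once the confounder's contribution is removed.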
Why should we be aware of reverse causation?
An association between two variables does not show which one causes the other. Ex. the hypothesized causality may be that feeding method affects infant growth rate when, in actuality, infant growth rate affects feeding method.
What is meta-analysis? (not super important)
An analysis of analyses
What is exploratory research?
- Characterized by the use of data to generate hypotheses about why something occurred
- Crucial for discovery, especially of unexpected patterns that then lead to new lines of research
- Typically proceeds without rigid analysis path; flexibility to explore different leads and angles of inquiry
- Open to an array of possible relationships and resulting interpretations
What is confirmatory research?
- Characterized by undertaking a study specifically designed to test an a priori hypothesis and associated predictions about what will occur
- Crucial for establishing diagnostic evidence for explanatory claims
- Proceeds with a clear study design and analysis plan that is strictly followed