Topic 1: Design of Experiments Flashcards
Why is data important?
Use data for problem-solving
How many types of data scientist are there?
2: Professional and popular
What is the difference between a professional and a popular data scientist?
A professional data scientist is able to explore and work on different aspects of the data life cycle
A popular data scientist works on datasets from a particular field, like a data detective
What are the challenges in collecting a dataset?
How can they be minimized?
Ethics, privacy, errors, missing values
A transparent plan and non-identifiable subjects
What is the gold standard for gathering data?
Randomised controlled trial (RCT)
random double-blinded allocation
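Random allocation can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name `randomise_allocation`, the subject IDs and the seed are made up for the example, and blinding (hiding the group labels from subjects and evaluators) is not modelled here.

```python
import random

def randomise_allocation(subject_ids, seed=None):
    """Randomly split subjects into treatment and control groups of near-equal size."""
    rng = random.Random(seed)      # a seed makes the allocation reproducible
    shuffled = list(subject_ids)
    rng.shuffle(shuffled)          # chance, not the investigator, decides the order
    half = len(shuffled) // 2
    return {"treatment": shuffled[:half], "control": shuffled[half:]}

# Hypothetical study with 10 subjects labelled 1..10.
groups = randomise_allocation(range(1, 11), seed=42)
print(groups)
```

Because chance decides who goes where, known and unknown confounders tend to be balanced across the 2 groups.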
What is domain knowledge?
Domain knowledge is the contextual background information needed to analyse and understand the dataset
During the process of gathering information or data, the evidence must be….
- Each piece of evidence needs to be weighed up equally
- Clear and well-cited
- Every stage of the study in research journal must be well-documented
- Journal articles need to present reproducible research
What is reproducible research?
The authors of the study need to provide the dataset and software used in the study so that others can verify the results and run alternative analyses
How many types of controlled experiments are there?
2: Contemporaneous (happens at the same time as the treatment group)
Historical (happened before the treatment group)
What is placebo?
A pretend treatment, designed to be neutral and indistinguishable from the treatment
The participants in a double-blinded trial don’t know whether they are receiving the real or the placebo treatment
What is placebo effect?
A phenomenon in which subjects who think they are receiving the treatment respond to the idea of being treated, rather than to the treatment itself
What is confounding?
When the results/effects get mixed up or become misleading because of a third, extraneous variable
=> confusing interpretation
What is bias?
A factor affecting the ability of the data to accurately measure the effect of the treatment
What are some confounders in an RCT?
Explain each one
Selection bias: the 2 groups are not comparable
Observer bias: the subjects/investigators are aware of which group is which ==> affects responses and evaluations + placebo effect
Consent bias: the subjects choose whether to join or not
What is an observational study?
The investigator does NOT decide or allocate the subjects into groups
What conclusions can be drawn from an observational study?
An observational study can only SUGGEST causation, using suggestive wording (link, increases the risk, risk factor, …); it can NOT PROVE causation
What are lurking variables in observational studies?
Various misleading hidden confounders
- Selection bias
- Survivor bias: the worst-off subjects drop out of the study => the effects appear to improve
- Adherers/non-adherers: the pattern can be due to whether the subjects adhere to the treatment/program or not rather than the treatment itself
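Survivor bias from the list above can be seen with a small numeric sketch. The scores and the dropout cut-off of 50 are hypothetical, chosen only to show the mechanism.

```python
from statistics import mean

# Hypothetical health scores (higher is better) for everyone enrolled.
scores = [20, 35, 40, 55, 60, 70, 85, 90]

# Assumption for illustration: subjects scoring below 50 drop out
# before the final measurement, so only the rest are analysed.
survivors = [s for s in scores if s >= 50]

mean_all = mean(scores)            # average if nobody had dropped out
mean_survivors = mean(survivors)   # average actually seen in the study

print(f"all enrolled: {mean_all:.1f}, survivors only: {mean_survivors:.1f}")
```

The average looks better among the survivors even though no individual score changed, so the apparent improvement comes from who stayed, not from the treatment.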
What is a solution to minimize the confounders?
Slicing up the groups into subgroups based on the confounders in order to be able to compare the results
What is Simpson’s paradox?
A phenomenon in which a pattern/trend observed within individual groups disappears or reverses when all the groups are combined.
The percentages within the subgroups can point in a different direction from the percentages for the combined data.
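A worked example makes the reversal concrete. The counts below are illustrative numbers chosen to show the effect (they are not from these flashcards), and the treatment names "A" and "B" and the subgroup names are placeholders.

```python
# Illustrative (successes, patients) counts per treatment and subgroup.
counts = {
    "A": {"small": (81, 87), "large": (192, 263)},
    "B": {"small": (234, 270), "large": (55, 80)},
}

def subgroup_rate(treatment, subgroup):
    """Success rate of one treatment within one subgroup."""
    successes, patients = counts[treatment][subgroup]
    return successes / patients

def overall_rate(treatment):
    """Success rate of one treatment with all subgroups combined."""
    successes = sum(s for s, _ in counts[treatment].values())
    patients = sum(n for _, n in counts[treatment].values())
    return successes / patients

for subgroup in ("small", "large"):
    print(subgroup, f"A={subgroup_rate('A', subgroup):.1%}",
          f"B={subgroup_rate('B', subgroup):.1%}")
print("combined", f"A={overall_rate('A'):.1%}", f"B={overall_rate('B'):.1%}")
```

A has the higher success rate in each subgroup, yet B has the higher rate once the subgroups are combined, because A was used mostly on the harder "large" cases.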
If a study involves a data set from the past, what is it called?
A historical control, in which time becomes a confounding variable
What is a control group?
A group in a study that receives the placebo treatment
What is a controlled experiment?
Investigators allocate subjects into 2 groups: a control group and a treatment group
The effect of the variable/treatment is controlled
What is controlling for confounders?
By dividing up the groups into subgroups based on confounders, the influence of those variables can be reduced
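A minimal sketch of this idea, assuming a made-up observational dataset in which smoking is the confounder, coffee drinking is the exposure and disease is the outcome. All counts and the helper name `disease_rate` are hypothetical.

```python
# Hypothetical counts: (smoker, drinks_coffee) -> (disease cases, people)
table = {
    (True, True): (24, 80),
    (True, False): (6, 20),
    (False, True): (2, 20),
    (False, False): (8, 80),
}

def disease_rate(coffee, smoker=None):
    """Disease rate for a coffee group, optionally within one smoking subgroup."""
    cells = [v for (s, c), v in table.items()
             if c == coffee and (smoker is None or s == smoker)]
    cases = sum(v[0] for v in cells)
    people = sum(v[1] for v in cells)
    return cases / people

# Crude comparison, ignoring the confounder.
print("all:", disease_rate(True), "vs", disease_rate(False))

# Comparison within each subgroup of the confounder.
for smoker in (True, False):
    print("smoker" if smoker else "non-smoker", ":",
          disease_rate(True, smoker), "vs", disease_rate(False, smoker))
```

In the crude comparison coffee drinkers look worse off, but within smokers and within non-smokers the rates are the same, so the crude gap here comes from smoking rather than coffee.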
What is the actual difference between RCT and observational studies?
In an RCT, the investigator is the one allocating each participant to the control or treatment group. This means that there is an intervention on, or manipulation of, the variable being studied.
Meanwhile, in an observational study, the investigator observes/studies participants who are already exposed to the independent variable (e.g. people who already smoke, since we cannot force people into a treatment group in which they have to start smoking) ==> No intervention