Critical Appraisal Flashcards
what % of a sample is within one standard deviation from the mean
68%
what % of a sample is within 2 standard deviations from the mean
95%
more precisely ± ~1.96 SD; applying the same cutoff to the standard error of the mean is the basis of the 95% confidence interval
what % of a sample is within 3 standard deviations from the mean
99.7%
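as a quick check, a minimal Python sketch (using scipy; the loop and variable names are just for illustration) that recovers these percentages from the standard normal distribution:

```python
from scipy.stats import norm

# proportion of a normally distributed sample within k standard deviations of the mean
for k in (1, 2, 3):
    within = norm.cdf(k) - norm.cdf(-k)
    print(f"within {k} SD: {within:.1%}")
# prints roughly 68.3%, 95.4%, 99.7% (95% corresponds exactly to ~1.96 SD)
```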
what is “critical appraisal”
process of carefully and systematically assessing the outcome of scientific research to judge its trustworthiness, value and relevance in a particular clinical context
think about:
–> how SOLID is the RATIONALE for the research question
–> how IMPORTANT is the research question
–> what is the potential IMPACT of answering it
–> can i TRUST the RESULTS of this research
what does “reliability” refer to in the context of critical appraisal
precision/replicability
what does “validity” refer to in the context of critical appraisal
the degree to which a measurement is concordant with the “true value”
many different types of validity, but they all relate to WHETHER a test MEASURES what it PURPORTS to measure
what does “responsiveness” refer to in the context of critical appraisal
SENSITIVITY of the measurement to a CHANGE in the patient’s condition
what does “interpretability” refer to in the context of critical appraisal
what is the MEANING of a given score–> i.e a PHQ-9 score of 10 or above suggests at least moderate depression
what are the 4 types of validity
- face validity
- content validity
- construct validity
- criterion validity
what is “face validity”
on the face of it, does the measure seem to make sense as a measure of the intended construct
what is “content validity”
the extent to which a measurement includes ALL of the concepts of the INTENDED CONSTRUCT but NOTHING MORE
–> i.e a PHQ-9 needs to include all the criteria for depression, but shouldn’t include criteria for autism
what is “construct validity”
is the measurement related COHERENTLY to other RELATED but not observable constructs
i.e if a new scale for depression was completely unrelated to a scale that measures energy and concentration levels you would be concerned about the validity of the new scale (since energy and concentration are two of the criteria for depression)
what is “criterion validity”
the extent to which the measures PREDICT readily observable phenomena
i.e is the score on the pain scale related to how much pain medication the patient requests
which concepts look at the relationship between a diagnostic test and the actual presence of the disease?
specificity
sensitivity
PPV
NPV
what is the “null hypothesis”
the default assumption of every study is that there is NO relationship between the two variables you are looking at (i.e no relationship between smoking and risk of lung cancer)
in most studies, the researcher is trying to REJECT the null hypothesis –> i.e trying to show that there is in fact a connection between smoking and lung cancer
what are the two types of errors in studies related to rejecting or failing to reject the null hypothesis
type I error and type II error
what is type I error
FALSE POSITIVE
when the investigator REJECTS a null hypothesis that is actually TRUE in the population
so it would be like saying there is a connection between smoking and lung cancer, when really there isn’t
(example often given is telling a man he is pregnant)
what is type II error
FALSE NEGATIVE
when an investigator FAILS TO REJECT a null hypothesis that is ACTUALLY FALSE in the population
so it would be like saying there is no connection between smoking and lung cancer, when there actually is
(example often given is telling a very obviously pregnant woman that she is not pregnant)
what are some ways to remember type I vs type II error
the classic cartoon: type I = FALSE POSITIVE (telling a man he is pregnant); type II = FALSE NEGATIVE (telling an obviously pregnant woman she is not)
can the sensitivity or specificity of a test ever change?
no, they are FIXED properties of a test
what is the “sensitivity” of a test (as a concept)
the TRUE POSITIVE RATE
you want to know how many of the people who HAVE the disease you are able to IDENTIFY with the test
determine this by comparing the number of diseased patients your test correctly flags as positive to the total number of patients who ACTUALLY have the disease
a highly sensitive test RARELY MISSES people with the disease and rarely has false negatives
what is the utility of a highly sensitive test
a highly sensitive test, when negative, RULES OUT a disease
–> it’s good at picking up positives, so if the test is negative, you can be pretty certain you don’t have the disease (SNout)
–> you want high sensitivity tests for diseases that are really bad to miss, like brain cancer
what is the “specificity” of a test (as a concept)
the TRUE NEGATIVE rate
here the focus is on patients who do NOT have the disease –> focusing on the proportion of people WITHOUT the disease who have a NEGATIVE test
a test with high specificity will NOT identify HEALTHY people as having the disease –> we expect lots of true negatives and very few false positives
when do you want a highly specific test?
when a false positive might be harmful to the patient–> i.e mammography in young women may not be specific enough for breast cancer and can place these patients at greater risk of invasive procedures without benefit
what is the utility of a highly specific test?
when a highly specific test is POSITIVE it RULES IN having the disease (SPin)
so if you take a highly specific test and it is positive, you can be fairly certain you have the disease–> if you take a highly sensitive test and it is negative, you can be fairly certain you do not have the disease
when it comes to diagnosing a low prevalence disease, what would be the ideal combination of tests used to pick up and diagnose the disease
you would want a high sensitivity screening test that would pick up all possible cases of the disease (with some false positives) and then a highly specific confirmatory test to rule in those who actually have the disease (and discard those who don’t actually have it)
how do you measure sensitivity
sensitivity = true positive / (true positive + false negative)
the number of diseased people your test correctly calls positive / the total number of people in the sample who actually have the disease
how do you calculate specificity
specificity = true negative / (true negative + false positive)
the number of disease-free people your test correctly calls negative / the total number of people in the sample who are actually negative for the disease, regardless of test result
what is the PPV
the probability that someone who has the POSITIVE TEST result actually HAS THE DISEASE
the PPV varies directly with what other value/concept
PPV varies directly with the patient’s PRE-TEST PROBABILITY of having the disease (i.e their baseline risk)
what is NPV
the probability that someone who has a NEGATIVE test result actually DOES NOT HAVE the disease
NPV varies INVERSELY with what other value/concept
NPV varies INVERSELY with the prevalence of the disease or with the patient’s pretest probability of having the disease
how do you measure PPV
PPV = true positive / (true positive + false positive)
basically, the proportion of all positive tests that are true positives
how do you measure NPV
NPV = true negative / (false negative + true negative)
basically, the proportion of all negative tests that are actually negative
how do you calculate the accuracy of a test
accuracy = (TP + TN) / (TP + TN + FP + FN)
basically, the proportion of all test results that are correct (true positives plus true negatives)
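to tie the last few formulas together, a minimal worked sketch in Python; the 2x2 counts below are made up for illustration, not from any real study:

```python
# hypothetical 2x2 table counts
TP, FP = 90, 50     # test positive: with disease / without disease
FN, TN = 10, 850    # test negative: with disease / without disease

sensitivity = TP / (TP + FN)                  # true positive rate: 0.90
specificity = TN / (TN + FP)                  # true negative rate: ~0.94
ppv = TP / (TP + FP)                          # prob. of disease given a positive test: ~0.64
npv = TN / (TN + FN)                          # prob. of no disease given a negative test: ~0.99
accuracy = (TP + TN) / (TP + TN + FP + FN)    # proportion of all results that are correct: 0.94
```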
what is the “prevalence” of a disease
looks at ALL EXISTING cases of an illness or disease
i.e looking at a photo of 1000 people and asking how many of these people have black hair
how do you calculate prevalence
prevalence = (TP + FN) / (TP + FN + FP + TN)
basically, all existing cases of the disease divided by the total population
what are the 3 ways to express prevalence
- point prevalence
- period prevalence
- lifetime prevalence
what is point prevalence
the number of cases at a single point in time–> i.e a survey in Dec 2020 asking if you are currently smoking
what is period prevalence
the number of cases over a certain time frame–> usually 12 months
what is lifetime prevalence
the number of cases over one’s total lifetime
i.e a survey asking you if you have ever smoked in your life
what is incidence?
incidence looks at the number of NEW cases over a PERIOD of TIME
i.e if, in a population of 1000 people followed over two years, 50 people were dx with lung cancer, then the incidence is 50 cases per 1000 people over that period, or 25 cases per 1000 person-years (the INCIDENCE RATE)
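the arithmetic from that example, as a tiny sketch (numbers taken straight from the card):

```python
new_cases = 50      # new lung cancer diagnoses
population = 1000
years = 2

cumulative_incidence = new_cases / population       # 0.05 -> 50 cases per 1000 over the period
incidence_rate = new_cases / (population * years)   # 0.025 -> 25 cases per 1000 person-years
```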
are the sensitivity or specificity of a test affected by prevalence of the disease in a population
NO
but PPV and NPV are
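a small sketch showing why: with sensitivity and specificity held fixed (90%/90%, made-up values), PPV rises and NPV falls as prevalence increases:

```python
def ppv_npv(sens, spec, prevalence):
    """PPV and NPV from fixed test properties plus disease prevalence."""
    tp = sens * prevalence
    fp = (1 - spec) * (1 - prevalence)
    fn = (1 - sens) * prevalence
    tn = spec * (1 - prevalence)
    return tp / (tp + fp), tn / (tn + fn)

for prev in (0.01, 0.10, 0.50):
    ppv, npv = ppv_npv(0.90, 0.90, prev)
    print(f"prevalence {prev:.0%}: PPV {ppv:.2f}, NPV {npv:.2f}")
# PPV climbs from ~0.08 to 0.90; NPV falls from ~1.00 to 0.90
```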
what are 5 types of data that are often analyzed in medical literature
continuous
dichotomous
categorical
time to event
time trends
what is an example of continuous data
weight, age
what is an example of dichotomous data
a yes/no diagnosis or yes/no medications
what is an example of categorical data
i.e type of housing (apartment, condo, house)
what is an example of time to event data
i.e time until death or time until rehospitalization
what is an example of time trends data
i.e rate of hospitalizations over time or number of calls per year
what is univariate analysis
compares only TWO variables or different categories of one variable
i.e depressed vs not depressed
what is multivariate analysis
more than one variable is included in the analysis
allows for risk adjustment
what is the dependent variable in the multivariate analysis
the outcome variable
what are independent variables in multivariate analysis
predictive factors or variables that need to be accounted for i.e to avoid confounding the data
what is the “efficacy” of an intervention
it is the extent to which the intervention does more good than harm under IDEAL circumstances
“efficacy is a DELICACY”–> i.e remember that efficacy is measured under strict, fancy, expensively run, randomized clinical trial conditions–> a “delicacy” situation
what is the “effectiveness” of an intervention
the extent to which an intervention does more good than harm when provided under USUAL circumstances of healthcare practice–> “real world setting”
effectiveness studies = “pragmatic trials”
what are the two major threats to the results of a study
random error and systematic error
what is “random error”
an error in a study that occurs due to pure random chance
i.e a study of 100 people, in a population where prevalence of depression is 4%, somehow includes zero people with depression just by chance
how do you help prevent random error in a study
increasing the sample size decreases the likelihood of random errors occurring–> it increases your POWER to detect findings
what is “systematic error”
aka BIAS
error in the design, conduct or analysis of the study that results in a MISTAKEN estimate of the treatment or exposures effect on the risk or outcome of the disease
systematic errors can DISTORT the results of a study in a particular direction i.e favoring a medication or not favoring it
what impact do systematic errors/biases have on a study
threaten the validity of the study
what are the two types of validity with regards to studies
external and internal validity
what is “external validity”
how generalizable are the findings of the study to the patient you see in front of you
what is “internal validity”
does the study actually measure and investigate what it set out to (i.e are the results free of bias and error within the study itself)
name a tool that researchers can use to measure and assess for bias in studies
the Cochrane Risk of Bias tool
what is sampling bias
when participants selected for the study are systematically DIFFERENT from those the results are generalized to–> i.e the patient in front of you
i.e a survey of high school students to measure teenage use of illegal drugs does not include high school dropouts
how can you reduce sampling bias
avoid convenience sampling
make sure that the target population is properly DEFINED
make sure that the study sample matches the target population as much as possible
what is selection bias
when there are systematic differences between the baseline characteristics of the GROUPS that are COMPARED
i.e a study looking at healthy eating diets and health outcomes–> those who volunteer for the study might already be health conscious or come from high SES background
how do you reduce selection bias
randomization and/or ensure the choice of the right comparison group
what is measurement bias
the methods of measurement are not the same between groups of patients–> includes information bias, recall bias and lack of blinding
also the hawthorne effect
what is the hawthorne effect
The Hawthorne effect is a type of reactivity in which individuals modify an aspect of their behavior in response to their awareness of being observed.
how do you reduce measurement bias
use standardized, objective and previously validated methods of data collection
use a placebo or control group
what is information bias
type of measurement bias
information obtained about subjects is INADEQUATE, resulting in incorrect data
i.e in a study looking at OCP use and risk of DVT, one MD does not ask about OCP use while another takes a very detailed history about it
how do you reduce information bias
choose an appropriate study design
create a well designed PROTOCOL for data collection
train researchers to properly implement the protocol and data handling
properly measure all exposures and outcomes
what is recall bias
type of measurement bias
recall of information about exposure to something differs between study groups
i.e study looking at chemical exposures and risk of eczema in kids–> one anxious parent may recall all of the exposures their child has had whereas another parent does not recall the exposures in as much detail
how do you reduce recall bias
could use records kept from before the outcome occurred and in some cases, keep the exact hypothesis concealed from the person being studied
what is lack of blinding
type of measurement bias
occurs if the participant or researcher is not blind to the treatment condition–> assessment may become biased
i.e a psychiatrist tasked with assessing whether a patient’s depression has improved using a clinical rating scale knows the patient is on an antidepressant–> may be unconsciously biased to rate the patient as having improved
how do you reduce lack of blinding bias
blind the participant/researcher
what is “confounding” bias
when two factors are associated with each other and the effect of one is confused with or distorted by the other
these biases can result in both type I and type II errors
i.e research study finds that caffeine use causes lung cancer when really it is that smokers drink a lot of coffee and it has nothing to do with coffee
how do you reduce confounding bias
repeated studies
do crossover studies where subjects act as their own controls
match each subject with a control with similar characteristics
what is lead time bias
early detection with an intervention is confused with thinking that the intervention leads to better survival
i.e a cancer screening campaign makes it seem like survival has increased but the disease’s natural history has not changed, the cancers are just picked up earlier by screening–> even early ID (with or without early treatment) does not actually change the trajectory of the illness
how do you reduce lead time bias
measure “back end” survival–> i.e adjust survival according to the severity of the disease at the time of diagnosis
have longer enrollment periods and follow up on patient outcomes for longer
what are the most common types of study design in psychiatric research
experimental designs i.e RCTs
observational studies i.e cross sectional, cohort, case control
what type of study is a randomized control trial
a true experiment that tests an experimental hypothesis
(when double-blinded) neither the patient nor the doctor knows whether the patient is in the treatment or control group
“our study shows that drug X treats condition Y”
what types of measures can you get from a randomized control trial
odds ratio
relative risk
specific patient outcomes
what type of study is a cross sectional study
assesses the frequency of disease (and exposures) at a single point in time
“our study shows that risk factor X is ASSOCIATED with disease Y, but we cannot determine causality”
what measures can you get from a cross sectional study
disease prevalence
what type of study is a case control study
compares a group of individuals WITH disease to a group WITHOUT disease
looks at whether the odds of a previous exposure or risk factor differ between those who did and did not develop the disease or event
“our study shows that patients with lung cancer had higher odds of smoking than those without lung cancer”
what measures can you get from a case control study
odds ratio
what type of study is a cohort study
can be PROSPECTIVE (i.e follows people during the study) or RETROSPECTIVE (i.e data is already collected and now you are looking back)
compares a group with an exposure or risk factor to a group without the exposure or risk factor
looks to see if an exposure or a risk factor is associated with development of a disease i.e stroke or an event i.e death
“our study shows that patients with ADHD had a higher risk of sustaining TBI than non ADHD patients”
what measures can you get from a cohort study
relative risk
what factors can affect a therapeutic response in non-RCT trials? (i.e why do we need a placebo or control in RCTs)
- regression to the mean
- hawthorne effect
- desirability effect
- placebo effect
what is “Regression to the mean”
on repeated measurements over time, extreme values or symptoms tend to move closer to the mean–> i.e people tend to get better over time, especially in psychiatric disorders like dep and anx
what is desirability bias
patient/researcher wanting to show that the treatment works
why do we care so much about randomization in trials
allows us to balance out not only known biases and risk factors between groups, but more importantly, UNKNOWN biases and risk factors
randomization is most effective when sample sizes are large (small sample sizes result in an “underpowered” study and make it harder for randomization to balance the groups)
what are observational studies
studies where the researchers OBSERVE the effect of a risk factor, medical test, treatment or other intervention without trying to change who is or who is not exposed to it
do NOT have randomization or control groups
i.e cross sectional, case control and cohort studies
can you infer causality from cross sectional studies
no, only associations
are case control studies retrospective or prospective
always retrospective
they do not watch for an outcome to develop–> they compare the prevalence of past risk factors in people with an already known outcome (i.e suicide) vs controls
can you calculate a relative risk from a case control study?
no (since you already have predetermined numbers of people with the outcome plus matched controls)–> you can calculate an ODDS RATIO though
what is a weakness to case control studies
highly susceptible to various types of bias ie sampling bias, selection bias, recall bias
are cohort studies prospective or retrospective
classically prospective–> follows a group of subjects forward over time (though retrospective cohort studies using already-collected data also exist)
what type of study is the only way to measure incidence
cohort study
can you calculate a relative risk with a cohort study
yes
there is a high risk for what type of bias with cohort studies
selection bias–> as you can’t control who gets exposed to what intervention
i.e those who are prescribed an antidepressant may have more severe depression than those who do not
what is a p value
a value that indicates the PROBABILITY of getting results AT LEAST as extreme as the ones you observed in your study, given that the null hypothesis is correct
i.e what is the “statistical significance” of your results
p values CANNOT tell you:
-the magnitude of an effect
-the strength of the evidence
-the probability that the finding was due to chance
“So what information can you glean from a p-value? The most straightforward explanation I found came from Stuart Buck, vice president of research integrity at the Laura and John Arnold Foundation. Imagine, he said, that you have a coin that you suspect is weighted toward heads. (Your null hypothesis is then that the coin is fair.) You flip it 100 times and get more heads than tails. The p-value won’t tell you whether the coin is fair, but it will tell you the probability that you’d get at least as many heads as you did if the coin was fair. That’s it — nothing more.”
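the coin example can be computed directly; a minimal sketch assuming, say, 60 heads out of 100 flips (made-up numbers for illustration):

```python
from scipy.stats import binom

n_flips, n_heads = 100, 60   # assumed values, for illustration only
# one-sided p-value: probability of getting AT LEAST 60 heads if the coin is fair (the null hypothesis)
p_value = binom.sf(n_heads - 1, n_flips, 0.5)
print(p_value)   # ~0.028 -> unlikely under a fair coin, but says nothing about effect size
```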
what is “Cohen’s d”
what you will see in an RCT to illustrate the MAGNITUDE of an EFFECT
Cohen’s d is the STANDARDIZED difference between two means–> i.e the difference between the treatment and control group means divided by the pooled standard deviation
the larger the difference between the two means the larger the Cohen’s d and thus the larger the “effect size”
Cohen’s d of 0.2 = small, 0.5 = medium, 0.8 or above = large
a larger Cohen’s d suggests that the probability of superiority of the treatment/intervention over the control is HIGH and the number needed to treat to get the treatment outcome is LOWER
https://rpsychologist.com/cohend/
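a minimal sketch of the pooled-SD calculation; the two groups of scores are made up for illustration:

```python
import numpy as np

def cohens_d(treatment, control):
    """Standardized difference between two means, using the pooled standard deviation."""
    n1, n2 = len(treatment), len(control)
    v1, v2 = np.var(treatment, ddof=1), np.var(control, ddof=1)
    pooled_sd = np.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    return (np.mean(treatment) - np.mean(control)) / pooled_sd

treatment = np.array([8, 10, 7, 9, 11, 6, 10])   # e.g. symptom improvement, treatment arm
control   = np.array([5, 6, 4, 7, 5, 6, 4])      # symptom improvement, control arm
print(cohens_d(treatment, control))              # well above 0.8 -> a "large" effect
```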
what p value indicates that there is a significant difference between control and treatment groups in a study
p<0.05
what does the Bonferroni Correction do?
when you are making multiple comparisons between control and treatment groups (i.e running multiple tests), you increase the chance that one will come out “significant” purely by chance (a false positive)
the Bonferroni correction adjusts for this by dividing the significance threshold (alpha, usually 0.05) by the number of comparisons
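a one-line sketch of the correction (the number of comparisons is an assumed example):

```python
alpha = 0.05
n_comparisons = 5                        # assumed: five outcome measures compared
corrected_alpha = alpha / n_comparisons  # 0.01 -> each test must reach p < 0.01 to count as significant
```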
what does the T test do (study results)
it is an example of univariate analysis comparing the means of two groups with normally distributed continuous data
what does the ANOVA/F-test do (study results)
example of univariate analysis comparing the means of more than two groups with normally distributed continuous data
what does linear regression do
example of multivariate analysis for continuous outcome data (the multivariate counterpart of the t-test/ANOVA); models the outcome as a linear function of multiple predictors
what does the chi squared test do
example of univariate analysis comparing dichotomous outcomes
what does logistic regression do
example of multivariate analysis comparing dichotomous outcomes
what does a kaplan meier curve do
univariate analysis for time to event data
what does Cox proportional hazards (regression) do
multivariate analysis for time to event data
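for the univariate tests in the last few cards, a minimal sketch with simulated data showing the corresponding scipy calls (group sizes and effect sizes are made up):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
a, b, c = rng.normal(0, 1, 30), rng.normal(0.5, 1, 30), rng.normal(1, 1, 30)

t_stat, p_t = stats.ttest_ind(a, b)       # t-test: means of two groups, continuous data
f_stat, p_f = stats.f_oneway(a, b, c)     # ANOVA / F-test: means of more than two groups

table = np.array([[20, 30],               # chi-squared: dichotomous outcomes,
                  [35, 15]])              # e.g. treated/untreated x improved/not improved
chi2, p_chi, dof, expected = stats.chi2_contingency(table)
```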
how do you calculate absolute risk reduction (ARR)
ARR = control event rate - exposure event rate
i.e if there were 4 deaths in the control group of 100 people, CER = 4/100 = 4%
if there were 3 deaths in the treatment group of 100 people, EER = 3/100 = 3%
ARR = 4% - 3% = 1%
how do you calculate relative risk reduction (RRR)
RRR = ARR / control event rate
how do you calculate relative risk
exposure event rate / control event rate = RR
how do you calculate number needed to treat
NNT = 1/ARR
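putting the last four formulas together with the worked numbers from the ARR card (a minimal sketch):

```python
control_event_rate  = 4 / 100   # CER = 4%
exposure_event_rate = 3 / 100   # EER = 3%

arr = control_event_rate - exposure_event_rate   # absolute risk reduction = 0.01 (1%)
rrr = arr / control_event_rate                   # relative risk reduction = 0.25 (25%)
rr  = exposure_event_rate / control_event_rate   # relative risk = 0.75
nnt = 1 / arr                                    # number needed to treat = 100
```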
what does risk reduction tell us
how STRONGLY RELATED two factors are, but does NOT tell us the absolute magnitude of the risk (that is what the ARR and NNT capture)
what does a RR of 1 indicate
both groups have the same risk
what does a RR above 1 tell us
exposure is associated with increased disease occurrence
what does RR below 1 tell us
exposure is associated with lower disease occurrence
an “excellent” benefit to an intervention would be reflected by a NNT in what range
2-4
a “meaningful health benefit” is associated with an NNT in what range
5-7
what does an odds ratio tell us
odds that a case was exposed / odds that a control was exposed
**approximates the RR for rare events but becomes VERY INACCURATE when used for common events
what does an OR of 1 tell us
NO associated between exposure and the outcome
what does an OR above 1 tell us
the exposure is associated with greater odds of the outcome
what does an OR below 1 tell us
the exposure is associated with lower odds of the outcome
what does an OR of 1.2 indicate
that there is a 20% increase in the ODDS of an outcome with a given exposure
what is a hazard ratio
[the hazard (instantaneous event rate) in the treatment arm] / [the hazard in the control arm] at a given point in time
what does the “beta” coefficient in linear regression tell us
linear regression generates a “beta”, which is the slope of the fitted line
beta = 0 means there is no relationship
beta > 0 means there is a POSITIVE relationship
beta < 0 means there is a NEGATIVE relationship
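a tiny sketch of pulling the beta (slope) out of a simple linear fit; the x/y values are made up for illustration:

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6], dtype=float)    # e.g. dose
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1, 6.9])     # e.g. response

beta, intercept = np.polyfit(x, y, 1)   # degree-1 fit returns [slope, intercept]
print(beta)                             # beta > 0 -> positive relationship
```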
what is a confidence interval?
a range of values within which the true mean of the population is expected to fall, with a specific probability
as sample size increases, confidence interval narrows
usually use 95% confidence interval
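a minimal sketch of an approximate 95% CI for a mean (made-up sample; uses the 1.96 multiplier, a t multiplier is slightly more exact for small samples):

```python
import numpy as np

sample = np.array([12.1, 9.8, 11.3, 10.7, 13.0, 10.1, 11.9, 12.4, 9.5, 11.0])

mean = sample.mean()
sem = sample.std(ddof=1) / np.sqrt(len(sample))          # standard error of the mean
ci_low, ci_high = mean - 1.96 * sem, mean + 1.96 * sem   # approximate 95% confidence interval
print(f"{mean:.2f} (95% CI {ci_low:.2f} to {ci_high:.2f})")
```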
what does it mean if a 95% confidence interval for a mean difference between 2 variables INCLUDES zero
there is no significant difference and the null hypothesis is NOT rejected
what does it mean if the 95% confidence interval for odds ratio or relative risk includes 1
there is NO significant difference and the null hypothesis is NOT rejected
what does it mean if the confidence intervals between 2 groups do not overlap
a statistically significant difference exists
what does it mean if the confidence intervals between two groups overlap
usually no significant difference exists
name a group of criteria that can help establish epidemiological evidence of a CAUSAL relationship between a presumed cause and an observed effect
Bradford Hill criteria or Hill’s criteria for causation
how many factors are included in the Hill criteria for causation
9
what are the 9 Hill criteria for causation
- experimental evidence –> from a lab study or RCT
- strength of association–> what is the size of the RR or OR
- consistency–> same estimates of risk between different studies
- gradient–> increasing exposure leads to increasing rate of occurrence
- biological and clinical plausibility–> judgment as to plausibility is based on clinical experience and known evidence
- specificity–> when an exposure is known to only be associated with the outcome of interest
- coherence–> aka “triangulation;” when the evidence from disparate types of studies “hang together”
- temporality–> realistic timing of outcome based on exposure
- analogy–> similar associations or causal relationships found in other relevant areas of epidemiology
how do you calculate an odds ratio
(odds of being exposed amongst cases) / (odds of being exposed amongst controls) = odds ratio
i.e in the usual ABCD 2x2 table layout, with cases and controls down the left and exposed/unexposed across the top, the odds ratio = (A/B) / (C/D)
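a tiny sketch using that layout, with hypothetical case-control counts:

```python
A = 60   # cases, exposed
B = 40   # cases, unexposed
C = 30   # controls, exposed
D = 70   # controls, unexposed

odds_ratio = (A / B) / (C / D)   # equivalently (A * D) / (B * C)
print(odds_ratio)                # 3.5 -> cases had 3.5x the odds of having been exposed
```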